This is the first post in a series I’m writing to help you discover the many different ways to handle tracks in MP4 files using MP4Box and other GPAC tools, with a particular focus on three types of tracks: subtitle, metadata and graphics tracks. Let me start in this post with subtitle tracks.
There are plenty of subtitle formats, and plenty of types and categories of subtitles. GPAC’s support for subtitles builds on its support for the ISO Base Media File Format (ISOBMFF). The ISOBMFF considers that any data that produces human-readable text, whether it is used as subtitles or as closed captions, is, well, … subtitles. It further distinguishes two major classes of subtitle formats: formats that require only text processing capabilities (text decoding, text layout) and formats that also require image processing capabilities. These classes are identified by the Track Handler Type.
The handler type is a code made of 4 ASCII characters. Formats that require only text processing are stored in tracks identified by the handler ‘text’. Formats that may also require image processing are identified by the handler ‘subt’. GPAC supports both classes of tracks. The choice of track handler type is not left to the content creator or to the packager. It is decided by the specification that defines how a given subtitle format is carried in ISO tracks. Since tracks of a given handler type may store different formats, the format has to be identifiable when the file is processed at a high level (i.e. without decoding the subtitle frames, or before the file is transmitted). This is done by the so-called Sample Entry Code.
This sample entry code is also a code made of 4 ASCII characters. So identifying a track type requires at least the pair (‘handler type’, ‘sample entry code’). In this post and the following ones, I’ll use the syntax <handler-type>:<sample-entry-code> to identify a track type. Some sample entry codes are specific to one particular format. Others identify generic formats. In fact, anyone can define and register a sample entry code for their own format. A registry of those identifiers is maintained by the MP4 Registration Authority (MP4RA). Here is a list of subtitle formats and their associated identifiers from the MP4RA site:
|text:tx3g||Tracks containing samples whose payload is binary data according to the 3GPP Timed Text format defined by 3GPP/MPEG.|
|sbtl:tx3g||Apple-specific identifier for so-called “Subtitle media”. The payload is the same as for text:tx3g.|
|text:text||Apple-specific identifier for so-called “Text media”. The payload is similar to that of text:tx3g and sbtl:tx3g, with some differences. This identifier is not officially registered on MP4RA.|
|clcp:c608 and clcp:c708||Apple-specific identifiers for so-called “Closed Captioning media”. Not supported by GPAC for import/export and playback, although DASHing may work. These identifiers are not officially registered on MP4RA.|
|text:wvtt||Tracks containing samples whose payload is binary data defined by MPEG that encapsulates W3C WebVTT subtitles.|
|subt:stpp||Tracks containing samples whose payload is an XML document. This format is defined by MPEG. Each sample carries one entire XML document, and all samples use the same XML language. Further information stored in the Sample Entry box (such as the namespace) and possibly in the XML samples is required to precisely identify the XML language of those subtitles. This is currently used to carry TTML, SMPTE-TT or EBU-TT (more on GPAC support for EBU-TTD) but may be used by any other XML format. A particular version of this format is adopted by DECE.|
|text:stxt||Tracks containing samples whose payload is raw text. This format is defined by MPEG. Additional sample entry information (namely mime type) is required to identify the type of text data. This is only used experimentally for the moment (in particular in GPAC).|
|subt:sbtt||Similar to text:stxt, but for “subtitles”. It is also defined by MPEG but is not yet used.|
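To make the <handler-type>:<sample-entry-code> notation concrete, here is a minimal shell sketch that splits such an identifier, checks that both halves are 4 characters long, and looks the pair up in the list above. The function name describe_track_type is hypothetical (it is not part of GPAC), and the short labels it prints are my own summaries of the table rows, not official names.

```shell
# Hypothetical helper, not part of GPAC: describe a track type written as
# <handler-type>:<sample-entry-code>, using the identifiers listed above.
describe_track_type() {
  local id="$1"
  local handler="${id%%:*}"   # text before the first ':'
  local entry="${id#*:}"      # text after the first ':'

  # Both codes must be exactly 4 characters, and a ':' must be present.
  if [ "$handler" = "$id" ] || [ "${#handler}" -ne 4 ] || [ "${#entry}" -ne 4 ]; then
    echo "invalid track type: $id" >&2
    return 1
  fi

  # Labels below are informal summaries of the table rows (an assumption).
  case "$handler:$entry" in
    text:tx3g)           echo "3GPP Timed Text" ;;
    sbtl:tx3g)           echo "Apple Subtitle media" ;;
    text:text)           echo "Apple Text media" ;;
    clcp:c608|clcp:c708) echo "Apple Closed Captioning media" ;;
    text:wvtt)           echo "WebVTT" ;;
    subt:stpp)           echo "XML subtitles" ;;
    text:stxt)           echo "raw text" ;;
    subt:sbtt)           echo "raw text subtitles" ;;
    *)                   echo "not in the list above" ;;
  esac
}

# Usage: describe_track_type text:wvtt
```

The lookup only compares strings. It does not read an MP4 file, so the identifier still has to come from a tool that inspects the track.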
MP4Box supports all these types as described in the following figure:
The associated command lines using MP4Box are as follows:
- Importing GPAC Timed Text XML as a 3GPP Timed Text track (text:tx3g):
MP4Box -add file.ttxt output.mp4
- Exporting a 3GPP Timed Text track as GPAC Timed Text XML (assuming 1 is the trackId of the track):
MP4Box -ttxt 1 output.mp4
- Importing SRT subtitles as a 3GPP Timed Text track:
MP4Box -add file.srt output.mp4
- Exporting a 3GPP Timed Text track as an SRT file:
MP4Box -srt 1 output.mp4
- Converting GPAC Timed Text XML to SVG:
MP4Box -svg file.ttxt
- Converting SRT to SVG:
MP4Box -svg file.srt
- Exporting a 3GPP Timed Text Track as SVG (not yet possible).
- Importing WebVTT content as a WVTT track:
MP4Box -add file.vtt output.mp4
- Exporting a WebVTT file from a WVTT track:
MP4Box -raw 1 output.mp4
- Importing a TTML file as a STPP Track:
MP4Box -add file.ttml output.mp4
- Exporting an STPP Track as a TTML document.
/!\ Reconstructing the whole track is not available yet, but you can extract the individual samples (this generates one TTML output per MP4 sample):
MP4Box -raws 1 output.mp4
- Importing an SRT file as a WVTT track:
MP4Box -add file.srt:fmt=VTT output.mp4
Note that combining those steps also makes it possible to convert TTXT or SRT to WebVTT.
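As an illustration of such a combination, the sketch below chains two of the commands listed above to go from SRT to WebVTT: an import with fmt=VTT followed by a raw export. The function name srt_to_vtt is hypothetical, and the track ID 1 is an assumption that only holds when the MP4 file is new, so that the subtitle track is its first track.

```shell
# Hypothetical helper, not part of GPAC: convert SRT to WebVTT by chaining
# two MP4Box commands from the list above.
srt_to_vtt() {
  local srt="$1"
  local mp4="$2"

  # Step 1: import the SRT file as a WVTT track (text:wvtt).
  MP4Box -add "${srt}:fmt=VTT" "$mp4" || return 1

  # Step 2: export that track as a WebVTT file.
  # Assumption: the subtitle track has trackId 1 (freshly created MP4 file).
  MP4Box -raw 1 "$mp4"
}

# Usage: srt_to_vtt file.srt output.mp4
```

If the MP4 file already contains other tracks, the track ID has to be adjusted accordingly.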
As a consequence of the packaging, it is possible to create DASH content using those formats.
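A minimal sketch of that last step, assuming the subtitle track has already been imported into output.mp4 with one of the commands above. It relies on the MP4Box -dash option, which takes a segment duration in milliseconds. The wrapper name dash_subtitles and the 4000 ms default are illustrative assumptions, not recommendations from GPAC.

```shell
# Hypothetical wrapper, not part of GPAC: DASH an MP4 file that already
# contains a subtitle track.
dash_subtitles() {
  local mp4="$1"
  local segment_ms="${2:-4000}"   # segment duration in ms (assumed default)

  MP4Box -dash "$segment_ms" "$mp4"
}

# Usage: dash_subtitles output.mp4 2000
```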