MP4 (also known as “MPEG-4 Part 14” or “ISO/IEC 14496-14:2003”) is a multimedia container file format standard specified as a part of MPEG-4. It is used to store digital audio and video streams and other data such as subtitles and still images. Although designed mainly for video storage, MP4 is also used by Internet video websites to transfer video content in a pseudo-streaming fashion. That is, a video player downloads the clip and plays the video content as it becomes available.
For example, an MP4 file 100 in FIG. 1A is made up of a hierarchy of objects, referred to as boxes, including, but not limited to, boxes 110 and 120. Each box is a contiguous range of bytes within the file. Each box may be identified by a four-character box type within the file. Two boxes at the top of the hierarchy are most relevant here, i.e., a movie box (type moov 110) and a media data box (type mdat 120). Moov box 110 includes all file information 112 describing MP4 file 100. Mdat box 120 includes all encoded audio and video frames, for example, frames 122, 124, and 126. Moov section 110 is a table of contents for the file and includes a media frame index that references each frame in MP4 file 100 and specifies a frame size and a byte offset for each frame. For example, moov section 110 may include entries 114, 116, and 118, having frame sizes and byte offsets for each encoded frame within MP4 file 100. Moov section 110 is shown as a single table in FIG. 1A, but may be distributed across several structures when encoded in an MP4 file format. Data within mdat box 120 may be unframed. That is, within mdat box 120, there is no indication of where one frame ends and the next begins. The only way to distinguish samples is to use the file information 112 in moov box 110.
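By way of illustration only, the box layout and the role of the frame index may be sketched in Python as follows. The sketch assumes the usual box header of a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit size field and a size of 0 signalling a box that runs to the end of the file. The function names iter_top_level_boxes, make_box, and slice_frames are hypothetical, and the frame index is modeled as a plain list of (byte offset, frame size) pairs, which is a simplification of the several structures an actual moov box may use.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box: 32-bit big-endian size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError("box size smaller than its own header")
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def slice_frames(data: bytes, index):
    """Cut frames out of unframed data using (offset, size) index entries.

    Assumption: index is a simplified stand-in for the frame index kept
    in the moov box. Without it the frame boundaries cannot be recovered.
    """
    return [data[offset:offset + size] for offset, size in index]
```

In this sketch, the bytes inside the mdat box carry no delimiters, so slice_frames depends entirely on the offsets and sizes supplied from outside the mdat box.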
In general, the order of the moov and mdat boxes is not defined. For particular MP4 use cases, however, the boxes must appear in a specific order. An MP4 player must read the entire moov section 110 before playback can begin. When streaming over HTTP, it is desirable for the player to begin playing before the video has downloaded completely. To support this case, moov box 110 should appear before mdat box 120, so that the frame index arrives ahead of the frame data it describes.
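As a non-limiting sketch, a check for whether a file is laid out for pseudo-streaming may be written in Python as follows. The sketch reads only box headers from a seekable stream and skips over box contents. The function names top_level_order and is_streamable are hypothetical, and the check considers only the relative order of the first moov and mdat boxes at the top level.

```python
import io
import struct


def top_level_order(stream):
    """Return the top-level box types of a seekable stream, in file order."""
    order = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        consumed = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            extended = stream.read(8)
            if len(extended) < 8:
                break
            (size,) = struct.unpack(">Q", extended)
            consumed = 16
        order.append(box_type.decode("ascii", "replace"))
        if size == 0:
            # A size of 0 means this box runs to the end of the file.
            break
        if size < consumed:
            raise ValueError("box size smaller than its own header")
        # Skip the box contents without reading them.
        stream.seek(size - consumed, io.SEEK_CUR)
    return order


def is_streamable(stream):
    """True when a moov box appears before the mdat box."""
    order = top_level_order(stream)
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

A file for which is_streamable returns False may still be playable once fully downloaded, but a player could not begin playback until the trailing moov box had arrived.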
Generating an MP4 file that can be streamed to an MP4 player is usually a two-step process. In the first step, an encoder may generate frames and record their sizes in a separate table. The generated frames may be written to an mdat box in a temporary file or buffer. After all frames have been encoded, the encoder may then write the moov box. In the second step, the encoder may arrange the moov and mdat boxes in the correct order for streaming. One of the problems with this traditional two-step MP4 encoding process is that, because the moov box cannot be written until every frame has been encoded, the process cannot provide real-time transcoding, compression, optimization, or any other real-time, on-the-fly modification.
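The two-step process described above may be illustrated, under simplifying assumptions, by the following Python sketch. The moov payload here is a hypothetical toy index (an entry count followed by pairs of 32-bit byte offset and frame size) and is not the actual moov layout. The function names box, toy_moov, step_one, and step_two are likewise hypothetical, and the "encoded frames" are simply byte strings supplied by the caller.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box: 32-bit big-endian size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def toy_moov(entries) -> bytes:
    """Hypothetical simplified moov: a count, then (offset, size) pairs."""
    payload = struct.pack(">I", len(entries))
    for offset, size in entries:
        payload += struct.pack(">II", offset, size)
    return box(b"moov", payload)


def step_one(frames):
    """First step: buffer frames into an mdat box and record their sizes.

    Offsets are relative to the start of the mdat box. The index is only
    complete after the last frame has been seen.
    """
    entries = []
    body = b""
    offset = 8  # frame data starts right after the 8-byte mdat header
    for frame in frames:
        entries.append((offset, len(frame)))
        body += frame
        offset += len(frame)
    return box(b"mdat", body), entries


def step_two(mdat: bytes, entries) -> bytes:
    """Second step: place moov first and shift offsets by the moov size."""
    # The toy moov has a fixed size for a given number of entries, so its
    # length can be measured before the final offsets are known.
    moov_size = len(toy_moov(entries))
    shifted = [(offset + moov_size, size) for offset, size in entries]
    return toy_moov(shifted) + mdat
```

In this sketch, step_two cannot run until step_one has consumed every frame, which mirrors why the two-step process does not lend itself to on-the-fly output.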