MP4 (also known as “MPEG-4 Part 14” or “ISO/IEC 14496-14:2003”) is a multimedia container file format standard specified as a part of MPEG-4. It is used to store digital audio and video streams and other data such as subtitles and still images. Mainly designed for video storage, MP4 is also used by Internet video websites to transfer video content in a pseudo-streaming fashion. That is, a video player downloads the clip and plays the video content as it becomes available.
Generating an MP4 file that can be streamed to an MP4 player is traditionally a two-step process. In the first step, an encoder may generate frames and record their sizes in a separate table. The generated frames may be written to an ‘mdat’ box in a temporary file or buffer. After all frames have been encoded, the encoder may then write metadata information to a ‘moov’ box. In the second step, the encoder may arrange the ‘moov’ and ‘mdat’ boxes in a correct order for streaming.

One of the problems with this traditional two-step MP4 encoding process is that it cannot overlap transcoding, compression, optimization, or any other on-the-fly modification process with streaming and playback of the final result, because the ‘moov’ box cannot be completed until every frame has been encoded.

One solution to this problem is to predict the size of each frame of the target video stream based on the frame size in the ‘moov’ box of the source video stream, and to generate a ‘moov’ box with these predicted sizes for the target video frames. During the transcoding process, each frame is coded to exactly match the size specified in the ‘moov’ box so that the indices to the target frames match the location of the video payload data. However, this solution has the shortcoming that frame order is neither analyzed in the source ‘moov’ box nor specified in the target ‘moov’ box, so bidirectionally coded (B) frames cannot be included in the stream. As a result, information about the way the source media was encoded is not applied to better optimize the encoding of the transcoded media.
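The traditional two-step process described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a conforming MP4 muxer: `fake_encode` is a hypothetical stand-in for a real encoder, and the sample-size table is placed directly inside ‘moov’, whereas a real file nests it several levels deeper and also carries chunk offsets and timing tables. The sketch only shows why the ‘moov’ box cannot be emitted until every frame size is known.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 32-bit big-endian size, 4-byte type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def fake_encode(frame_index: int) -> bytes:
    """Hypothetical encoder stand-in: returns dummy bytes of varying length."""
    return bytes([frame_index % 256]) * (10 + 3 * frame_index)


def two_step_mux(frame_count: int) -> bytes:
    # Step 1: encode all frames, recording each size in a separate table
    # and appending the payload to a temporary 'mdat' buffer.
    sizes = []
    mdat_payload = bytearray()
    for i in range(frame_count):
        frame = fake_encode(i)
        sizes.append(len(frame))
        mdat_payload += frame
    mdat = make_box(b"mdat", bytes(mdat_payload))

    # Only now are all sizes known, so the metadata can be written.
    # Sample-size table laid out like 'stsz': version/flags, sample_size = 0
    # (meaning per-sample sizes follow), sample_count, then one entry per frame.
    table = struct.pack(">III", 0, 0, len(sizes))
    table += b"".join(struct.pack(">I", s) for s in sizes)
    # Simplification (assumption): 'stsz' sits directly inside 'moov'.
    moov = make_box(b"moov", make_box(b"stsz", table))

    # Step 2: arrange the boxes so the metadata precedes the payload,
    # which is the order a pseudo-streaming player needs.
    return moov + mdat


def top_level_boxes(data: bytes):
    """List (type, size) for each top-level box in the buffer."""
    boxes, pos = [], 0
    while pos < len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        boxes.append((box_type, size))
        pos += size
    return boxes
```

Calling `top_level_boxes(two_step_mux(3))` lists the ‘moov’ box first and the ‘mdat’ box second. Nothing can be sent to a player until `two_step_mux` returns, which is the lack of overlap between encoding and streaming noted above.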