The subject matter disclosed herein relates to MP4 container file formats and methods of processing MP4 container files.
ISO/IEC 14496-14:2003, otherwise known as MPEG-4 Part 14, and commonly known as MP4, defines a file format that allows storage of media content. The MP4 file format is a container format having the ability to hold a variety of media types and their respective data (such as video, audio, metadata, and user information) using a common format. In general, an MP4 file is logically divided into tracks. Each track represents a timed sequence of presentation units and within each track each timed presentation unit is called a sample. A sample may be a frame of video or audio or metadata information. In video, the frame represented may be an I (intra coded) frame or a P (predictive) frame or a B (Bi-directional) frame. An I frame is commonly referred to as a key frame whereas a P frame may be considered to be a dependent frame. The overall media presentation, whether audio, video or both audio and video, is referred to as a movie. All the data within a conforming file is encapsulated in boxes called atoms. No data needed to support presentation of the media samples, whether audio data, video data or metadata, is stored outside the atom structure. Neither the physical structure of the file nor the layout of the media is tied to the time ordering of the media. Generally, an MP4 file is composed of a hierarchy of atoms, in which there is a single top level atom and numerous lower level atoms. A lower level atom may itself contain lower level atoms. A lower level atom has an immediate parent atom, and the lower level atom is considered a sub-atom of its parent atom. The lowest level atom, which does not contain a lower level atom, is referred to as a leaf atom.
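By way of illustration only, the atom hierarchy described above may be traversed as in the following minimal sketch. It assumes the usual atom header layout (a 32-bit big-endian size followed by a four-character type, with a size of 1 indicating that a 64-bit size follows). The names iter_atoms, atom_tree, make_atom and CONTAINER_TYPES are hypothetical, and the set of container types is an illustrative subset rather than a complete list.

```python
import struct

# Atom types that this sketch descends into (an illustrative subset only).
CONTAINER_TYPES = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"moof", b"traf"}


def iter_atoms(data, start=0, end=None):
    """Yield (type, payload_start, atom_end) for each atom in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 is treated here as "extends to the end of the range".
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated atom: stop rather than guess
        yield atom_type, pos + header, pos + size
        pos += size


def atom_tree(data, start=0, end=None, depth=0):
    """Return (depth, type) pairs in file order; atoms not descended into are leaves."""
    out = []
    for atom_type, payload_start, atom_end in iter_atoms(data, start, end):
        out.append((depth, atom_type.decode("latin-1")))
        if atom_type in CONTAINER_TYPES:
            out.extend(atom_tree(data, payload_start, atom_end, depth + 1))
    return out


def make_atom(atom_type, payload=b""):
    """Build an atom with a 32-bit size header (helper for synthetic examples)."""
    return struct.pack(">I4s", 8 + len(payload), atom_type) + payload
```

In this sketch a sub-atom is any atom found inside the payload of its parent, and a leaf atom is one whose type is not in CONTAINER_TYPES.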
The MP4 standard prescribes names for various types of atoms. All the media data representing presentation units (compressed or otherwise) are defined under the atom of type “mdat.”
All configuration-related information, metadata describing the nature of the media, the properties of the tracks and their timing requirements, for example, are defined under the atom of type “moov.” The MP4 file contains only one moov atom, which generally comes at the beginning of the MP4 file. The moov atom may contain a user data atom “udta” declaring user information about the container atom and its data. FIG. 1 illustrates schematically the arrangement of metadata and media data in a simple MP4 container atom.
The information relating to a track of the movie is contained within the moov atom in an atom “trak.” The information also includes an offset value that specifies the location of the relevant media data in the mdat atom.
An MP4 file is read, and the media samples presented, by a media player application. In order to play a particular track, the media player application parses the moov atom to find the track atom, reads the offset value from the track atom and jumps to the offset location in order to read the media data.
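The playback steps just described (parse the moov atom, find the track atom, read the offset, jump to the media data) may be sketched as follows. The sketch is illustrative and rests on assumptions not stated above: that the offset is held in a chunk offset atom "stco" reached through the path moov/trak/mdia/minf/stbl, that the stco offsets are measured from the start of the file, and that all atoms use 32-bit sizes. The names children, find_path, first_chunk_offset and atom are hypothetical.

```python
import struct


def children(data, start, end):
    """Yield (type, payload_start, atom_end) for atoms in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # this sketch does not handle 64-bit or open-ended sizes
        yield atom_type, pos + 8, pos + size
        pos += size


def find_path(data, path):
    """Follow the first atom of each type in path; return (payload_start, end) or None."""
    start, end = 0, len(data)
    for wanted in path:
        for atom_type, payload_start, atom_end in children(data, start, end):
            if atom_type == wanted:
                start, end = payload_start, atom_end
                break
        else:
            return None  # the requested atom is not present at this level
    return start, end


def first_chunk_offset(data):
    """Return the first chunk offset of the first track, or None if unavailable."""
    span = find_path(data, [b"moov", b"trak", b"mdia", b"minf", b"stbl", b"stco"])
    if span is None:
        return None
    start, end = span
    if end - start < 12:
        return None
    # stco payload: 1 byte version, 3 bytes flags, 32-bit entry count, 32-bit offsets.
    (entry_count,) = struct.unpack_from(">I", data, start + 4)
    if entry_count == 0:
        return None
    (offset,) = struct.unpack_from(">I", data, start + 8)
    return offset


def atom(atom_type, payload=b""):
    """Build an atom with a 32-bit size header (helper for synthetic examples)."""
    return struct.pack(">I4s", 8 + len(payload), atom_type) + payload
```

A player following this sketch would then read the media data starting at data[offset]. A real file may hold several tracks and several chunk offsets per track, which this sketch does not address.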
In some cases, some of the information is not present in the moov atom but is contained in one or more “moof” atoms, where each moof atom contains trak atoms and has its own corresponding mdat atom (containing media samples) associated therewith. FIG. 2 illustrates schematically this fragmented arrangement of data in an MP4 file. As shown in FIG. 2, a fragmented MP4 file containing moof atoms and associated mdat atoms also contains an mfra (random access for moof) atom. The fragmented structure allows a track to be delivered in multiple segments or fragments, such as different scenes. Different fragments of a movie are presented sequentially. In order to play back a particular track, the media player application must find the trak atoms in the moov atom and each of the moof atoms, read the offset value from each trak atom, compute the offset from the beginning of the movie, and jump to the computed offset location in each mdat atom in order to read the media data.
Among the metadata stored in the moov atom (and in the moof atoms in the case of a fragmented movie) is a duration value. In the case of an MP4 file that is not fragmented, the duration value stored in the trak atom for each of the tracks under the moov atom reflects the duration of each of the samples in the track and the number of samples in the track. The duration value may then be used by the media player application to display a time bar representing the movie duration and a cursor indicating the current time within the movie duration. Assuming that the tracks are of equal duration, or the user has selected the longest track, the user is thereby provided with an indication of how much of the total movie duration has elapsed. If the tracks are not of equal duration and the user has selected a track other than the longest, the display of movie duration may mislead the user.
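As a hedged illustration of how a track duration follows from the per-sample durations and the number of samples, the sketch below assumes that sample durations are given as runs of (sample count, duration per sample) expressed in ticks of a per-track timescale. That representation, the function names and the numeric values are assumptions made for the example only.

```python
def track_duration_seconds(runs, timescale):
    """Sum count * per-sample duration over all runs, then convert ticks to seconds."""
    total_ticks = sum(count * delta for count, delta in runs)
    return total_ticks / timescale


def longest_track_seconds(tracks):
    """Return the duration of the longest track; tracks is a list of (runs, timescale)."""
    return max(track_duration_seconds(runs, timescale) for runs, timescale in tracks)


# Hypothetical example: a video track of 300 samples of 3000 ticks at 90000 ticks/s,
# and an audio track of 430 samples of 1024 ticks at 44100 ticks/s.
video = ([(300, 3000)], 90000)
audio = ([(430, 1024)], 44100)
movie_seconds = longest_track_seconds([video, audio])
```

In this example the audio track is slightly shorter than the video track, so a time bar scaled to the longest track would overstate the audio duration, which is the situation the preceding paragraph describes.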
In the case of a file that is fragmented, the duration value that is read from the moov atom may be the duration of the longest track of the first fragment. The duration values that are stored in the moof atoms are the durations of the tracks of the respective fragments. Thus, the conventional media player application may not be able to display a reliable indication of the actual movie duration.
The conventional MP4 file format does not provide easily accessible information regarding the offset of a fragment from the start of the file. Fragment offset would provide a helpful tool for browsing the fragmented file. A user who wishes to display the Nth movie fragment of a fragmented MP4 file must parse the track of interest and accumulate the offsets under the moov atom and subsequent moof atoms (up to the (N−1)th moof) in order to obtain the offset of the Nth fragment relative to the beginning of the movie and then jump to the required offset.
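The accumulation described above amounts to a running sum, as the following sketch suggests. It assumes that the extent of the track of interest under the moov atom, and under each moof atom, has already been read as a number in a common unit (for example, a duration in timescale ticks). The function name and the numeric values are hypothetical.

```python
def fragment_starts(initial_extent, fragment_extents):
    """Return (starts, total).

    starts[i] is the offset of fragment i + 1 relative to the beginning of the
    movie: the extent under moov plus the extents of all earlier fragments.
    total is the extent of the whole movie for the track of interest.
    """
    starts = []
    running = initial_extent
    for extent in fragment_extents:
        starts.append(running)
        running += extent
    return starts, running


# Hypothetical values: 1000 units under moov, then three moof fragments.
starts, total = fragment_starts(1000, [500, 700, 300])
third_fragment_start = starts[3 - 1]
```

Reaching the Nth fragment therefore requires visiting the moov atom and the first N−1 moof atoms, so the cost of a jump grows with N when no precomputed offset is available.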
During a live recording session it may be desired to build an incremental presentation by adding a track after media data for other tracks has already been acquired. Referring to FIG. 3, using the current conventional MP4 file format the trak atom is added to the moov atom and the media data may be added using an additional moof atom. The moov atom contains the properties of the track whereas the mdat atom for the track is associated with the added moof atom. This approach may involve buffering a large amount of media data for existing tracks while waiting for additional tracks. Since the trak atom is added to the moov atom, the moov atom is enlarged, changing the locations of the metadata and media data for the existing tracks, and it is necessary to recalculate and update the offset values stored in the tracks that have previously been saved in the moov atom. Rearranging the media data and metadata may involve significant processing time and memory operations. The above-mentioned situations could become more challenging when pre-buffered data (that is, data of the new track that is already present) is to be recorded.
As shown in FIG. 4, the moov atom may contain a user data atom of type “udta” declaring user information about the container atom and its data relevant to the movie as a whole.