A file format is a particular way to encode information for storage in a computer file. The conventional manner of storing the format of a file is to explicitly store information about the format in the file system. This approach keeps the metadata separate from both the main data and the file name.
The ISO Base Media File Format is designed to contain timed media information or media data streams, such as a movie. The stored media information can be transmitted locally or via a network or other stream delivery mechanism. The files have a logical structure, a time structure, and a physical structure. The logical structure of the file includes a set of time-parallel tracks. The time structure of the file provides the tracks with sequences of data samples in time, and those sequences are mapped into a timeline of the overall media data stream by optional edit lists. The physical structure of the file separates the data needed for logical, temporal, and structural decomposition from the media data samples themselves. This structural information is concentrated in a metadata box, possibly extended in time by metadata fragment boxes. The metadata box documents the logical and timing relationships of the data samples, and also includes pointers to where the data samples are stored.
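To make the box-based physical structure concrete, the following Python sketch walks the boxes of an ISO Base Media file held in memory. It assumes the generic box header of ISO/IEC 14496-12: a 32-bit big-endian size followed by a four-character type, where size 1 signals a 64-bit size field and size 0 means the box runs to the end of its container. The helper names `iter_boxes` and `make_box` are hypothetical, 'uuid' extended types are not handled, and the toy file assumes the metadata box described above corresponds to the standard `moov` box, with the samples themselves in `mdat`.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit 'largesize' field follows the type.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of its container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a compact 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A toy file: file-type box, movie (metadata) box with one track, media data box.
sample = (make_box("ftyp", b"isom\x00\x00\x02\x00")
          + make_box("moov", make_box("trak"))
          + make_box("mdat", b"\x00" * 4))

print([t for t, _, _ in iter_boxes(sample)])  # ['ftyp', 'moov', 'mdat']
moov = next(b for b in iter_boxes(sample) if b[0] == "moov")
print([t for t, _, _ in iter_boxes(sample, moov[1], moov[2])])  # ['trak']
```

Because container boxes such as `moov` simply hold further boxes, the same function can be reapplied to a box's payload range to descend into the hierarchy, as the last line does.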
Each media data stream is included in a track specialized for that media type (audio, video, etc.), and is further parameterized by a sample entry. The sample entry includes the ‘name’ of the exact media type, for example, the type of decoder needed to decode the media data stream, and other parameters needed for decoding. There are defined sample entry formats for a variety of media types.
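In ISO/IEC 14496-12, the sample entries of a track are stored in its sample description ('stsd') box, and each entry's four-character box type is the ‘name’ of the media format, such as 'avc1' for AVC video. The Python sketch below assumes the common SampleEntry layout (six reserved bytes followed by a 16-bit data_reference_index) and lists the format name of each entry. The function name is hypothetical, 64-bit entry sizes are not handled, and the 'avc1' entry in the usage lines is deliberately truncated to the common fields.

```python
import struct
from typing import List, Tuple


def parse_stsd_payload(payload: bytes) -> List[Tuple[str, int]]:
    """Return (format, data_reference_index) for each sample entry in an 'stsd' payload.

    Only the common SampleEntry prefix is read; codec-specific fields that follow
    (e.g. visual dimensions or a decoder configuration box) are skipped via the entry size.
    """
    # 'stsd' is a full box: 1-byte version + 3-byte flags, then a 32-bit entry count.
    _version_and_flags, entry_count = struct.unpack_from(">II", payload, 0)
    pos = 8
    entries = []
    for _ in range(entry_count):
        size, fmt = struct.unpack_from(">I4s", payload, pos)
        if size < 16:
            raise ValueError(f"sample entry too small at offset {pos}")
        # SampleEntry body: 6 reserved bytes, then a 16-bit data_reference_index.
        (data_reference_index,) = struct.unpack_from(">H", payload, pos + 14)
        entries.append((fmt.decode("latin-1"), data_reference_index))
        pos += size
    return entries


# A truncated 'avc1' entry carrying only the common SampleEntry fields.
avc1_entry = struct.pack(">I4s6xH", 16, b"avc1", 1)
stsd_payload = struct.pack(">II", 0, 1) + avc1_entry
print(parse_stsd_payload(stsd_payload))  # [('avc1', 1)]
```

Skipping each entry by its declared size means a reader can identify the required decoder without understanding every codec-specific field.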
Support for metadata takes two forms. First, timed metadata is stored in an appropriate track and synchronized with the media data it describes. Second, there is general support for non-timed metadata attached to the media data stream or to an individual track. These generalized metadata structures are also used at the file level in the form of a metadata box. In this case, the metadata box is the primary means of accessing the stored media data streams.
In some cases, the data samples within a track have different characteristics or need to be specially identified. One such characteristic is the synchronization point, often a video I-frame. These points are identified by a special table in each track. More generally, the nature of dependencies between track samples is documented in this manner. There is also the concept of sample groups. Sample groups permit the documentation of arbitrary characteristics that are shared by some of the data samples in a track. In the Advanced Video Coding (AVC) file format, sample groups are used to support the concept of layering and sub-sequences.
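A typical use of the per-track table of synchronization points is random access: to start playback at an arbitrary sample, a reader must begin decoding at the nearest preceding sync point. The Python sketch below assumes the table is available as a sorted list of 1-based sample numbers, as in the 'stss' sync sample box of ISO/IEC 14496-12, where an absent table means every sample is a sync point. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def random_access_point(sync_samples: Optional[Sequence[int]], target: int) -> int:
    """Return the last sync sample number at or before `target` (1-based numbering).

    `sync_samples` holds the strictly increasing sample numbers listed in a track's
    sync sample table, or None when the track has no such table, in which case
    every sample is treated as a sync sample.
    """
    if target < 1:
        raise ValueError("sample numbers start at 1")
    if sync_samples is None:
        return target
    i = bisect_right(sync_samples, target)
    if i == 0:
        raise ValueError("no sync sample at or before the target sample")
    return sync_samples[i - 1]


# I-frames every 30 samples: seeking to sample 45 must start decoding at sample 31.
print(random_access_point([1, 31, 61, 91], 45))  # 31
```

A binary search keeps the lookup logarithmic in the number of sync points, which matters for long tracks with many I-frames.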
The AVC file format defines a storage format for video streams encoded according to the AVC standard. The AVC file format extends the ISO Base Media File Format. The AVC file format enables AVC video streams to be used in conjunction with other media streams, such as audio; to be formatted for delivery by a streaming server using hint tracks; and to inherit all the use cases and features of the ISO Base Media File Format.
FIG. 1 illustrates an exemplary configuration of an AVC file format 10 including a media data section 20 and a metadata section 30. Each data stream is stored in the media data section 20. Multiple data streams can be stored in one file. As shown in FIG. 1, four data streams 22, 24, 26, and 28 are stored in the media data section 20. For each data stream stored in the media data section of the AVC file format, there is a corresponding track stored in the metadata section. In FIG. 1, a track 32 corresponds to the data stream 22, a track 33 corresponds to the data stream 24, a track 36 corresponds to the data stream 26, and a track 38 corresponds to the data stream 28. In general, there are N tracks stored in the metadata section for N data streams stored in the media data section.
The H.264, or MPEG-4 Part 10, specification is a high-compression digital video codec standard written by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) in a partnership known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10 standard (formally, ISO/IEC 14496-10) are technically identical, and the technology is also known as AVC, for Advanced Video Coding. It should be noted that H.264 is a name related to the ITU-T line of H.26x video standards, while AVC relates to the ISO/IEC MPEG side of the partnership project that completed the work on the standard, after earlier development done in the ITU-T as a project called H.26L. The standard is commonly called H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC) to emphasize this common heritage. Occasionally, it has also been referred to as “the JVT codec”, in reference to the JVT organization that developed it.
Currently, the JVT is working on a new codec known as the Scalable Video Codec (SVC), which would be an extension to the existing AVC codec. Work on SVC began independently within MPEG, initially as part of the MPEG-21 standard in 2003. During its development in 2004, however, it was merged with the activities of the JVT, with a focus on developing coding technology that would be backwards compatible with the existing AVC codec. As such, it is currently being developed jointly by MPEG and ITU-T through the JVT. The goal of the SVC activity is to provide scalability at the spatial, temporal, and quality (SNR) levels.
The existing file formats (ISO/MP4 and AVC) do not provide an easy and clear mechanism for extracting the different variations of the spatial, temporal, and SNR (quality) layers from the media data stored in the file. Instead, this information must be obtained by parsing the coded media stream, which is inefficient and slow. Thus, there is a need to define new extensions that support the storage of emerging video coding standards such as SVC and that address the limitations of current file format storage methods. These new extensions define a structuring and grouping mechanism for the dependencies that exist in a group of pictures and within each sample, yielding a flexible stream structure that provides spatial, temporal, and quality flexibility. The SVC standard proposes to encode the entire scalable media data as one single scalable bitstream, from which variants of temporal, spatial, and quality layers can be extracted.
In the AVC standard, each video stream is encoded, and subsequently decoded, as an independent stream according to a particular frame rate, resolution, and quality. According to the SVC standard, multiple different types of video can be extracted from a single encoded video stream, referred to as an SVC elementary stream: for example, a low-resolution video stream, a standard-resolution video stream, or a high-resolution video stream. To support the storage and extraction of such scalable video streams, the file formats need to be modified.
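Without file-format support, obtaining one of these variants means filtering the SVC elementary stream NAL unit by NAL unit. SVC NAL unit headers carry dependency_id (spatial or coarse-grain layer), temporal_id, and quality_id fields; the Python sketch below assumes they have already been parsed into a hypothetical `NalUnit` record, with base-layer units carrying identifiers of 0, and keeps only the units at or below a requested operating point. This is a simplified filter for illustration, not the normative SVC sub-bitstream extraction process, which has additional rules.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NalUnit:
    # Hypothetical in-memory view of a parsed SVC NAL unit header.
    dependency_id: int  # spatial / coarse-grain layer
    temporal_id: int    # temporal layer (frame-rate level)
    quality_id: int     # quality (SNR) refinement level
    payload: bytes = b""


def extract_operating_point(nal_units: Iterable[NalUnit],
                            max_d: int, max_t: int, max_q: int) -> List[NalUnit]:
    """Keep NAL units whose identifiers do not exceed the requested operating point.

    Simplified filter for illustration; not the normative SVC extraction process.
    """
    return [
        n for n in nal_units
        if n.dependency_id <= max_d and n.temporal_id <= max_t and n.quality_id <= max_q
    ]


stream = [
    NalUnit(0, 0, 0), NalUnit(0, 1, 0),  # base spatial layer, two temporal levels
    NalUnit(1, 0, 0), NalUnit(1, 1, 1),  # enhancement spatial layer, incl. a quality refinement
]
low_res_full_rate = extract_operating_point(stream, max_d=0, max_t=1, max_q=0)
print(len(low_res_full_rate))  # 2
```

Every unit of the stream has to be inspected to answer even a simple request such as "low resolution at full frame rate", which is the inefficiency the file-format extensions discussed here aim to remove.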
The SVC standard is currently under development. Because it defines a new design for a video codec, an appropriately defined new file format standard is also required to enable the storage and extraction of the new SVC video streams. To that end, a new SVC file format that extends the AVC file format to support the storage of SVC video streams is under development. However, specific extensions that define access to the stored scalable video have yet to be developed.