Multimedia applications include local playback, streaming or on-demand, conversational, and broadcast/multicast services. Interoperability is important for the fast deployment and large-scale market formation of each multimedia application. To achieve high interoperability, different standards have been specified.
Technologies involved in multimedia applications include, among others, media coding, storage and transmission. Media types include speech, audio, image, video, graphics and timed text. Different standards have been specified for different technologies. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ISO/IEC MPEG-4 Visual, ITU-T H.264 or ISO/IEC MPEG-4 AVC (abbreviated as AVC, AVC/H.264 or H.264/AVC in this document), and possible future ones such as ISO/IEC MPEG-21 SVC, China AVS, ITU-T H.265, and ISO/IEC MPEG 3DAV.
Available media file format standards include the ISO base media file format (ISO/IEC 14496-12), the MPEG-4 file format (ISO/IEC 14496-14), the AVC file format (ISO/IEC 14496-15) and the 3GPP file format (3GPP TS 26.244).
3GPP TS 26.140 specifies the media types, formats and codecs for the multimedia messaging service (MMS) within the 3GPP system. 3GPP TS 26.234 specifies the protocols and codecs for the packet-switched streaming service (PSS) within the 3GPP system. The ongoing 3GPP TS 26.346 specifies the protocols and codecs for the multimedia broadcast/multicast service (MBMS) within the 3GPP system.
Typical audio and video coding standards specify “profiles” and “levels.” A “profile” is a subset of the algorithmic features of the standard, and a “level” is a set of limits on coding parameter values that impose constraints on decoder resource consumption. The indicated profile and level can be used to signal the properties of a media stream and the capabilities of a media decoder.
By examining the combination of profile and level, a decoder can determine whether it is capable of decoding a stream without attempting to decode it. Attempting to decode a stream beyond a decoder's capability may cause the decoder to crash, to operate slower than real time, and/or to discard data due to buffer overflows. Each pair of profile and level forms an “interoperability point.”
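As an illustration, the profile/level capability check described above can be sketched as follows. The numeric identifiers and the simple same-profile rule below are assumptions for illustration only; real standards such as H.264/AVC use richer compatibility signaling.

```python
# Illustrative sketch of a profile/level capability check. The
# identifier values are hypothetical, loosely modeled on the
# profile_idc/level_idc style used by H.264/AVC.
from dataclasses import dataclass

@dataclass(frozen=True)
class InteroperabilityPoint:
    profile_idc: int  # subset of algorithmic features
    level_idc: int    # limits on decoder resource consumption

def can_decode(decoder: InteroperabilityPoint,
               stream: InteroperabilityPoint) -> bool:
    """True if the decoder can decode the stream without trying to.

    Simplifying assumption: a decoder supports exactly the features of
    its own profile and all levels up to its own level, so the stream
    must match the profile and not exceed the level.
    """
    return (stream.profile_idc == decoder.profile_idc
            and stream.level_idc <= decoder.level_idc)

decoder = InteroperabilityPoint(profile_idc=66, level_idc=30)
print(can_decode(decoder, InteroperabilityPoint(66, 21)))  # lower level: decodable
print(can_decode(decoder, InteroperabilityPoint(66, 40)))  # level too high
```

A streaming server can apply the same test before selecting a stream for a terminal, avoiding the failure modes (crash, slower-than-real-time operation, buffer overflow) described above.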
Some coding standards allow creation of scalable bit streams. A meaningful decoded representation can be produced by decoding only certain parts of a scalable bit stream. Scalable bit streams can be used for rate adaptation of pre-encoded unicast streams in a streaming server and for transmission of a single bit stream to terminals having different capabilities and/or different network conditions. A list of other use cases for scalable video coding can be found in the ISO/IEC JTC1 SC29 WG11 (MPEG) output document N6880, “Applications and Requirements for Scalable Video Coding”, the 71st MPEG meeting, January 2005, Hong Kong, China.
Scalable coding technologies include conventional layered scalable coding techniques and fine granularity scalable coding. A review of these techniques can be found in an article by Weiping Li entitled “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, March 2001.
Scalable video coding is a desirable feature for many multimedia applications and services used in systems employing decoders with a wide range of processing power. Several types of video scalability schemes have been proposed, such as temporal, spatial and quality scalability. Each of these schemes involves a base layer and an enhancement layer. The base layer is the minimum amount of data required to decode the video stream, while the enhancement layer is the additional data required to provide an enhanced video signal.
The working draft of the scalable extension to H.264/AVC currently enables coding of multiple scalable layers. The working draft is described in JVT-N020, “Scalable video coding—working draft 1,” 14th meeting, Hong Kong, January 2005, and is also known as MPEG document w6901, “Working Draft 1.0 of 14496-10:200×/AMD1 Scalable Video Coding,” Hong Kong meeting, January 2005. In this multi-layer coding scheme, the variable DependencyID signaled in the bitstream indicates the coding dependencies among the different scalable layers.
A scalable bit stream contains at least two scalability layers, the base layer and one or more enhancement layers. If one scalable bit stream contains more than one scalability layer, it then has the same number of alternatives for decoding and playback. Each layer is a decoding alternative. Layer 0, the base layer, is the first decoding alternative. Layer 1, the first enhancement layer, is the second decoding alternative. This pattern continues with subsequent layers. Typically, a lower layer is contained in the higher layers. For example, layer 0 is contained in layer 1, and layer 1 is contained in layer 2.
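Because a lower layer is contained in the higher layers, selecting a decoding alternative amounts to keeping only the coded units belonging to layers at or below the chosen layer. A minimal sketch, assuming a hypothetical `layer_id` field attached to each coded unit (the actual syntax element and bitstream structure differ by standard):

```python
# Illustrative extraction of a decoding alternative from a scalable
# bit stream. Each coded unit is modeled as a dict with a hypothetical
# "layer_id" field; real bitstreams carry this in syntax elements such
# as the DependencyID of the SVC working draft.

def extract_alternative(units, target_layer):
    """Keep the units needed for decoding alternative `target_layer`.

    Layer 0 (the base layer) is the first alternative; layer k keeps
    every unit whose layer does not exceed k, since lower layers are
    contained in the higher ones.
    """
    return [u for u in units if u["layer_id"] <= target_layer]

stream = [
    {"layer_id": 0, "data": "base"},
    {"layer_id": 1, "data": "enh1"},
    {"layer_id": 2, "data": "enh2"},
]

base_only = extract_alternative(stream, 0)  # first decoding alternative
two_layers = extract_alternative(stream, 1)  # base plus first enhancement
print([u["layer_id"] for u in base_only])   # [0]
print([u["layer_id"] for u in two_layers])  # [0, 1]
```

A streaming server performing rate adaptation, as mentioned above, can apply this kind of pruning to a pre-encoded scalable stream instead of storing multiple independent encodings.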
Each layer is characterized by a set of at least one property that may differ from that of the entire stream, such as fine granularity scalability (FGS) information, region-of-interest (ROI) scalability information, sub-sample scalable layer information, decoding dependency information, and initial parameter sets.
In previous systems, it has not been possible to signal the following scalability information for a particular layer of a scalable bit stream in the bit stream itself, in the file format, or through a transmission protocol: fine granularity scalability (FGS) information; region-of-interest (ROI) scalability information; sub-sample or sub-picture scalable layer information; decoding dependency information; and initial parameter sets.