Modern media content distribution systems such as mobile video transmission systems are becoming increasingly popular. Bitstream scalability is a desirable feature in such media content distribution systems. An encoded media bitstream is generally called scalable when parts of the bitstream can be removed so that the resulting sub-bitstream can still be decoded by a target decoder. The media content of the sub-bitstream can be reconstructed at a quality that is lower than that of the original bitstream, but still high when considering the resulting reduction of transmission and storage resources. Bitstreams that do not have this property are also referred to as single-layer bitstreams.
Scalable Video Coding (SVC) is one solution to the scalability needs posed by the characteristics of video transmission systems. The SVC standard as specified in Annex G of the H.264/Advanced Video Coding (AVC) specification allows the construction of bitstreams that contain scaling sub-bitstreams conforming to H.264/AVC. H.264/AVC is a video compression standard equivalent to the Moving Picture Experts Group (MPEG)-4 AVC (MPEG-4 AVC) standard.
The SVC standard encompasses different scalability concepts as described, for example, in Schwarz et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, September 2007. For spatial and quality bitstream scalability, i.e., the generation of a sub-bitstream with lower spatial resolution or quality than the original bitstream, Network Abstraction Layer (NAL) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial resolution or quality bitstream based on information contained in the lower spatial resolution or quality bitstream, is used for efficient encoding. For temporal bitstream scalability, i.e., the generation of a sub-bitstream with a lower temporal sampling rate than the original bitstream, complete access units are removed from the bitstream when deriving the sub-bitstream. An access unit is defined as a set of consecutive NAL units with specific properties. In the case of temporal bitstream scalability, high-level syntax and inter prediction reference pictures in the bitstream are constructed accordingly.
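The two removal rules described above can be illustrated with a minimal Python sketch. It assumes a heavily simplified NAL unit model: the `NalUnit` class, the `extract_sub_bitstream` function and the example stream are hypothetical and only mirror the layer identifiers (temporal, dependency and quality) that SVC signals per NAL unit. A real extractor would also have to handle parameter sets, prefix NAL units and other details omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Simplified stand-in for an SVC NAL unit (illustrative fields only)."""
    access_unit: int    # index of the access unit the NAL unit belongs to
    temporal_id: int    # temporal layer (0 = lowest temporal sampling rate)
    dependency_id: int  # spatial layer (0 = lowest spatial resolution)
    quality_id: int     # quality layer (0 = lowest quality)


def extract_sub_bitstream(nal_units: List[NalUnit],
                          max_temporal_id: int,
                          max_dependency_id: int,
                          max_quality_id: int) -> List[NalUnit]:
    """Keep only the NAL units at or below the requested layer identifiers."""
    return [
        n for n in nal_units
        if n.temporal_id <= max_temporal_id
        and n.dependency_id <= max_dependency_id
        and n.quality_id <= max_quality_id
    ]


def build_example_stream() -> List[NalUnit]:
    """Four access units, each with one lower and one higher spatial layer."""
    units = []
    for au, tid in enumerate([0, 1, 0, 1]):
        units.append(NalUnit(au, tid, dependency_id=0, quality_id=0))
        units.append(NalUnit(au, tid, dependency_id=1, quality_id=0))
    return units


if __name__ == "__main__":
    stream = build_example_stream()
    # Spatial scalability: individual NAL units are dropped, every access unit remains.
    spatial = extract_sub_bitstream(stream, 1, 0, 0)
    # Temporal scalability: all NAL units of access units 1 and 3 are dropped.
    temporal = extract_sub_bitstream(stream, 0, 1, 0)
    print(sorted({n.access_unit for n in spatial}))
    print(sorted({n.access_unit for n in temporal}))
```

In this toy stream, lowering the spatial layer removes one NAL unit from every access unit, whereas lowering the temporal layer removes complete access units, which matches the distinction drawn in the text.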
In the SVC standard, the sub-bitstream having a lower temporal sampling rate, lower spatial resolution or lower quality is referred to as the Base Layer (BL) sub-bitstream, while the higher temporal sampling rate, higher spatial resolution or higher quality sub-bitstream is referred to as the Enhancement Layer (EL) sub-bitstream. It should be noted that in scenarios with multiple sub-bitstreams of, for example, different higher spatial resolutions, two or more EL sub-bitstreams may be provided in total.
Each image of an SVC video image sequence is represented as a so-called “frame” (i.e., as an encoded representation of this image). Each SVC sub-bitstream comprises a sequence of so-called SVC “sub-frames”. Each SVC sub-frame constitutes either a full SVC frame or a fraction of an SVC frame. In other words, each SVC frame is either represented as a single data item (i.e., one BL “sub-frame” or one EL “sub-frame”) or is sub-divided into at least two separate data items, i.e., into one BL “sub-frame” containing only the BL information associated with the respective frame and (at least) one EL “sub-frame” containing the EL information associated with the respective frame. In the SVC bitstream, an EL sub-frame may temporally correspond to a certain BL sub-frame.
The scalability feature introduced by the SVC standard allows for a bitstream adaptation dependent on, for example, decoder capabilities, display resolutions and available transmission bit rates. If only the BL sub-frames are decoded, the video content can be rendered, for example, at a base resolution or quality (e.g., at Quarter Video Graphics Array, or QVGA, resolution). If, on the other hand, both the BL and the EL sub-frames are decoded, then the video content can be rendered at a higher resolution or quality (e.g., at VGA resolution).
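The adaptation decision described above might be sketched as follows in Python. The `OperatingPoint` class, the `select_operating_point` function and the bit rates are hypothetical placeholders chosen for illustration; only the QVGA (320x240) and VGA (640x480) resolutions correspond to the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OperatingPoint:
    """One decodable combination of layers (illustrative model)."""
    name: str
    width: int
    height: int
    bitrate_kbps: int        # hypothetical value, not taken from any specification
    layers: Tuple[str, ...]  # sub-frames that must be decoded


# Assumed example: BL alone renders QVGA, BL plus EL renders VGA.
OPERATING_POINTS = [
    OperatingPoint("BL only", 320, 240, 256, ("BL",)),
    OperatingPoint("BL + EL", 640, 480, 768, ("BL", "EL")),
]


def select_operating_point(points: List[OperatingPoint],
                           display_width: int,
                           display_height: int,
                           available_kbps: int,
                           decoder_supports_el: bool) -> Optional[OperatingPoint]:
    """Return the highest-resolution point that satisfies all constraints."""
    best = None
    for p in points:
        if "EL" in p.layers and not decoder_supports_el:
            continue  # decoder capability constraint
        if p.width > display_width or p.height > display_height:
            continue  # display resolution constraint
        if p.bitrate_kbps > available_kbps:
            continue  # transmission bit rate constraint
        if best is None or p.width * p.height > best.width * best.height:
            best = p
    return best


if __name__ == "__main__":
    choice = select_operating_point(OPERATING_POINTS, 640, 480, 300, True)
    print(choice.name if choice else "no suitable operating point")
```

With a VGA display but only 300 kbit/s available, this sketch falls back to the BL sub-frames, since the assumed bit rate of the combined BL and EL point exceeds the budget.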
The AVC specification as well as its SVC extension define so-called profiles. Each profile defines a set of coding tools (e.g., specific algorithms such as arithmetic or run-length entropy coding) that are to be used for encoding and decoding the video content. As a result, the profiles implicitly define the complexity that is required to decode a (sub-)bitstream. The SVC extension of the AVC specification defines SVC-specific profiles in addition to the conventional AVC profiles. One example of an SVC-specific profile is the so-called Scalable Baseline Profile, which is targeted at mobile TV applications.
According to the SVC standard, an SVC BL sub-bitstream must be AVC compliant (i.e., must be decodable by an AVC compliant decoder). It should be noted that SVC EL sub-bitstreams are not required to be AVC compliant. As a result of the AVC compliance of an SVC BL sub-bitstream, an AVC Baseline Profile decoder will be able to decode the BL of a Scalable Baseline Profile bitstream.
It is likely that future devices with media rendering capabilities will support SVC-specific profiles (such as the Scalable Baseline Profile), but will not provide explicit support for any AVC profile. Such devices will be able to decode AVC bitstreams that comply with the SVC BL definition (such as bitstreams in accordance with the so-called Constrained Baseline Profile, which is a restricted version of the AVC Baseline Profile). However, the devices will not be able to decode AVC bitstreams that have been encoded according to more sophisticated AVC-specific profiles, such as the AVC High Profile. The AVC High Profile is used today in Internet Protocol (IP) TV-like applications and may soon be used in high quality mobile TV applications.
In the future, there may exist a large amount of pre-encoded media content complying with the AVC High Profile. This pre-encoded video content will thus have to be transcoded before being consumable by devices only supporting SVC-specific profiles such as the Scalable Baseline Profile. In other words, the pre-encoded AVC High Profile compliant video content will first have to be decoded and then re-encoded in accordance with either the AVC Constrained Baseline Profile or an SVC-specific profile such as the Scalable Baseline Profile. Obviously, this transcoding operation consumes considerable computational resources and may additionally lead to a quality degradation.
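The compatibility reasoning of the preceding paragraphs can be summarized in a small Python sketch. The `DECODABLE` table and the `can_decode` and `plan_delivery` functions are hypothetical; the table encodes only the relationships stated in this text and is deliberately not an exhaustive map of AVC or SVC profile compatibility.

```python
from typing import Dict, List, Optional, Set, Tuple

# Decoder profile -> bitstream profiles it can decode, as far as stated in the
# text above. Assumption: a decoder missing from this table decodes only
# bitstreams of its own profile.
DECODABLE: Dict[str, Set[str]] = {
    "Scalable Baseline": {"Scalable Baseline", "Constrained Baseline"},
    "Baseline": {"Baseline", "Constrained Baseline"},
}


def can_decode(device_profiles: List[str], content_profile: str) -> bool:
    """True if any decoder profile of the device covers the content profile."""
    return any(content_profile in DECODABLE.get(p, {p}) for p in device_profiles)


def plan_delivery(content_profile: str,
                  device_profiles: List[str]) -> Tuple[str, Optional[List[str]]]:
    """Return ("direct", None), or ("transcode", candidate target profiles)."""
    if can_decode(device_profiles, content_profile):
        return ("direct", None)
    targets: Set[str] = set()
    for p in device_profiles:
        targets |= DECODABLE.get(p, {p})
    return ("transcode", sorted(targets))


if __name__ == "__main__":
    # A device that only supports the Scalable Baseline Profile.
    print(plan_delivery("Constrained Baseline", ["Scalable Baseline"]))
    print(plan_delivery("High", ["Scalable Baseline"]))
```

For AVC High Profile content and a device supporting only the Scalable Baseline Profile, the sketch reports that transcoding is needed and lists the Constrained Baseline Profile and the Scalable Baseline Profile as candidate targets, which are the two options named in the text.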