The present invention is concerned with scalable video coding and, in particular, with scalable video coding supporting pixel value refinement scalability.
A current project of the Joint Video Team (JVT) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) is the development of a scalable extension of the state-of-the-art video coding standard H.264/MPEG4-AVC. This extension, as defined in T. Wiegand, G. J. Sullivan, J. Reichel, H. Schwarz and M. Wien, eds., “Joint Draft 10 of SVC Amendment”, Joint Video Team, Doc. JVT-W201, San Jose, Calif., USA, April 2007, and J. Reichel, H. Schwarz, and M. Wien, eds., “Joint Scalable Video Model JSVM-10”, Joint Video Team, Doc. JVT-W202, San Jose, Calif., USA, April 2007, supports temporal, spatial and SNR scalable coding of video sequences or any combination thereof.
H.264/MPEG4-AVC, as described in ITU-T Rec. H.264 & ISO/IEC 14496-10 AVC, “Advanced Video Coding for Generic Audiovisual Services,” version 3, 2005, specifies a hybrid video codec in which macroblock prediction signals are generated either in the temporal domain by motion-compensated prediction or in the spatial domain by intra prediction, and both predictions are followed by residual coding. H.264/MPEG4-AVC coding without the scalability extension is referred to as single-layer H.264/MPEG4-AVC coding. Rate-distortion (R-D) performance comparable to single-layer H.264/MPEG4-AVC means that the same visual reproduction quality is typically achieved at a bit-rate that differs by no more than about 10%. Given the above, scalability is considered to be a functionality for removing parts of the bit-stream while achieving, at any supported spatial, temporal or SNR resolution, an R-D performance that is comparable to single-layer H.264/MPEG4-AVC coding at that particular resolution.
The basic design of the scalable video coding (SVC) extension can be classified as a layered video codec. In each layer, the basic concepts of motion-compensated prediction and intra prediction are employed as in H.264/MPEG4-AVC. However, additional inter-layer prediction mechanisms have been integrated in order to exploit the redundancy between several spatial or SNR layers. SNR scalability is essentially achieved by residual quantization, while for spatial scalability a combination of motion-compensated prediction and oversampled pyramid decomposition is employed. The temporal scalability approach of H.264/MPEG4-AVC is maintained.
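The idea of obtaining SNR scalability through residual quantization can be illustrated with a minimal sketch. It is not the SVC algorithm itself: it assumes a plain uniform scalar quantizer applied directly to residual samples, and the function names and step sizes are hypothetical. The base layer quantizes the residual coarsely, and the refinement layer quantizes whatever error the base layer left behind with a finer step.

```python
def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer level."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse of quantize: map integer levels back to sample values."""
    return [level * step for level in levels]


def encode_two_layers(residual, base_step, refine_step):
    """Return (base_levels, refine_levels) for one block of residual samples."""
    base_levels = quantize(residual, base_step)
    base_recon = dequantize(base_levels, base_step)
    # The refinement layer codes only what the base layer missed.
    error = [r - b for r, b in zip(residual, base_recon)]
    refine_levels = quantize(error, refine_step)
    return base_levels, refine_levels


def decode(base_levels, refine_levels, base_step, refine_step, use_refinement):
    """Reconstruct the residual with or without the refinement layer."""
    recon = dequantize(base_levels, base_step)
    if use_refinement:
        refinement = dequantize(refine_levels, refine_step)
        recon = [b + e for b, e in zip(recon, refinement)]
    return recon


residual = [13, -7, 4, 0]
base, refine = encode_two_layers(residual, base_step=8, refine_step=2)
coarse = decode(base, refine, 8, 2, use_refinement=False)
fine = decode(base, refine, 8, 2, use_refinement=True)
```

A decoder that discards the refinement levels still obtains a valid, coarser reconstruction; in this sketch the reconstruction error is bounded by half the quantization step of the last layer that was decoded.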
In general, the coder structure depends on the scalability space that is necessitated by an application. For illustration, FIG. 7 shows a typical coder structure 900 with two spatial layers 902a, 902b. In each layer, an independent hierarchical motion-compensated prediction structure 904a,b with layer-specific motion parameters 906a,b is employed. The redundancy between consecutive layers 902a,b is exploited by inter-layer prediction concepts 908 that include prediction mechanisms for motion parameters 906a,b as well as texture data 910a,b. A base representation 912a,b of the input pictures 914a,b of each layer 902a,b is obtained by transform coding 916a,b similar to that of H.264/MPEG4-AVC. The corresponding Network Abstraction Layer (NAL) units contain motion information and texture data; the NAL units of the base representation of the lowest layer, i.e. 912a, are compatible with single-layer H.264/MPEG4-AVC. The reconstruction quality of the base representations can be improved by an additional coding 918a,b of so-called progressive refinement slices; the corresponding NAL units can be arbitrarily truncated in order to support fine granular quality scalability (FGS) or flexible bit-rate adaptation.
The bit-streams output by the base layer coding 916a,b and the progressive SNR refinement texture coding 918a,b of the respective layers 902a,b are multiplexed by a multiplexer 920 to yield the scalable bit-stream 922. This bit-stream 922 is scalable in time, space and SNR quality.
In summary, in accordance with the above scalable extension of the video coding standard H.264/MPEG4-AVC, temporal scalability is provided by using a hierarchical prediction structure. For this hierarchical prediction structure, that of the single-layer H.264/MPEG4-AVC standard may be used without any changes. For spatial and SNR scalability, additional tools have to be added to single-layer H.264/MPEG4-AVC. All three scalability types can be combined in order to generate a bit-stream that supports a large degree of combined scalability.
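How a hierarchical prediction structure yields temporal scalability can be sketched as follows. The sketch assumes a dyadic hierarchy with a group-of-pictures size that is a power of two, which is one common arrangement rather than the only one the standard permits; the function names are hypothetical. Each frame is assigned a temporal level, and dropping all frames above a chosen level halves the frame rate once per level removed.

```python
def temporal_level(frame_index, gop_size):
    """Temporal level of a frame in a dyadic hierarchy (gop_size: power of two)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    position = frame_index % gop_size
    if position == 0:
        return 0  # key picture at the start of each group of pictures
    level = gop_size.bit_length() - 1  # log2(gop_size) is the highest level
    # Each factor of two in the position moves the frame one level down.
    while position % 2 == 0:
        position //= 2
        level -= 1
    return level


def extract(frame_indices, gop_size, max_level):
    """Keep only the frames whose temporal level does not exceed max_level."""
    return [i for i in frame_indices if temporal_level(i, gop_size) <= max_level]


levels = [temporal_level(i, 8) for i in range(9)]
half_rate = extract(range(9), 8, max_level=2)
key_only = extract(range(9), 8, max_level=0)
```

In such a structure, frames of a given level are predicted only from frames of the same or lower levels, so the frames that remain after extraction stay decodable.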
Certain applications may benefit from enhancement layers, which allow extracting and displaying higher bit-depth and, possibly, higher spatial resolution content on top of a base layer with low bit-depth or, in more general terms, lower pixel value resolution, and, possibly, lower spatial resolution. In the above-mentioned version of the scalable extension, however, scalability tools are only specified for the case that both the base layer and the enhancement layer represent a given video source with the same bit-depth/pixel value resolution of the corresponding arrays of luma and chroma samples.
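To make the notion of different pixel value resolutions concrete, the following sketch relates a 10-bit sample to an 8-bit sample by simple linear scaling. This is purely illustrative: the shift-based mapping, the bit-depths and the function names are assumptions chosen for the example, and linear scaling is only one of many conceivable mappings between bit-depths.

```python
def to_lower_bit_depth(sample, high_bits=10, low_bits=8):
    """Map a high bit-depth sample to the low bit-depth range with rounding."""
    shift = high_bits - low_bits
    if shift <= 0:
        return sample
    rounded = (sample + (1 << (shift - 1))) >> shift
    # Rounding can overflow the low bit-depth range, so clip to its maximum.
    return min(rounded, (1 << low_bits) - 1)


def predict_higher_bit_depth(sample, high_bits=10, low_bits=8):
    """Predict a high bit-depth sample from a low bit-depth sample."""
    return sample << (high_bits - low_bits)


original = 515                                   # 10-bit source sample
base = to_lower_bit_depth(original)              # 8-bit base layer sample
prediction = predict_higher_bit_depth(base)      # inter-layer prediction
residual = original - prediction                 # what remains to be coded
```

With such a mapping, an enhancement layer would only need to convey the small residual between the original high bit-depth sample and its prediction from the base layer, rather than the full sample.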
Thus, it would be advantageous to provide a scalable video coding scheme that supports scalability in terms of pixel value resolution. It would be further advantageous if this video coding scheme supported a broad spectrum of possible pixel value resolution mappings between different levels of pixel value resolution. Moreover, it would be favorable if the video coding scheme kept the computational overhead on the decoder side low. With the above-mentioned video coding techniques, providing scalability in terms of pixel value resolution would necessitate combining two separate, totally self-contained video bit-streams, coded at different pixel value resolutions, into one common scalable data-stream. However, because the redundancy between the two bit-streams is not exploited, this results in a poor compression ratio.