“Bit depth”, also known as “color depth” or “pixel depth”, refers to the number of bits used to represent a pixel. The bit depth determines the maximum number of colors that can be displayed at one time. In recent years, digital images and digital videos with a bit depth greater than eight have become more desirable in many application fields including, but not limited to, medical image processing, digital cinema workflows in production and postproduction, and home theatre related applications.
There are several ways to handle the coexistence of, for example, an 8-bit video and a 10-bit video. In a first prior art solution, only a 10-bit coded bit-stream is transmitted and the 8-bit representation for standard 8-bit display devices is obtained by applying tone mapping methods to the 10-bit representation. Tone mapping is a well-known technique to convert a higher bit depth to a lower bit depth, often to approximate the appearance of high dynamic range images in media with a more limited dynamic range.
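As an illustration only, the simplest possible conversion from 10-bit to 8-bit samples is a rounded right shift, sketched below. The function name `tone_map_shift` and its parameters are hypothetical, and practical tone mapping operators are often nonlinear; this sketch merely shows the bit depth reduction that the first prior art solution relies on.

```python
def tone_map_shift(samples, src_bits=10, dst_bits=8):
    """Reduce bit depth by a rounded right shift (illustrative assumption,
    not a tone mapping operator taken from any standard)."""
    shift = src_bits - dst_bits
    if shift <= 0:
        raise ValueError("src_bits must be greater than dst_bits")
    max_dst = (1 << dst_bits) - 1   # largest value representable at dst_bits
    offset = 1 << (shift - 1)       # rounding offset: half of one output step
    # Rounding can push the brightest inputs past max_dst, so clip.
    return [min((s + offset) >> shift, max_dst) for s in samples]


if __name__ == "__main__":
    print(tone_map_shift([0, 512, 1023]))
```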
In a second prior art solution, a simulcast bit-stream that includes an 8-bit coded representation and a 10-bit coded representation is transmitted. The decoder chooses which bit depth to decode. For example, a 10-bit capable decoder can decode and output a 10-bit video, while a normal decoder supporting only 8-bit video can output just an 8-bit video.
The first solution is inherently non-compliant with 8-bit profiles of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC Standard”). The second solution is compliant with all the current standards but requires more overhead. A scalable solution, however, can offer a good tradeoff between bit reduction and backward standard compatibility. Scalable video coding (SVC), also known as a scalable extension of the MPEG-4 AVC Standard, considers the support of bit-depth scalability.
There are at least three advantages of bit-depth scalable coding over post-processing or simulcast. A first advantage is that bit-depth scalable coding enables 10-bit video in a backward-compatible manner with the High Profiles of the MPEG-4 AVC Standard. A second advantage is that bit-depth scalable coding enables adaptation to different network bandwidths or device capabilities. A third advantage of bit-depth scalable coding is that it provides low complexity, high efficiency and high flexibility.
In the current scalable video coding extension of the MPEG-4 AVC Standard, single-loop decoding is supported to reduce the decoding complexity. The complete decoding of the inter-coded macroblocks, including motion-compensated prediction and deblocking, is only required for the current spatial or coarse grain scalable (CGS) layer. This is realized by constraining the inter-layer intra texture prediction to those parts of the lower layer picture that are coded with intra macroblocks. To extend inter-layer intra texture prediction for bit depth scalability, inverse tone mapping is used. Scalable video coding also supports inter-layer residue prediction. Since tone mapping is, in general, applied in the pixel (spatial) domain, it is very difficult to find the corresponding inverse tone mapping in the residue domain. In third and fourth prior art approaches, bit shift is used for inter-layer residue prediction.
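The bit-shift approach to inter-layer residue prediction can be pictured with the following minimal sketch. The text above states only that a bit shift is used; the left shift by the bit depth difference, the function name `predict_residual_by_shift`, and its parameters are assumptions made for illustration.

```python
def predict_residual_by_shift(base_residual, base_bits=8, enh_bits=10):
    """Scale a base layer residual block to the enhancement layer bit depth
    by a left shift (assumed form of the bit-shift prediction)."""
    shift = enh_bits - base_bits
    if shift < 0:
        raise ValueError("enh_bits must not be smaller than base_bits")
    # Residuals are signed; Python's << keeps the sign (-3 << 2 == -12).
    return [[r << shift for r in row] for row in base_residual]


if __name__ == "__main__":
    print(predict_residual_by_shift([[1, -3], [0, 5]]))
```

Under this assumption, the encoder would code only the difference between the actual enhancement layer residual and the shifted base layer residual.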
In a fifth prior art approach referred to as smooth reference prediction (SRP), which is a technique to increase inter-layer coding efficiency for single-loop decoding without bit depth scalability, a one-bit syntax element smoothed_reference_flag is sent when the syntax elements residual_prediction_flag and base_mode_flag are both set. When smoothed_reference_flag is equal to one, the following steps are taken at the decoder to obtain the reconstructed video block:
    1. The prediction block P is obtained using the enhancement layer reference frames and the upsampled motion vectors from the base layer;
    2. The corresponding base layer residual block rb is upsampled and U(rb) is added to P to form P+U(rb);
    3. A smoothing filter with tap [1,2,1] is applied, first in the horizontal direction and then in the vertical direction, to obtain S(P+U(rb)); and
    4. The enhancement layer residual block re is added to the result of step 3 to obtain the reconstruction block R=S(P+U(rb))+re.
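The four steps above can be sketched as follows, taking the prediction block P and the upsampled base layer residual U(rb) as given inputs. The normalization of the [1,2,1] filter by 4 with rounding, the edge replication at block borders, and all function names are assumptions for illustration; motion compensation and upsampling are outside the scope of the sketch.

```python
def _filter_121(line):
    """Apply the [1,2,1] taps along one line. Dividing by 4 with rounding and
    replicating edge samples are assumptions, not stated in the text."""
    n = len(line)
    out = []
    for i in range(n):
        left = line[max(i - 1, 0)]
        right = line[min(i + 1, n - 1)]
        out.append((left + 2 * line[i] + right + 2) >> 2)
    return out


def smooth(block):
    """S(.): filter horizontally first, then vertically (step 3)."""
    rows = [_filter_121(row) for row in block]
    cols = [_filter_121(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reconstruct_block(pred, up_base_res, enh_res, smoothed_reference_flag):
    """Return R = S(P + U(rb)) + re when the flag is set, else P + U(rb) + re."""
    # Step 2: add the upsampled base layer residual to the prediction block.
    ref = [[p + u for p, u in zip(pr, ur)] for pr, ur in zip(pred, up_base_res)]
    # Step 3: smooth only when smoothed_reference_flag is equal to one.
    if smoothed_reference_flag:
        ref = smooth(ref)
    # Step 4: add the enhancement layer residual.
    return [[s + e for s, e in zip(sr, er)] for sr, er in zip(ref, enh_res)]


if __name__ == "__main__":
    P = [[10, 10], [10, 10]]
    U_rb = [[2, 2], [2, 2]]
    r_e = [[1, -1], [0, 3]]
    print(reconstruct_block(P, U_rb, r_e, smoothed_reference_flag=1))
```

The flag argument plays the role of the switch in the decoder portion: when it is zero, the filter is bypassed and the enhancement layer residual is added directly to P+U(rb).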
Turning to FIG. 1, a portion of a decoder using smooth reference prediction is indicated generally by the reference numeral 100.
The decoder portion 100 includes a motion compensator 112 having an output in signal communication with a first non-inverting input of a combiner 132. An output of the combiner 132 is connected in signal communication with an input of a switch 142. A first output of the switch 142 is connected in signal communication with a first non-inverting input of a combiner 162. A second output of the switch 142 is connected in signal communication with an input of a filter 152. An output of the filter 152 is connected in signal communication with the first non-inverting input of the combiner 162.
An output of a reference frame buffer 122 is connected in signal communication with a first input of the motion compensator 112.
A second input of the motion compensator 112 is available as an input to the decoder portion 100, for receiving enhancement layer motion vectors. A third input of the motion compensator 112 is available as an input to the decoder portion 100, for receiving upsampled base layer motion vectors. A second non-inverting input of the combiner 132 is available as an input of the decoder portion 100, for receiving an upsampled base layer residual. A control input of the switch 142 is available as an input of the decoder portion 100, for receiving a smoothed_reference_flag syntax element. A second non-inverting input of the combiner 162 is available as an input of the decoder portion 100, for receiving an enhancement layer residual. An output of the combiner 162 is available as an output of the decoder portion 100, for outputting a reconstructed block R.
However, the preceding prior art techniques disadvantageously cannot be directly used with bit depth scalability.