A digital representation of still or video images consists of spatial samples of image intensity and/or color quantized to some particular bit depth. This bit depth is typically dependent upon the devices used to capture and display the still or video images. The dominant bit depth for still and video images has been 8 bits. This provides reasonable image quality and each sample fits perfectly into a single byte of digital memory.
Consequently, almost all image and video compression systems have been limited to 8-bit samples. For example, JPEG is specified only for 8-bit samples of R/G/B and MPEG-2 is specified only for 8-bit samples of Y/U/V. However, 8 bits is certainly not the limit imposed by human vision, and many applications require more fidelity than 8-bit samples can provide. For images captured on film, professional scanners use 10-12 bits in approximately logarithmic units or roughly 14-16 bits linear. Professional video systems routinely require 10-bit data formats. Furthermore, an evolution to bit depths greater than 8 bits is coming to consumers in general. The next version of Microsoft's operating system, code-named Longhorn, is expected to have a new 10-bit per component display interface. In addition, modern compression techniques, such as JPEG2000 and H.264, are more efficient and have fewer artifacts than their predecessors. This makes them capable of compressing higher quality images without artifacts that would negate the benefits of greater bit depths. Also, the ever-increasing bandwidth of wireless and wired networks allows transporting video of larger format and higher quality. Taken together, this means that compression at higher quality levels is efficient enough to be practical. Thus, there is an emerging need for compression systems that operate with samples whose bit depth is greater than 8 bits.
Such greater bit depths allow higher fidelity in the overall compression. The fidelity of a compressed image is measured by the distortion, which is the mean-squared error (MSE) between the original image or frame and the reconstructed (compressed) image or frame normalized to the maximum possible (peak) amplitude and measured in logarithmic units. In short, the distortion PSNR (Peak Signal-to-Noise Ratio) in dB is

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{peak}^2}{\mathrm{MSE}}\right) \qquad (1)$$

Greater bit depths permit higher values for PSNR. For example, the quantization error for N-bit sampling is commonly modeled as independent, uniformly distributed random noise over the interval [−½, ½] so that the MSE is 1/12 with respect to the least significant bit. Since the input samples are integers in the range $[0, 2^N - 1]$, the peak value is $2^N - 1$. The PSNR corresponding to this MSE is

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{(2^N - 1)^2}{1/12}\right) \qquad (2)$$

Since this represents the error between the original, unquantized image and its quantized representation, it represents an upper bound for the fidelity of the compressed result compared to the original image. Table 1 shows this upper bound for some representative bit depths:
TABLE 1
Maximum PSNR as a function of bit depth

| Bit depth (bits) | PSNR limit (dB) (due to round-off) |
|---|---|
| 8 | 58.92 |
| 10 | 70.99 |
| 12 | 83.04 |
| 14 | 95.08 |
| 16 | 107.12 |
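The entries of Table 1 follow directly from equation (2). The following is a minimal Python sketch of that calculation; the function name `psnr_round_off_limit` is chosen here for illustration only, and the logarithm is assumed to be base 10, as is conventional for decibels.

```python
import math


def psnr_round_off_limit(bit_depth: int) -> float:
    """Upper bound on PSNR in dB per equation (2).

    Assumes round-off noise uniformly distributed over [-1/2, 1/2],
    so the MSE is 1/12 in units of the least significant bit.
    """
    peak = 2 ** bit_depth - 1          # largest N-bit sample value
    mse = 1.0 / 12.0                   # variance of uniform round-off noise
    return 10.0 * math.log10(peak ** 2 / mse)


# Print the bound for the bit depths listed in Table 1.
for n in (8, 10, 12, 14, 16):
    print(f"{n:2d} bits: {psnr_round_off_limit(n):.2f} dB")
```

Each additional 2 bits of depth raises the bound by roughly 12 dB, because the peak value roughly quadruples while the round-off MSE stays fixed.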
All lossy compression systems, such as the example schematically shown in FIG. 1, incur some form of a trade-off between the degree of compression (the number of compressed bits in the case of a still image and the bit rate in the case of moving images) and the fidelity. This performance is formally characterized by a “rate-distortion” (R-D) curve. This curve is a graph of the distortion (in PSNR) as a function of the bits or bit rate required for the compressed representation (typically in Kbytes for images and Mbits/sec for moving images or video). FIG. 5 shows an example of a typical R-D curve. Rate-distortion curves show how well a particular compression-decompression system, or “codec,” performs over a range of compression ratios or bit rates for a particular input image or video sequence.
FIG. 1 shows schematically a generic prior art image compression/decompression system in which an original image is applied to an Encoder 2. The encoder's compressed bits output are applied to a Decoder 4 that produces a decompressed version of the image. The original image is compared to the decompressed image in a PSNR calculation 6 to provide the PSNR.
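The PSNR calculation 6 of FIG. 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `psnr_db`, the flat list-of-samples representation, and the convention of returning infinity for a perfect reconstruction are assumptions made here, not details taken from the figure.

```python
import math
from typing import Sequence


def psnr_db(original: Sequence[int], decoded: Sequence[int], bit_depth: int) -> float:
    """PSNR in dB between original and decompressed samples, per equation (1)."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean-squared error between corresponding samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf                # identical images: no distortion
    peak = 2 ** bit_depth - 1          # peak amplitude for N-bit samples
    return 10.0 * math.log10(peak ** 2 / mse)


# Two 8-bit samples, each off by one level after decoding (MSE = 1).
print(round(psnr_db([0, 255], [1, 254], bit_depth=8), 2))
```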
The method used to control where along the rate-distortion curve a compression system operates is through the use of a quantization parameter, or QP, to control quantization as indicated in FIGS. 4 and 5, which figures are described further below. The parameter QP determines the quantization step-size, QS, which is then directly used in quantization and dequantization functions or devices. The most general interpretation is that an integer QP is used to index a table of values for QS. Such a table contains a mapping from QP to QS. Thus, in FIG. 4, which shows schematically a generic prior art quantization and dequantization system, the quantization parameter QP is applied to a first mapping function 10 that generates a corresponding quantization step-size QS in accordance with predetermined mapping relationships. The same QP value is also applied to a second mapping function 12 that generates the same corresponding quantization step-size QS in accordance with the same predetermined mapping relationships. The quantization step-size QS produced by mapping function 10 controls the step size of quantizer 14 that receives an N-bit data word X. Quantizer 14 produces a quantized data word Q having a bit length that is a function of N, the quantization parameter QP, and the quantization step-size QS. Dequantizer 16 receives the quantized data word Q along with QS and produces a dequantized N-bit data word X′ that approximates the input N-bit data word X.
FIG. 5 shows a rate-distortion curve (distortion PSNR versus bit rate as QP is varied) for a hypothetical codec that employs both an identity mapping (QS = QP), such as that employed in prior art MPEG-1, MPEG-2 and MPEG-4 systems, and an exponential mapping, such as that employed in the H.264 system ($QS = 2^{QP/6 - L}$). The distribution of quantization parameters QP is shown along the curve. The QP values above the curve are those for the identity mapping and the QP values below the curve are those for the exponential mapping. For identity mapping, low values of QP (indicating higher quality coding) are relatively sparse, becoming denser for high values of QP (lower quality coding). For exponential mapping, more values of QP are available for low values of QP and the distribution of QP values is more uniform than for the identity mapping.
FIG. 2 and FIG. 3 show block diagrams for an H.264 encoder and decoder, respectively. H.264, also known as MPEG-4/AVC, is considered the state-of-the-art in modern video coding. Although H.264 possesses many of the features common to previous MPEG (ISO) and ITU video codecs, it has many innovations. Although aspects of the present invention are usable in MPEG-1, MPEG-2 and MPEG-4 coding environments, aspects of the present invention may be used with particular advantage in H.264 coding environments. Details of H.264 coding are set forth in “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC),” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 8th Meeting: Geneva, Switzerland, 23-27 May, 2003. Details of the “Fidelity Range Extensions” to the basic H.264 specifications are set forth in “Draft Text of H.264/AVC Fidelity Range Extensions Amendment,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 11th Meeting: Munich, DE, 15-19 March, 2004. Both of the just-identified documents are hereby incorporated by reference in their entireties. The “Fidelity Range Extensions” will support higher-fidelity video coding by supporting increased sample accuracy, including 10-bit and 12-bit coding. Aspects of the present invention are particularly useful in connection with the implementation of such increased sample accuracy. Further details regarding the H.264 standard and its implementation may be found in various published literature, including, for example, “The emerging H.264/AVC standard,” by Ralf Schafer et al, EBU Technical Review, January 2003 (12 pages) and “H.264/MPEG-4 Part 10 White Paper: Overview of H.264,” by Iain E G Richardson, Jul. 10, 2002, published at www.vcodex.com. Said Schafer et al and Richardson publications are also incorporated by reference herein in their entirety.
The H.264 encoder shown in FIG. 2 has elements now common in video coders: transform and quantization methods, entropy (lossless) coding, motion estimation (ME) and motion compensation (MC), and a buffer to store reconstructed frames. H.264 differs from previous codecs in a number of ways: an in-loop deblocking filter, many modes for intra-prediction, a new integer transform, two modes of entropy coding (variable length codes, and arithmetic coding), motion block sizes down to 4×4 pels, and so on. Of particular importance here is that H.264 has a different distribution of quantization step-sizes that makes its extension to higher bit depths more efficient than MPEG-2, for example. The outlined portion of FIG. 2 relates to the description of FIG. 7a, below.
The H.264 decoder shown in FIG. 3 can be readily seen as a subset of the encoder. The new quantization methods forming aspects of the present invention apply to both the decoder and the encoder. The outlined portion of FIG. 3 relates to the description of FIG. 7b, below.
All lossy image and video compression systems, including H.264 and all the other JPEG/MPEG/ITU standards, use quantization as the primary means to control the degree of compression, and hence the fidelity of the result. In other words, the degree of quantization used determines the operating point along the rate-distortion curve. This may be seen, for example, in FIG. 5.
The most common form of quantization is uniform (linear) quantization. MPEG-2 employs uniform quantization. In uniform quantization the quantized value is the original value scaled by a quantization step size (whose inverse is called the quantization resolution), QS, and converted to an integer

$$Q = \mathrm{int}\left[\frac{X}{QS} + r\right] \qquad (3)$$

where X is the continuous variable to be quantized, Q is the quantized value, and r is an optional rounding parameter in the interval [0,1). If r is 0, the quotient is truncated. If r is ½, the result corresponds to simple rounding. Other values of r are possible and useful. The corresponding dequantized value is

$$X' = Q \times QS + s \qquad (4)$$

where s is another rounding parameter, so that X′ is the quantized approximation to X. As described above, FIG. 4 shows this prior art in quantization and dequantization. Note that the number of bits used for the input, X, and the number of bits for the output, X′, are the same and there is a single quantization step-size, QS.
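Equations (3) and (4) can be illustrated with a short Python sketch. The function names `quantize` and `dequantize` are chosen here for illustration, and the sketch assumes non-negative X, for which Python's `int()` truncation matches the int[·] operation of equation (3).

```python
def quantize(x: float, qs: float, r: float = 0.5) -> int:
    """Uniform quantization per equation (3): Q = int[X/QS + r].

    Assumes x >= 0; int() truncates toward zero, so r = 0 truncates the
    quotient and r = 0.5 rounds to the nearest integer.
    """
    return int(x / qs + r)


def dequantize(q: int, qs: float, s: float = 0.0) -> float:
    """Dequantization per equation (4): X' = Q * QS + s."""
    return q * qs + s


x, qs = 100, 8
q = quantize(x, qs)                    # 100/8 + 0.5 = 13.0, so Q = 13
print(q, dequantize(q, qs))            # reconstructed value approximates x
print(quantize(x, qs, r=0.0))          # truncation instead of rounding
```

With QS = 8, the input 100 is reconstructed as 104 under rounding, an error of half a step-size, which is the largest error that simple rounding can produce.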
As discussed above, the method used to control where along the rate-distortion curve a compression system operates is through the use of a quantization parameter, or QP, to control quantization as indicated in FIGS. 4 and 5. The parameter QP determines the quantization step-size, QS, which is then directly used in the quantization and dequantization equations 3 and 4 (above). The most general interpretation is that an integer QP is used to index a table of values for QS. This table contains the mapping from QP to QS. There are two common mappings from QP to QS: an identity mapping (used in MPEG-2 and other standards)

$$QS = QP \qquad (5)$$

and an exponential mapping

$$QS = 2^{QP/6 - L} \qquad (6)$$

which is used in H.264 (the value of L differs for quantizing luma versus chroma in this standard). Note that the quantization step-size is an integer for the identity mapping, while for the exponential mapping it is a floating-point number approximated by an integer. More precisely, in H.264, QS is represented by one of six integers, $\{2^{M}, 2^{M+1/6}, \ldots, 2^{M+5/6}\}$, for some value of M plus a number of shifts necessary to account for the difference between M and the integer portion of (QP/6) and L.
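The two mappings can be compared with a brief Python sketch. It reads equation (6) with the whole quantity QP/6 − L in the exponent, and it sets L to 0 purely for illustration; the function names and that default are assumptions made here, not values taken from the H.264 standard.

```python
def qs_identity(qp: int) -> float:
    """Identity mapping of equation (5): QS = QP."""
    return float(qp)


def qs_exponential(qp: int, l: float = 0.0) -> float:
    """Exponential mapping of equation (6): QS = 2^(QP/6 - L).

    l = 0.0 is an illustrative assumption; the actual offset depends on
    the standard and on luma versus chroma.
    """
    return 2.0 ** (qp / 6.0 - l)


# With L = 0, the step-size doubles every 6 increments of QP.
for qp in (0, 6, 12, 18, 24, 30):
    print(qp, qs_exponential(qp))
```

Under this assumption, exponential QP values of 0, 6, 12, 18, 24 and 30 give step-sizes of 1, 2, 4, 8, 16 and 32, the same step-sizes that the identity mapping reaches at QP values of 1, 2, 4, 8, 16 and 32.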
The identity and exponential mappings distribute quantization step-sizes very differently. The identity mapping is sparse for low QP values, but dense for high QP values, as indicated in FIG. 5. In contrast, the density of QP values for H.264 is more uniform. Table 2 compares these two mappings for each factor of two (octave) in quantization step-size. “QS#” is the number of quantization step sizes in the octave. This information may also be seen in FIG. 5. As shown in the table and in the figure, QP values of 1, 2, 4, 8, 16 and 32 for identity mapping correspond, respectively, to QP values of 0, 6, 12, 18, 24 and 30 for exponential mapping.
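TABLE 2
Distribution of quantization step-sizes

| Octave | Identity mapping: QS# {QP values} | Exponential mapping: QS# {QP values} |
|---|---|---|
| 1 | 1 {1} | 6 {0-5} |
| 2 | 2 {2-3} | 6 {6-11} |
| 3 | 4 {4-7} | 6 {12-17} |
| 4 | 8 {8-15} | 6 {18-23} |
| 5 | 16 {16-31} | 6 {24-29} |
| 6 | 1 {32} | 6 {30-35} |
| 7 | - | 6 {36-41} |
| 8 | - | 6 {42-47} |
| 9 | - | 5 {48-52} |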
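The per-octave counts in Table 2 can be reproduced with a small Python sketch. The QP ranges (1 to 32 for the identity mapping, 0 to 52 for the exponential mapping) are taken from the table; the helper names, and the choice of L = 0 so that exponential octaves start at multiples of 6, are assumptions made here for illustration.

```python
from collections import Counter


def octave_identity(qp: int) -> int:
    """Octave of QS = QP: 1 -> 1, 2-3 -> 2, 4-7 -> 3, and so on."""
    return qp.bit_length()


def octave_exponential(qp: int) -> int:
    """Octave of QS = 2^(QP/6), assuming L = 0: six QP values per octave."""
    return qp // 6 + 1


# Count how many QP values (hence step-sizes) fall in each octave.
identity_counts = Counter(octave_identity(qp) for qp in range(1, 33))
exponential_counts = Counter(octave_exponential(qp) for qp in range(0, 53))

for octave in range(1, 10):
    print(octave, identity_counts.get(octave, 0), exponential_counts.get(octave, 0))
```

The identity mapping doubles its count with each octave until the QP range runs out, while the exponential mapping holds a constant six step-sizes per octave.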
The exponential mapping has the same density of quantization step-sizes for each octave. FIG. 5 shows how these two compare for a hypothetical rate-distortion plot (“hypothetical” in the sense that no existing codec is known to use both mappings). As mentioned above, the identity mapping is relatively sparse for low QPs, and very dense for high QPs, while the exponential mapping is relatively uniform for all QPs. As discussed further below, this makes the extension of quantization to higher bit depth much more efficient for H.264 with its exponential mapping than with the identity mapping of MPEG-2.
The prior art does nothing to normalize the effects of varying bit depth when performing quantization and dequantization operations. That is, the prior art simply uses equation (3) with equations (5) or (6) for quantization, and equation (4) for dequantization, without any modification for bit depth. This was the approach taken in the MPEG-4 N-Bit and Studio video compression profiles, which were designed to encode bit depths of up to 12 bits. However, because no changes were made to the quantization and dequantization methods when bit depth changes, the same value for QP produces different values for PSNR at different bit depths. What causes this is discussed below in connection with prior art quantization methods (and Table 3). At this point, the effects are set forth.
Suppose that for the MPEG-4 N-Bit profile a particular value of QP results in a PSNR of 40 dB at an 8-bit encoding depth; at a 10-bit encoding depth the same QP will result in a PSNR of roughly 52 dB. This change in PSNR reflects underlying differences in the coded bitstream: the number of bits in each quantized word in the bitstream is greater in the case of the 10-bit encoding depth. In order to have the same PSNR and the same quantized word lengths in the bitstream, the 10-bit QP would have to be four times as large. These differences make it more difficult to design encoders and decoders that can handle different bit depths, even though the 8-bit compression at QP and the 10-bit compression at 4 times that QP produce nearly identical compressed data: the quantized word lengths are the same, but the underlying data represented by them may differ by a rounding difference. Thus, for a given QP value, the syntax and semantics of the bitstream produced by current encoders are not compatible across different bit depths. It would be advantageous to standardize QP parameters and quantized values among different bit depths. For the prior art, a compressed bitstream generated from 10-bit data using a 10-bit encoder will not play on current 8-bit decoders because QP and all the quantized values mean different things at different bit depths.
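The roughly 12 dB shift and the factor of four can be checked numerically. The Python sketch below rests on several simplifying assumptions that are not part of the prior art description: an identity mapping (so that four times QP means four times QS), 10-bit samples that are exactly four times the 8-bit samples, and the same MSE in least-significant-bit units at both depths when QS is unchanged. All names are illustrative.

```python
import math


def quantize(x: float, qs: float, r: float = 0.5) -> int:
    """Uniform quantization per equation (3), for non-negative x."""
    return int(x / qs + r)


def psnr_shift_db(from_bits: int, to_bits: int) -> float:
    """PSNR change when only the peak value changes and the MSE is held fixed."""
    return 20.0 * math.log10((2 ** to_bits - 1) / (2 ** from_bits - 1))


qs8 = 6                                            # hypothetical 8-bit step-size
samples8 = list(range(256))                        # every 8-bit sample value
samples10 = [4 * x for x in samples8]              # same content at 10 bits

q8 = [quantize(x, qs8) for x in samples8]
q10_same_qs = [quantize(x, qs8) for x in samples10]        # same QS: larger words
q10_scaled_qs = [quantize(x, 4 * qs8) for x in samples10]  # 4x QS: matches 8-bit

print(round(psnr_shift_db(8, 10), 2))              # PSNR gain at an unchanged QS
print(max(q8), max(q10_same_qs), max(q10_scaled_qs))
```

Real 10-bit data carries two extra low-order bits, so the scaled-QS indices would match the 8-bit indices only up to a rounding difference rather than exactly as in this idealized case.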