This invention relates to image and video data compression and more particularly to a deadzone quantizer.
Image data for digital video and still images are often compressed so that images can be represented with less data, thus saving storage costs as well as transmission time and cost. In general, the goal in image data compression is to decrease the amount of data required to represent an image. However, the reduction in data must be accomplished without a substantial penalty in picture quality.
The most effective compression is achieved by approximating the original image rather than reproducing it exactly. In the motion picture arena, two standards (referred to hereinbelow as “MPEG-1” and “MPEG-2”) have been developed by the Moving Picture Experts Group (MPEG) to specify both the coded digital representation of video signals for storage media and the method for decoding it. In the still image arena, the Joint Photographic Experts Group (JPEG) has set the international standards (referred to hereinbelow as “JPEG” and “JPEG 2000”) for color image compression. For MPEG-1, MPEG-2, JPEG and the forthcoming JPEG 2000 alike, the greater the compression, the more approximate (“lossy”) the rendition is likely to be.
The above-mentioned compression standards use image transforms. The most common transform is the discrete cosine transform (DCT), which is used in MPEG-1, MPEG-2 and JPEG. Another transform type is the wavelet transform, which will be adopted by the JPEG 2000 standard. The DCT has certain properties that simplify coding models and make the coding efficient in terms of perceptual quality measures. In general, the DCT is a method of decomposing a block of data into a weighted sum of spatial frequencies. Each of the spatial frequency patterns for a DCT, e.g., an 8×8 DCT, has a corresponding coefficient, which is the amplitude needed to represent the contribution of that spatial frequency pattern to the block of data being analyzed. In other words, each spatial frequency pattern is multiplied by its coefficient, and the resulting 64 8×8 amplitude arrays are summed, pel by pel, to reconstruct the 8×8 block. Note that the 8×8 DCT operates on an 8 by 8 array of pels. Pel is a contraction of picture (or print) element, a term used in the display and printing industries.
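To make the weighted-sum description concrete, here is a minimal pure-Python sketch of an 8×8 DCT and its inverse. It assumes the orthonormal DCT-II scaling, and the names `scale`, `pattern`, `BASES`, `dct2` and `idct2` are our own illustrative choices rather than anything taken from the MPEG or JPEG texts. The inverse literally multiplies each of the 64 spatial frequency patterns by its coefficient and sums the results pel by pel.

```python
import math

N = 8  # 8x8 block, as in the text


def scale(k):
    # Orthonormal DCT-II normalization for frequency index k (assumed convention).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def pattern(u, v):
    # Spatial frequency pattern (u, v) as an 8x8 array of pel weights.
    return [[scale(u) * scale(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for y in range(N)] for x in range(N)]


BASES = {(u, v): pattern(u, v) for u in range(N) for v in range(N)}


def dct2(block):
    # One coefficient per pattern: how much of that pattern the block contains.
    return [[sum(block[x][y] * BASES[(u, v)][x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    # Multiply each of the 64 patterns by its coefficient and sum pel by pel.
    out = [[0.0] * N for _ in range(N)]
    for (u, v), pat in BASES.items():
        for x in range(N):
            for y in range(N):
                out[x][y] += coeffs[u][v] * pat[x][y]
    return out


flat = [[128] * N for _ in range(N)]
print(round(dct2(flat)[0][0], 6))  # a flat block has only a DC coefficient
```

Because the patterns are orthonormal, `idct2(dct2(block))` returns the original block up to floating-point rounding.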
At the heart of the compression is a quantizer. When the DCT is computed for a block of pels, it is desirable to represent the coefficients for high spatial frequencies with less precision. Quantization reduces the accuracy with which the DCT coefficients are represented when they are converted to an integer representation. Quantization is very important in image compression because it tends to make many coefficients zero, especially those for high spatial frequencies, and thus saves storage space and/or transmission bandwidth.
Conventionally, a DCT coefficient is quantized by dividing it by a nonzero positive number, called the quantization step size, and rounding the quotient to the nearest integer, called the quantization index. Multiplying this integer quantization index by the quantization step size yields an approximation of the true DCT coefficient. This approximation is called the quantized transform coefficient, quantized DCT coefficient, quantization value or reconstruction value. The larger the quantization step size, the lower the precision of the quantized DCT coefficient. Lower-precision coefficients can be transmitted to a decoder with fewer bits than higher-precision coefficients. Using large quantization step sizes for high spatial frequencies allows the encoder to selectively discard high spatial frequency activity that the human eye cannot readily perceive.
FIGS. 1A-1D (prior art) illustrate quantization and dequantization. FIG. 1A shows unquantized DCT coefficients 100 at an interval of one. Note that an unquantized DCT coefficient is a real number and can be either an integer or a non-integer. FIG. 1B shows quantization intervals 102 at an interval of Δ; in this example, Δ is 4. Each interval 102 is called a “bin”. The center interval surrounding zero, i.e., the interval between −2 and 2 in this example, is called the “zero bin”. FIG. 1C shows the quantized DCT coefficient indices 104 obtained by quantizing with quantization intervals 102. Conventionally, the quantized DCT coefficient indices are always integers. In addition, all bins have the same size, except the outermost bins, which extend from a predetermined cutoff value to infinity. FIG. 1D shows the dequantized DCT coefficients 106 obtained by dequantizing the quantized DCT coefficient indices using quantization intervals 102.
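The uniform quantizer of FIGS. 1A-1D can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: the function names are hypothetical, and a quotient ending in exactly .5 is rounded away from zero, a tie rule the text does not specify. With a step size of 4, every coefficient strictly between −2 and 2 lands in the zero bin.

```python
import math


def quantize(coeff, step):
    # Divide by the step size and round to the nearest integer index.
    # Ties round away from zero (an assumption; Python's built-in round()
    # would round ties to the nearest even integer instead).
    q = math.floor(abs(coeff) / step + 0.5)
    return int(math.copysign(q, coeff))


def dequantize(index, step):
    # Reconstruction value = quantization index * step size.
    return index * step


STEP = 4  # the interval chosen in FIG. 1B
for c in [-9.0, -6.5, -1.9, 0.0, 1.9, 2.1, 7.3]:
    k = quantize(c, STEP)
    print(c, "->", k, "->", dequantize(k, STEP))
```

Here the zero bin has the same width as every other bin, which is exactly what makes this quantizer uniform.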
The quantizer above is known as a uniform quantizer since all bins have the same size. However, in transform-based image and video coding the AC coefficients typically have a distribution sharply concentrated around zero, so a different size for the zero bin is needed to improve rate-distortion performance. A quantizer whose zero bin width differs from the width of all other bins is known as a deadzone quantizer. A deadzone quantizer is therefore characterized by two parameters: the zero bin width and the outer bin width. In many applications, such as coders based on ISO JPEG, ISO MPEG-2 and ITU-T H.263, the bin widths are chosen so that the zero bin is twice as wide as the outer bins, i.e., zero bin width/outer bin width = 2. Although this ratio has been empirically verified to give reasonably good performance across a variety of distributions and images, it is often not the optimal ratio for a particular distribution or image.
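A deadzone quantizer adds one parameter to the uniform case. The sketch below, with hypothetical function names, takes the zero bin width and the outer bin width separately and reconstructs each nonzero bin at its midpoint; midpoint reconstruction is our assumption here (the centroid alternative is discussed later in the text). Setting the two widths equal recovers the uniform quantizer, and a zero bin twice as wide as the outer bins gives the ratio-2 choice mentioned above.

```python
import math


def deadzone_index(x, zero_width, outer_width):
    # Zero bin is (-zero_width/2, zero_width/2); outer bins are outer_width wide.
    t = zero_width / 2.0
    if abs(x) < t:
        return 0
    k = math.floor((abs(x) - t) / outer_width) + 1  # 1, 2, 3, ... moving outward
    return k if x > 0 else -k


def deadzone_reconstruct(index, zero_width, outer_width):
    # Midpoint of the bin (an assumed reconstruction rule); 0 for the zero bin.
    if index == 0:
        return 0.0
    t = zero_width / 2.0
    mag = t + (abs(index) - 0.5) * outer_width
    return mag if index > 0 else -mag


OUTER = 4.0
for x in [-9.0, -3.9, 3.9, 4.0, 7.7]:
    k = deadzone_index(x, 2 * OUTER, OUTER)  # zero bin twice the outer width
    print(x, "->", k, "->", deadzone_reconstruct(k, 2 * OUTER, OUTER))
```

With `zero_width == outer_width == 4` the index and reconstruction match the uniform quantizer of FIG. 1 (for example, 7.7 maps to index 2 and back to 8.0).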
S. Mallat and F. Falzon, in a paper entitled “Understanding image transform codes,” Proc. SPIE Aerospace Conference, April 1997, disclose a fixed ratio of 1.62 between the zero bin width and the outer bin width. Because this ratio is derived for a large class of distributions, it is not necessarily optimal for the particular distribution or image that an encoder is currently encoding.
The optimal choice of the zero bin width and the outer bin width for different distributions and/or different images has seldom been studied, nor has the question of how this optimal parameter set can be computed efficiently. LoPresto et al. studied the optimal parameter set in a paper entitled “Wavelet Image Coding Using Rate-Distortion Optimized Backward Adaptive Classification” (Proc. SPIE Visual Communications and Image Processing, Vol. 3024, p. 1026, 1997). In this paper, the wavelet coefficient distribution is approximated with a generalized Gaussian distribution. For each of several types of generalized Gaussian distributions, 500 deadzone quantizers are tabulated. For a given coefficient, its distribution is fitted with one of the several pre-selected generalized Gaussian distributions. The 500 deadzone quantizers for the selected generalized Gaussian distribution are then compared using the Lagrange multiplier method to choose one of them. This method requires a multiple of 500 comparisons before the final selection is made and is thus computationally expensive. More importantly, since only several typical generalized Gaussian distributions are used to fit the actual distribution, the resulting deadzone quantizer is not optimal. Therefore, although this technique is tuned more specifically to a particular distribution or image than the previous methods using fixed ratios, e.g., 2 or 1.62, it does not solve the general problem of finding the optimal solution for a particular distribution or image.
Without an analytical framework, that is, without an understanding of the relationships among the number of bits required to approximate an image (bit rate), the approximation accuracy (distortion) and the two parameters of a deadzone quantizer, an optimal deadzone quantizer can be found only by an exhaustive (brute-force) numerical search, which is the only method known to date. For example, if only the distribution is known, various combinations of zero bin width and outer bin width are substituted into an equation and the results are calculated; the combination that gives the best result is then selected. If the actual image to be compressed is available, all of the results calculated for the various zero bin width and outer bin width combinations are tested by actually quantizing the transform coefficients directly, and the best combination is obtained by examining the test results.
The number of combinations to be calculated or tested depends on two numbers, M and N. More specifically, a zero bin width set containing M candidate values for the zero bin width and an outer bin width set containing N candidate values for the outer bin width are used to find the optimal combination of zero bin width and outer bin width. Hence, the number of possible combinations is MN. In general, M and N are predetermined based on the tradeoff between accuracy and cost. The larger M and N are, the more computation the numerical search requires and hence the more costly it is. On the other hand, the larger M and N are, the more accurate the result, because smaller intervals, i.e., a greater number of candidate values, can be used. Typical values for M and N are greater than or equal to 1000. Note that in this disclosure the actual zero bin width is twice the candidate value for the zero bin width.
During the numerical search, the associated distortion and entropy are computed for each combination of zero bin width and outer bin width. The combination that gives the minimum distortion measure while satisfying the target bit rate, i.e., the amount of storage space or transmission bandwidth available, is then selected as the optimal pair of bin widths. The Lagrange multiplier method, often used to solve the entropy-constrained quantizer design problem and described in the LoPresto article, requires the same MN combinations to be computed in searching for the optimal combination, because for each fixed Lagrange multiplier it is effectively equivalent to the combinatorial method above.
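The exhaustive search can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular encoder: synthetic Laplacian samples stand in for the transform coefficients of an actual image, entropy is measured empirically from the quantization indices, distortion is mean squared error with midpoint reconstruction, and all names are hypothetical. As in the text, `t` is the candidate value, so the actual zero bin is 2t wide. The inner loop body runs exactly MN times.

```python
import math
import random
from collections import Counter


def deadzone_index(x, t, delta):
    # Zero bin is (-t, t), i.e. 2t wide; outer bins are delta wide.
    if abs(x) < t:
        return 0
    k = math.floor((abs(x) - t) / delta) + 1
    return k if x > 0 else -k


def reconstruct(k, t, delta):
    # Midpoint of bin k (assumed reconstruction rule); 0 for the zero bin.
    if k == 0:
        return 0.0
    m = t + (abs(k) - 0.5) * delta
    return m if k > 0 else -m


def entropy_and_mse(samples, t, delta):
    # Empirical entropy (bits/coefficient) of the indices and mean squared error.
    idx = [deadzone_index(x, t, delta) for x in samples]
    n = len(samples)
    h = -sum(c / n * math.log2(c / n) for c in Counter(idx).values())
    mse = sum((x - reconstruct(k, t, delta)) ** 2
              for x, k in zip(samples, idx)) / n
    return h, mse


def brute_force(samples, t_candidates, delta_candidates, target_bits):
    # Try every one of the M * N combinations; keep the lowest-MSE feasible one.
    best, evaluations = None, 0
    for t in t_candidates:
        for delta in delta_candidates:
            evaluations += 1
            h, mse = entropy_and_mse(samples, t, delta)
            if h <= target_bits and (best is None or mse < best[2]):
                best = (t, delta, mse, h)
    return best, evaluations


rng = random.Random(0)
# Difference of two unit exponentials is Laplacian: a stand-in for AC coefficients.
samples = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(1000)]
ts = [0.15 * i for i in range(1, 21)]      # M = 20 candidate half-widths
deltas = [0.15 * j for j in range(1, 21)]  # N = 20 candidate outer widths
best, count = brute_force(samples, ts, deltas, target_bits=1.0)
print(count, best)
```

Even with these toy sizes the search costs 400 full quantization passes; with M and N around 1000 it would be a million.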
Because of the complexity and cost of such a computationally intensive method, a fixed ratio, e.g., 1.62 or 2.0, is usually used instead of evaluating all MN possible combinations and calculating their errors, entropies and bin widths. A fixed ratio reduces the number of computations because only the zero bin width or the outer bin width needs to be searched, not both. However, as discussed above, a fixed ratio often does not give the optimum result because it is not tailored to the particular distribution and/or image in question.
In general, the bin widths chosen depend on the required entropy (i.e., bit rate), the entropy being the theoretical average number of bits used to encode the image or video frame. An entropy-constrained quantizer limits the number of bits that can be used to represent particular data while still achieving a high-quality image. The constraint is usually determined by the storage space or transmission bandwidth allowed for a particular application.
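To make the entropy constraint concrete, the sketch below computes the entropy H = −Σ p log2 p of the deadzone quantizer output in bits per coefficient. It assumes, as is common in transform coding analysis, that the coefficients follow a Laplacian density (λ/2)e^(−λ|x|); the symbols λ, t (zero bin half-width) and Δ (outer bin width) and all function names are ours. Because the exponential tail is memoryless, the outer bins on each side have geometric probabilities with ratio q = e^(−λΔ), which gives the closed form H = h(p0) + (1 − p0)(1 + h(q)/(1 − q)), where p0 = 1 − e^(−λt) is the zero bin probability, h is the binary entropy, the 1 is the sign bit and h(q)/(1 − q) is the entropy of the bin magnitude. The code checks the closed form against a direct, truncated sum.

```python
import math


def binary_entropy(p):
    # h(p) in bits; 0 at the endpoints.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def entropy_bits(probs):
    # H = -sum p log2 p over the bin probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def laplacian_bin_probs(lam, t, delta, max_k=2000):
    # Zero bin plus max_k outer bins per side (the tail beyond is negligible).
    p0 = 1.0 - math.exp(-lam * t)
    q = math.exp(-lam * delta)
    probs = [p0]
    for k in range(1, max_k + 1):
        pk = 0.5 * (1.0 - p0) * (1.0 - q) * q ** (k - 1)
        probs += [pk, pk]  # positive and negative bin k
    return probs


def laplacian_entropy(lam, t, delta):
    # Closed form: h(p0) + (1 - p0) * (1 + h(q) / (1 - q)).
    p0 = 1.0 - math.exp(-lam * t)
    q = math.exp(-lam * delta)
    return binary_entropy(p0) + (1.0 - p0) * (1.0 + binary_entropy(q) / (1.0 - q))


print(laplacian_entropy(1.0, 1.0, 0.5))
```

Widening either the zero bin (larger t) or the outer bins (larger Δ) lowers the entropy, which is the lever an entropy-constrained quantizer uses to meet its bit budget.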
After the bin widths are selected by the above numerical search method, a quantization value is chosen to compress the data. The quantization value is the number, lying between the two boundary numbers of a bin, to which all numbers falling within that bin are discretized. In a practical approach, the middle number between the boundary numbers is selected. For example, numbers that fall within a bin between 5 and 6 are quantized to 5 + (6 − 5)/2 = 5.5. In a centroid approach, the quantization value is computed as:
$$C = \frac{\int_{i}^{j} x\,p(x)\,dx}{\int_{i}^{j} p(x)\,dx} \qquad (1)$$
where C is the quantization value; i and j are the boundary numbers of the interval to be quantized; and p(x) is the probability (density) of occurrence of each number x. Conventionally, either the practical approach or the centroid approach is used.
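Equation (1) can be evaluated numerically for any density. The sketch below (hypothetical names; Simpson's rule is our choice of quadrature) compares the practical midpoint with the centroid for the bin between 5 and 6 used in the example above, assuming a Laplacian density with λ = 1. For a density that decays away from zero the centroid sits below the midpoint, closer to zero. For this density it also has the closed form i + 1/λ − Δe^(−λΔ)/(1 − e^(−λΔ)) with Δ = j − i, which serves as a check.

```python
import math


def midpoint(i, j):
    # Practical approach: the middle of the bin.
    return i + (j - i) / 2.0


def centroid(pdf, i, j, n=1000):
    # Equation (1): integral of x p(x) over [i, j] divided by integral of p(x).
    if n % 2:
        n += 1  # Simpson's rule needs an even number of panels
    h = (j - i) / n
    num = den = 0.0
    for s in range(n + 1):
        x = i + s * h
        w = 1 if s in (0, n) else (4 if s % 2 else 2)
        num += w * x * pdf(x)
        den += w * pdf(x)
    return num / den  # the common factor h/3 cancels in the ratio


def laplacian_pdf(x, lam=1.0):
    # Assumed density; any p(x) could be passed to centroid().
    return 0.5 * lam * math.exp(-lam * abs(x))


print(midpoint(5, 6), centroid(laplacian_pdf, 5, 6))
```

Either function can serve as the reconstruction rule in a deadzone quantizer, which is why a bin width optimization should not assume one of them.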
Therefore, what is needed is an efficient method for optimizing the bin widths for a distribution or an image to be compressed, one that can adapt to any given quantization value selection approach.
In accordance with the present invention, a method for efficiently optimizing the bin widths for a distribution or an image to be compressed is provided. The symmetric, unimodal distribution of an image is divided into a zero bin having a zero bin width and a plurality of outer bins having an outer bin width. M predetermined candidate values for the zero bin width and N predetermined candidate values for the outer bin width are provided. A zero bin probability is derived from an entropy function for a target entropy value, i.e., the target bit rate. The allowable zero bin widths are then calculated from the zero bin probability. The allowable zero bin widths are then searched to obtain an optimum combination of zero bin width and outer bin width, the optimum combination being the one that results in the least distortion measure while satisfying the target bit rate.
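The text does not spell out its entropy function, so the following is only one way to picture the step from a target entropy to a set of allowable zero bin widths, built on assumptions of ours. Instead of the full entropy function it uses a lower bound that holds for any symmetric source and symmetric quantizer: the cost of signaling zero versus nonzero, h(p0), plus one sign bit for each nonzero coefficient, (1 − p0). This bound is at least 1 bit for p0 ≤ 1/3 and decreases on [1/3, 1), so for a target R below 1 bit every feasible quantizer needs p0 at least as large as the root p* of h(p*) + 1 − p* = R. Assuming further a Laplacian density with parameter λ, p0 = 1 − e^(−λt) turns p* into a minimum zero bin half-width, which prunes the M candidates before any distortion is computed. All names are hypothetical.

```python
import math


def binary_entropy(p):
    # h(p) in bits; 0 at the endpoints.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_lower_bound(p0):
    # Zero/nonzero flag entropy plus one sign bit per nonzero coefficient.
    # Valid for a symmetric source and quantizer (our assumption, not the
    # entropy function of the text).
    return binary_entropy(p0) + (1.0 - p0)


def min_zero_bin_probability(target_bits, iters=60):
    # Bisection for the root p* of rate_lower_bound(p) = target_bits on [1/3, 1),
    # where the bound is decreasing; any feasible quantizer needs p0 >= p*.
    if not 0.0 < target_bits < 1.0:
        raise ValueError("sketch assumes 0 < target_bits < 1")
    lo, hi = 1.0 / 3.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_lower_bound(mid) <= target_bits:
            hi = mid
        else:
            lo = mid
    return hi


def allowable_half_widths(candidates, lam, target_bits):
    # Assumed Laplacian source: p0 = 1 - exp(-lam * t), so p0 >= p* <=> t >= t_min.
    p_min = min_zero_bin_probability(target_bits)
    t_min = -math.log(1.0 - p_min) / lam
    return [t for t in candidates if t >= t_min]


cands = [0.5 * i for i in range(1, 11)]  # M = 10 candidate half-widths
print(allowable_half_widths(cands, lam=1.0, target_bits=0.5))
```

Only the surviving half-widths need to be paired with outer bin widths, so the pruning shrinks the M side of the MN search.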
In one embodiment, the image is a still image. In another embodiment, the image is a video image. In one embodiment, a lookup table is generated for the entropy function and the zero bin probability. The lookup table is loaded into a random access memory (RAM) during the search for the optimum combination of zero bin width and outer bin width for a given target entropy value. The distortion measure is calculated for each allowable zero bin width and each of the candidate values of the outer bin width. The combination that results in the least distortion measure while satisfying the target bit rate is selected as the optimum combination. In one embodiment, a fast algorithm is used to search the outer bin width for a given zero bin width, reducing the number of required combinations by a factor of log N/N. In one embodiment, the set of allowable zero bin widths is a subset of the M candidate values for the zero bin width, thus further reducing computation cost.
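The text does not name its fast algorithm; one natural reading of the log N/N factor is a binary search over the sorted outer bin widths, sketched below under two assumptions of ours. First, the source is Laplacian, so the closed-form entropy h(p0) + (1 − p0)(1 + h(q)/(1 − q)) with q = e^(−λΔ) strictly decreases as Δ grows for a fixed zero bin. Second, distortion grows with Δ, so the smallest feasible Δ is the best one for that zero bin. Under those assumptions the search needs about log2 N entropy evaluations per zero bin width instead of N. Names are hypothetical.

```python
import math


def binary_entropy(p):
    # h(p) in bits; 0 at the endpoints.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def laplacian_entropy(lam, t, delta):
    # Assumed Laplacian source: h(p0) + (1 - p0) * (1 + h(q) / (1 - q)).
    p0 = 1.0 - math.exp(-lam * t)
    q = math.exp(-lam * delta)
    return binary_entropy(p0) + (1.0 - p0) * (1.0 + binary_entropy(q) / (1.0 - q))


def smallest_feasible_delta(deltas, lam, t, target_bits):
    # Binary search over sorted candidate outer widths (lower-bound style):
    # indices below lo are infeasible, indices at or above hi are feasible.
    lo, hi = 0, len(deltas)
    evals = 0
    while lo < hi:
        mid = (lo + hi) // 2
        evals += 1
        if laplacian_entropy(lam, t, deltas[mid]) <= target_bits:
            hi = mid
        else:
            lo = mid + 1
    return (deltas[lo] if lo < len(deltas) else None), evals


deltas = [0.01 * j for j in range(1, 1025)]  # N = 1024 sorted candidates
print(smallest_feasible_delta(deltas, lam=1.0, t=2.0, target_bits=0.8))
```

If the monotonicity assumption fails for a real distribution, a linear scan over the N candidates remains the safe fallback.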
A quantization value is then chosen to encode the image. In one embodiment, the middle number of an interval is used as the quantization value. In another embodiment, the centroid of an interval is used as the quantization value.