The increasing development of digital video technology presents an ever increasing problem of reducing the high cost of video compression codecs (coder/decoder) and resolving the inter-operability of equipment of different manufacturers. To achieve these goals, the Moving Picture Experts Group (MPEG) created the ISO/IEC international Standards 11172 (1991) (generally referred to as MPEG-1 format) and 13818 (1995) (generally referred to as MPEG-2 format), which are incorporated herein in their entirety by reference. One goal of these standards is to establish a standard coding/decoding strategy with sufficient flexibility to accommodate a plurality of different applications and services such as desktop video publishing, video conferencing, digital storage media and television broadcast.
Although the MPEG standards specify a general coding methodology and syntax for generating an MPEG-compliant bitstream, many variations are permitted in the values assigned to many of the parameters, thereby supporting a broad range of applications and interoperability. In effect, MPEG does not define a specific algorithm needed to produce a valid bitstream. Furthermore, MPEG encoder designers are accorded great flexibility in developing and implementing their own MPEG-specific algorithms in areas such as image pre-processing, motion estimation, coding mode decisions, scalability, and rate control. This flexibility fosters development and implementation of different MPEG-specific algorithms, thereby resulting in product differentiation in the marketplace. However, a common goal of MPEG encoder designers is to minimize subjective distortion for a prescribed bit rate and operating delay constraint.
In the area of rate control, MPEG does not define a specific algorithm for controlling the bit rate of an encoder. It is the task of the encoder designer to devise a rate control process for controlling the bit rate such that the decoder input buffer neither overflows nor underflows. A fixed-rate channel is assumed to carry bits at a constant rate to an input buffer within the decoder. At regular intervals determined by the picture rate, the decoder instantaneously removes all the bits for the next picture from its input buffer. If there are too few bits in the input buffer, i.e., all the bits for the next picture have not been received, then the input buffer underflows resulting in an error. Similarly, if there are too many bits in the input buffer, i.e., the capacity of the input buffer is exceeded between picture starts, then the input buffer overflows resulting in an overflow error. Thus, it is the task of the encoder to monitor the number of bits generated by the encoder, thereby preventing the overflow and underflow conditions.
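The buffer behavior described above can be illustrated with a small simulation. The following Python sketch is illustrative only: the function name `simulate_decoder_buffer`, its parameters, and the recovery behavior after an error are assumptions made for the example and are not taken from the MPEG standards.

```python
def simulate_decoder_buffer(picture_bits, channel_rate, picture_rate,
                            buffer_size, initial_fullness):
    """Track decoder input buffer fullness, picture by picture.

    Returns a list of (picture_index, event) tuples, where event is
    "underflow" or "overflow". An empty list means no error occurred
    under this simplified model.
    """
    # Bits delivered by the fixed-rate channel during one picture interval.
    bits_per_interval = channel_rate / picture_rate
    fullness = initial_fullness
    events = []
    for index, bits in enumerate(picture_bits):
        # The decoder removes all bits of the next picture instantaneously.
        if bits > fullness:
            # Not all bits of the picture have arrived yet.
            events.append((index, "underflow"))
            fullness = 0.0  # assumption: drain the buffer and continue
        else:
            fullness -= bits
        # The channel keeps filling the buffer until the next picture start.
        fullness += bits_per_interval
        if fullness > buffer_size:
            events.append((index, "overflow"))
            fullness = buffer_size  # assumption: excess bits are lost
    return events


# Hypothetical numbers: 1 Mbit/s channel, 25 pictures/s, 200 kbit buffer.
steady = simulate_decoder_buffer([40000] * 5, 1_000_000, 25, 200_000, 100_000)
too_big = simulate_decoder_buffer([150000, 40000], 1_000_000, 25, 200_000, 100_000)
too_small = simulate_decoder_buffer([1000] * 4, 1_000_000, 25, 200_000, 100_000)
```

In this example a picture that is larger than the buffer contents triggers an underflow, while a run of very small pictures lets the channel fill the buffer until it overflows.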
Currently, one way of controlling the bit rate is to alter the quantization process, which will affect the distortion of the input video image. By altering the quantizer scale (step size), the bit rate can be changed and controlled. To illustrate, if the buffer is heading toward overflow, the quantizer scale should be increased. This action causes the quantization process to reduce additional Discrete Cosine Transform (DCT) coefficients to the value "zero", thereby reducing the number of bits necessary to code a macroblock. This, in effect, reduces the bit rate and should resolve a potential overflow condition. However, if this action is not sufficient to prevent an impending overflow then, as a last resort, the encoder may discard high frequency DCT coefficients and only transmit low frequency DCT coefficients. Although this drastic measure will not compromise the validity of the coded bitstream, it will produce visible artifacts in the decoded video image.
Conversely, if the buffer is heading toward underflow, the quantizer scale should be decreased. This action increases the number of non-zero quantized DCT coefficients, thereby increasing the number of bits necessary to code a macroblock. Thus, the increased bit rate should resolve a potential underflow condition. However, if this action is not sufficient, then the encoder may insert stuffing bits into the bitstream, or add leading zeros to the start codes. These stuffing bits will be removed by the decoder, but the decoded picture may possess blockiness which is due to coding the picture too coarsely with a large quantizer scale.
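The relationship between quantizer scale and bit production described in the two preceding paragraphs can be sketched as follows. This is a minimal illustration and not the MPEG quantizer: the uniform truncating quantizer, the fullness thresholds, the step of one unit, and the clamp to the range 1 to 31 are all assumptions chosen for the example, as are the function names.

```python
def quantize(coefficients, quantizer_scale):
    """Uniform quantization with truncation toward zero (a simplification)."""
    return [int(c / quantizer_scale) for c in coefficients]


def count_nonzero(levels):
    """Number of quantized coefficients that still need to be coded."""
    return sum(1 for level in levels if level != 0)


def adjust_quantizer_scale(scale, fullness, buffer_size, low=0.25, high=0.75):
    """Nudge the quantizer scale according to buffer fullness.

    Follows the convention of the text: a buffer heading toward overflow
    calls for a larger scale, one heading toward underflow for a smaller one.
    """
    ratio = fullness / buffer_size
    if ratio > high:
        scale += 1   # coarser quantization, fewer bits
    elif ratio < low:
        scale -= 1   # finer quantization, more bits
    return max(1, min(31, scale))


# Hypothetical DCT coefficients of one block.
coefficients = [120, -45, 30, -12, 8, 5, -3, 2]
fine = count_nonzero(quantize(coefficients, 4))     # 6 non-zero levels
coarse = count_nonzero(quantize(coefficients, 16))  # 3 non-zero levels
```

A larger scale drives more coefficients to zero and therefore lowers the bit count, which is the effect the rate control relies on.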
Although changing the quantizer scale is an effective method of implementing the rate control of an encoder, it has been shown that a poor rate control process will actually degrade the visual quality of the video image. For example, a process that fails to alter the quantizer scale in an efficient manner may find it necessary to drastically alter the quantizer scale toward the end of a picture to avoid overflow and underflow conditions. Since altering the quantizer scale affects both image quality and compression efficiency, it is important for a rate control process to control the bit rate without sacrificing image quality.
A second method of implementing rate control is to set the quantizer scale to a constant for the entire picture of the video image. This method simplifies the rate control process at the expense of image quality. If the quantizer scale is set to a constant, the variance of the quantization noise is typically constant. The quantization noise is the difference between the actual value and the quantized value. Thus, if the quantizer scale is kept constant over a picture, then the total mean square error of the coded picture tends to be close to the minimum, for a given number of coding bits.
However, the human visual system (HVS) is more sensitive to certain quantization noise than others. Namely, not all spatial information is perceived alike by the human visual system and some macroblocks within a picture need to be coded more accurately than others. This is particularly true of macroblocks corresponding to very smooth gradients where a very slight inaccuracy will be perceived as a visible macroblock boundary (known as the blocking effect). Thus, the visual appearance of most pictures can be improved by varying the quantizer scale over the entire picture, i.e., lowering the quantizer scale in smooth areas of the picture and increasing it in "busy" areas. This technique should reduce the visibility of blockiness in smooth areas at the expense of increasing the quantization noise in the busy areas where the noise is hidden by the image detail.
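One way to vary the quantizer scale with image content is to modulate a base scale by a measure of local activity. The sketch below uses the normalization commonly described for Test Model 5, which maps activity to a factor between 0.5 and 2. The use of one plus the pixel variance as the activity measure, the clamp to 1 through 31, and all names are assumptions made for illustration.

```python
def block_activity(pixels):
    """Activity of a block: 1 + pixel variance (assumed measure)."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return 1.0 + variance


def normalized_activity(activity, average_activity):
    """Map activity to a factor in the range 0.5 to 2."""
    return (2.0 * activity + average_activity) / (activity + 2.0 * average_activity)


def modulated_scale(base_scale, activity, average_activity):
    """Lower the scale in smooth blocks and raise it in busy blocks."""
    scale = round(base_scale * normalized_activity(activity, average_activity))
    return max(1, min(31, scale))


# Hypothetical blocks: a flat area and a high-contrast texture.
smooth = block_activity([128] * 16)
busy = block_activity([0, 100] * 8)
average = (smooth + busy) / 2.0

smooth_scale = modulated_scale(10, smooth, average)
busy_scale = modulated_scale(10, busy, average)
```

With a base scale of 10, the flat block receives a smaller scale and the textured block a larger one, matching the behavior described above.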
In the current MPEG coding strategies (e.g., Test Models 4 and 5 (TM4 and TM5)), the quantizer scale for each macroblock is selected by assuming that all the pictures of the same type have identical complexity within a group of pictures. Namely, after a picture of a certain type (I, P, or B) is encoded, TM4 and TM5 use the result of the encoding to establish the complexity of each type of picture. Complexity is a measure of the number of bits necessary to code the content of a picture at a particular quantizer scale. Thus, TM4 and TM5 use the estimated complexity to derive a bit budget (picture target bits) for each picture which, in turn, is used to select an appropriate quantizer scale to meet this bit budget. However, the quantizer scale selected by this criterion may not achieve optimal coding performance, since the complexity of each picture will vary with time.
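The complexity-based bit budget can be sketched as follows. The formulas follow the commonly cited form of the TM5 target bit allocation, in which complexity is the product of the bits produced and the average quantizer scale. They are given from memory as an illustration rather than as a normative statement; the constants `k_p` and `k_b`, the lower bound, and all names are assumptions.

```python
def complexity(bits_used, average_quantizer_scale):
    """Complexity estimate of the picture just coded."""
    return bits_used * average_quantizer_scale


def picture_target(picture_type, remaining_bits, n_p, n_b,
                   x_i, x_p, x_b, bit_rate, picture_rate,
                   k_p=1.0, k_b=1.4):
    """Target bits for the next picture of the given type.

    remaining_bits: bits left for the current group of pictures
    n_p, n_b:       P and B pictures still to be coded in the group
    x_i, x_p, x_b:  latest complexity estimate for each picture type
    """
    if picture_type == "I":
        denominator = 1.0 + n_p * x_p / (x_i * k_p) + n_b * x_b / (x_i * k_b)
    elif picture_type == "P":
        denominator = n_p + n_b * k_p * x_b / (k_b * x_p)
    elif picture_type == "B":
        denominator = n_b + n_p * k_b * x_p / (k_p * x_b)
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    # Assumed lower bound so that no picture is starved of bits.
    floor = bit_rate / (8.0 * picture_rate)
    return max(remaining_bits / denominator, floor)


# Hypothetical group of 15 pictures at 1 Mbit/s and 25 pictures/s.
args = dict(remaining_bits=600_000, n_p=4, n_b=10,
            x_i=160.0, x_p=60.0, x_b=42.0,
            bit_rate=1_000_000, picture_rate=25)
target_i = picture_target("I", **args)
target_p = picture_target("P", **args)
target_b = picture_target("B", **args)
```

Because each target depends only on the most recent complexity estimate for each picture type, the budget is accurate only while pictures of the same type remain similar, which is the limitation noted above.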
Furthermore, encoders that utilize global-type transforms have similar problems. For example, one such global-type compression technique appears in the Proceedings of the International Conference on Acoustics, Speech and Signal Processing, San Francisco, Calif. March 1992, volume IV, pages 657-660, where there is disclosed a signal compression system which applies a hierarchical subband decomposition, or wavelet transform, followed by the hierarchical successive approximation entropy-coded quantizer incorporating zerotrees. The representation of signal data using a multiresolution hierarchical subband representation was disclosed by Burt et al. in IEEE Trans. on Commun., Vol Com-31, No. 4, April 1983, page 533. A wavelet pyramid, also known as critically sampled quadrature-mirror filter (QMF) subband representation, is a specific type of multiresolution hierarchical subband representation of an image. A wavelet pyramid was disclosed by Pentland et al. in Proc. Data Compression Conference Apr. 8-11, 1991, Snowbird, Utah. A QMF subband pyramid has been described in "Subband Image Coding", J. W. Woods ed., Kluwer Academic Publishers, 1991 and I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics (SIAM): Philadelphia, Pa., 1992.
Wavelet transforms, otherwise known as hierarchical subband decomposition, have recently been used for low bit rate image compression because such decomposition leads to a hierarchical multi-scale representation of the source image. Wavelet transforms are applied to an important aspect of low bit rate image coding: the coding of a binary map (a wavelet tree) indicating the locations of the non-zero values, otherwise known as the significance map of the transform coefficients. Using scalar quantization followed by entropy coding, in order to achieve very low bit rates, i.e., less than 1 bit/pel, the probability of the most likely symbol after quantization--the zero symbol--must be extremely high. Typically, a large fraction of the bit budget must be spent on encoding the significance map. It follows that a significant improvement in encoding the significance map (the wavelet tree) translates into a significant improvement in the compression of information preparatory to storage or transmission.
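The notion of a significance map can be made concrete with a short example. The sketch below builds the binary map for a small array of quantized coefficients and reports the fraction of zero entries. The coefficient values and function names are hypothetical and serve only to illustrate the idea.

```python
def significance_map(quantized):
    """Binary map: 1 where a quantized coefficient is non-zero, else 0."""
    return [[1 if c != 0 else 0 for c in row] for row in quantized]


def zero_fraction(sig_map):
    """Fraction of positions whose symbol is zero."""
    total = sum(len(row) for row in sig_map)
    zeros = sum(row.count(0) for row in sig_map)
    return zeros / total


# Hypothetical quantized coefficients; coarse scale in the top-left corner.
quantized = [
    [34, -9, 0, 0],
    [5,   0, 0, 0],
    [0,   0, 0, 0],
    [0,   0, 1, 0],
]
sig_map = significance_map(quantized)
fraction = zero_fraction(sig_map)
```

Here three quarters of the positions are zero. At very low bit rates the fraction is far higher, so an efficient description of where the few non-zero values lie dominates the coding cost.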
U.S. Pat. No. 5,412,741, issued May 2, 1995 and herein incorporated by reference, discloses an apparatus and method for encoding information with a high degree of compression. The apparatus uses so-called zerotree coding of wavelet coefficients in a much more efficient manner than previous techniques. The key to this apparatus is the dynamic generation of the list of coefficient indices to be scanned, whereby the dynamically generated list only contains coefficient indices for which a symbol must be encoded. This is a dramatic improvement over the prior art in which a static list of coefficient indices is used and each coefficient must be individually checked to see whether a) a symbol must be encoded, or b) it is completely predictable.
The apparatus uses a method for encoding information comprising the steps of forming a wavelet transform of the image, forming a zerotree map of the wavelet coefficients, encoding the significant coefficients on an initial dominant list from the coarsest level of the transform and the children of those coefficients whose indices are appended to the dominant list as the coefficient of the parent is found to be significant, reducing the threshold, refining the estimate of the value of the significant coefficients to increase the accuracy of the coded coefficients, and cycling back to scan the dominant list anew at the new, reduced threshold.
To accomplish the iterative process, the method of the prior art scans the wavelet tree in a breadth-first pattern, i.e., all parent nodes are coded, then all children, then all grandchildren and so on. As the process iterates through the wavelet tree representation of the image, this apparatus codes one of four symbols within the zerotree map.
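A single breadth-first pass over a wavelet tree at one threshold can be sketched as follows. This is a simplified illustration and not the patented method: the `Node` class is hypothetical, the refinement step and threshold reduction are omitted, and the four symbol names (POS, NEG, IZ for isolated zero, ZTR for zerotree root) follow the usual description of zerotree coding rather than anything stated in this text.

```python
from collections import deque


class Node:
    """One wavelet coefficient and its children at the next finer scale."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def subtree_insignificant(node, threshold):
    """True if the node and all of its descendants are below the threshold."""
    if abs(node.value) >= threshold:
        return False
    return all(subtree_insignificant(c, threshold) for c in node.children)


def breadth_first_symbols(root, threshold):
    """Emit one symbol per scanned node, parents before children."""
    symbols = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if abs(node.value) >= threshold:
            symbols.append("POS" if node.value > 0 else "NEG")
            queue.extend(node.children)  # children are scanned later
        elif subtree_insignificant(node, threshold):
            # Descendants are predictable and are not scanned.
            # A childless node is also reported as ZTR in this sketch.
            symbols.append("ZTR")
        else:
            symbols.append("IZ")
            queue.extend(node.children)
    return symbols


# Hypothetical three-level tree.
tree = Node(40, [
    Node(-20, [Node(3), Node(-1)]),
    Node(5, [Node(18), Node(2)]),
    Node(4, [Node(1), Node(-2)]),
])
symbols = breadth_first_symbols(tree, threshold=16)
```

Only nodes whose parents were scanned and not marked as zerotree roots enter the queue, so the list of coefficients to visit grows dynamically in the manner described for the dominant list.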
The output bit stream from a video encoder tends to have a variable bit rate that fluctuates according to scene contents and the nature of the coding process used by the encoder. In practical applications for encoders, the communication channel through which the coded data is to be transmitted is generally a constant capacity channel. As such, the encoder requires a mechanism to regulate the output bit rate to match the channel rate with minimum loss of signal quality.
Heretofore, encoders that utilize global-type transforms such as wavelet transforms have special requirements that are not met by the prior art rate control techniques.
Therefore, a need exists in the art for an apparatus and method that recursively adjusts the quantizer scale for each macroblock to maintain the overall quality of the video image while optimizing the coding rate.