1. Field of the Invention
The present invention relates to image processing, and, in particular, to video compression algorithms that involve quantization.
2. Description of the Related Art
In some block transform-based video encoding schemes, each component plane of each video frame is divided into (8×8) blocks. A block transform such as a discrete cosine transform (DCT) or slant transform is then applied either to the pixels of each block or to interframe pixel differences corresponding to each block. The resulting transform coefficients are then quantized, and the quantized coefficients are run-length encoded, followed by variable-length encoding of the run-length codes.
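The run-length step mentioned above can be sketched as follows. This is a minimal illustration only: the (zero-run, value) pair format, the `EOB` end-of-block marker, and the function names are assumptions chosen for the example, not details given in this description, and the variable-length coding stage is omitted.

```python
# Illustrative run-length coding of a 1-D sequence of quantized coefficients.
# The (zero_run, value) pair format and the EOB marker are assumptions.
EOB = "EOB"

def run_length_encode(values):
    """Encode each nonzero value as (number of preceding zeros, value)."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)         # trailing zeros are implied by the marker
    return pairs

def run_length_decode(pairs, length):
    """Rebuild the original sequence of the given length."""
    out = []
    for p in pairs:
        if p == EOB:
            break
        run, v = p
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * (length - len(out)))  # restore trailing zeros
    return out

quantized = [5, 0, 0, 3, 0, 0, 0, 0]
codes = run_length_encode(quantized)
print(codes)
print(run_length_decode(codes, len(quantized)) == quantized)
```

Long runs of zeros, which coarse quantization tends to produce, collapse into a few pairs, which is why the quantization step largely determines the size of the encoded frame.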
The quantization of the transform coefficients may be based on the selection of one or more quantization (Q) levels from a set of possible Q levels. Each Q level is associated with a Q table, an array of divisors that are used to quantize the transform coefficients. A fine (i.e., low) Q level is associated with a Q table having small divisors, while a coarse (i.e., high) Q level is associated with a Q table having relatively large divisors.
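As a rough illustration of how a Q table acts on a block of coefficients, the sketch below divides each coefficient by the corresponding table entry. The table formula in `make_q_table`, the truncating division, and the sample coefficient block are all hypothetical; no actual divisor values are given in this description.

```python
# Illustrative quantization with Q tables. The divisor formula below is a
# made-up example; real Q tables are defined by the particular codec.

def make_q_table(q_level, size=8):
    """Hypothetical table: divisors grow with q_level and with position."""
    return [[1 + q_level * (1 + r + c) for c in range(size)]
            for r in range(size)]

def quantize(coeffs, q_table):
    """Divide each coefficient by its divisor, truncating toward zero."""
    return [[int(c / d) for c, d in zip(c_row, d_row)]
            for c_row, d_row in zip(coeffs, q_table)]

def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)

# Sample coefficients whose magnitude decays away from the top-left corner.
coeffs = [[200 // (1 + r + c) for c in range(8)] for r in range(8)]

fine = quantize(coeffs, make_q_table(1))    # low Q level, small divisors
coarse = quantize(coeffs, make_q_table(8))  # high Q level, large divisors
print(count_nonzero(fine), count_nonzero(coarse))
```

With these assumed tables, the coarse Q level leaves far fewer nonzero coefficients than the fine one, so fewer bits are needed after run-length and variable-length encoding.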
The selection of Q level typically has opposite effects on two important goals of video compression: bitrate and video quality. In many video compression applications, it is desirable to have as low a bitrate as possible and as high a video quality as possible. Lower Q levels preserve more detail in the encoded video, resulting in higher video quality, but at the cost of higher bitrates. Higher Q levels quantize the video more severely, providing lower bitrates, but at reduced image quality.
In some video compression applications, there are transmission bandwidth limitations that require the encoded video bitstream to meet specific target bitrates. If the selected Q level is too low, the encoded video bitstream will exceed the target bitrate. Selecting a high Q level will ensure that the target bitrate is met, but if the Q level is too high, the target bitrate will be undershot, resulting in lower video quality than if the bitrate of the encoded video bitstream were roughly equal to the target bitrate.
One way to achieve optimal bitrate is to make an initial best guess of Q level for the current frame and fully encode the frame at that Q level. If the resulting encoded frame size is too large, then a higher Q level is selected and the encoding process is repeated. If the resulting encoded frame size is too small, then a lower Q level is selected and again the encoding process is repeated. The iterative selection and encoding process is repeated until the target bitrate is met with as low a Q level as possible. Unfortunately, this iterative, brute-force approach is computationally expensive and may be impractical for some applications, for example, in video conferencing in which video frames must be captured, encoded, and transmitted in real time to one or more remote conference participants.
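The iterative approach just described can be sketched as a simple search loop. In this sketch, `encoded_size` is a toy stand-in for a full encode of the frame, and the Q-level range of 1 to 31 and the step size of one level per pass are assumptions; the sketch also assumes that encoded size never increases as the Q level rises.

```python
# Illustrative closed-loop Q-level search. encoded_size() is a toy stand-in
# for fully encoding a frame; q_min and q_max are assumed limits.

def encoded_size(coeffs, q_level):
    """Toy cost: count coefficients that survive division by q_level."""
    return sum(1 for c in coeffs if int(c / q_level) != 0)

def closed_loop_q(coeffs, target_size, initial_q, q_min=1, q_max=31):
    """Return (q_level, size, passes) for the lowest Q level meeting the target."""
    q = initial_q
    size = encoded_size(coeffs, q)
    passes = 1
    if size > target_size:
        # Frame too large: raise the Q level and re-encode until it fits.
        while size > target_size and q < q_max:
            q += 1
            size = encoded_size(coeffs, q)
            passes += 1
    else:
        # Frame fits: try lower Q levels while the target is still met.
        while q > q_min:
            trial = encoded_size(coeffs, q - 1)
            passes += 1
            if trial > target_size:
                break
            q -= 1
            size = trial
    return q, size, passes

frame = list(range(1, 101))
print(closed_loop_q(frame, target_size=80, initial_q=10))
print(closed_loop_q(frame, target_size=80, initial_q=30))
```

Each pass through either loop corresponds to one complete encode of the frame, which is the source of the computational expense noted above.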
Alternatively, "open-loop" methods of selecting Q level exist which are more practical for real-time video encoding. One approach is to base the selection of Q level on a measure of the energy of the current frame. For example, a mean-square-error (MSE) or sum-of-absolute-differences (SAD) measure can be generated for each block of inputs to the block transform. This measure can then be compared to a model that predicts the bitrate associated with the statistical measure for each of the different Q levels. The model may be generated empirically offline using video sequences similar to those expected during real-time processing.
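An open-loop selection of this kind might look like the sketch below. The linear form of the model (predicted bits = slope × SAD + offset for each Q level) and every number in `MODEL` are hypothetical placeholders for an empirically derived model; only the general idea of comparing an energy measure against per-level predictions comes from the description above.

```python
# Illustrative open-loop Q-level selection from a SAD measure.
# MODEL maps q_level -> (slope, offset); all values are hypothetical.
MODEL = {1: (2.0, 100.0), 2: (1.0, 80.0), 4: (0.5, 60.0), 8: (0.25, 40.0)}

def sad(current, reference):
    """Sum of absolute differences between two equal-length sample lists."""
    return sum(abs(a - b) for a, b in zip(current, reference))

def predicted_bits(frame_sad, q_level, model=MODEL):
    slope, offset = model[q_level]
    return slope * frame_sad + offset

def select_q_open_loop(frame_sad, target_bits, model=MODEL):
    """Pick the lowest Q level whose predicted size meets the target."""
    for q_level in sorted(model):
        if predicted_bits(frame_sad, q_level, model) <= target_bits:
            return q_level
    return max(model)  # nothing fits: fall back to the coarsest level

print(sad([10, 12, 9], [8, 12, 13]))
print(select_q_open_loop(frame_sad=400, target_bits=300))
```

The frame is encoded only once, at the selected level, so the cost is a table lookup rather than repeated encodes; the accuracy of the result depends entirely on how well the model predicts the actual encoded size.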
Such open-loop methods can be successfully used to attain an average bitrate equal to the target bitrate. Unfortunately, the statistical measures of the energy of the pixel data are not consistently good predictors of the size of the encoded frame. As a result, attaining the target bitrate "on the average" means that the target bitrate is exceeded either often by a little bit or infrequently by a lot. In either case, the quality of the decoded video is degraded. When the target bitrate is exceeded in one frame, it must be undershot in the following frame or frames to make up for the excess number of bits. This is achieved by raising the Q level for the subsequent frames, resulting in reduced image quality in those frames and uneven image quality in the overall video sequence as the Q level oscillates from frame to frame. Setting a target bitrate lower than the available transmission bandwidth helps reduce the number of times that the "true" bitrate limit is exceeded, but at the cost of lower overall video quality.
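The make-up effect described above can be illustrated with a small bookkeeping sketch. The rule used here (any excess bits are carried forward as a debt against following frames, and unused bits are not banked) and the sample frame sizes are assumptions made for the example.

```python
# Illustrative bit-debt bookkeeping: overshoot in one frame shrinks the
# budget of the following frames. The carry-over rule is an assumption.

def frame_budgets(actual_sizes, target_bits):
    """Return the bit budget available to each frame in turn."""
    budgets = []
    debt = 0
    for size in actual_sizes:
        budget = target_bits - debt
        budgets.append(budget)
        # Bits spent beyond the budget must be repaid by later frames.
        debt = max(0, size - budget)
    return budgets

sizes = [1300, 900, 800, 1000]
print(frame_budgets(sizes, target_bits=1000))
```

In this example, a single overshoot in the first frame reduces the budget of the next two frames, which would have to be encoded at higher Q levels than the frames around them.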
What is needed is a process for selecting quantization levels for encoding video sequences in real time that more accurately and more consistently achieves the target bitrate to provide an optimal balance between the conflicting goals of low bitrate and high video quality.
It is accordingly an object of this invention to overcome the disadvantages and drawbacks of the known art and to provide an approach to the selection of quantization levels that more accurately and more consistently achieves a specified target bitrate.
Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.