The present invention relates to video compression processing, and, in particular, to the selection of quantization levels used to quantize DCT coefficients during MPEG video encoding.
MPEG refers to a family of video compression standards promulgated by the Moving Picture Experts Group. According to the MPEG standards, the frames of a video sequence may be encoded as I, P, or B frames. An I frame is intra-encoded without reference to any other frames, while P and B frames are inter-encoded based on inter-frame pixel differences to exploit the temporal redundancy that typically exists between frames of a video sequence. I and P frames can be used as reference frames for inter-encoding other P or B frames, while B frames are never used as reference frames for inter-encoding other frames.
FIG. 1 shows a block diagram of the intra-encoding performed for I frames, according to the MPEG standards. As shown in FIG. 1, a block-based discrete cosine transform (DCT) is applied to each (8×8) block of pixels in the current frame to generate blocks of DCT coefficients, which represent the image data in a spatial frequency domain (block 102). Each block of DCT coefficients is then quantized based on selected quantization levels (block 104), and the resulting quantized DCT coefficients are then run-length encoded (block 106) and Huffman (variable-length) encoded (block 108) to generate the current frame's contribution to the encoded video bitstream.
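By way of illustration only, the transform, quantization, and run-length stages of FIG. 1 can be sketched in Python as follows. This is a simplified sketch and not the MPEG-specified processing: it assumes a single uniform quantization level per block, a row-major scan, and no Huffman stage, whereas real MPEG encoders also apply a quantization matrix and a zigzag scan. The function names `dct_2d`, `quantize`, and `run_length` are hypothetical.

```python
import math

N = 8  # block size used by the block-based DCT

def dct_2d(block):
    """Naive orthonormal 2-D DCT (type II) of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, level):
    """Simplified quantizer (assumption): divide by one level and round."""
    return [[int(round(value / level)) for value in row] for row in coeffs]

def run_length(values):
    """Encode a flat list as (zero_run, nonzero_value) pairs.

    Trailing zeros are dropped, as an end-of-block marker would imply.
    """
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# A flat block has energy only in the DC coefficient.
flat = [[100] * N for _ in range(N)]
quantized = quantize(dct_2d(flat), level=8)
scan = [value for row in quantized for value in row]  # row-major, not zigzag
pairs = run_length(scan)
```

For the flat block, only the DC coefficient survives quantization, so the run-length stage emits a single pair.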
FIG. 2 shows a block diagram of the inter-encoding performed for P and B frames, according to the MPEG standards. As shown in FIG. 2, motion estimation is performed for each (16×16) macroblock of pixels in the current frame to identify a closely matching set of pixel data corresponding to one or more reference frames (block 202). Motion compensation is then performed based on the motion vectors determined during the motion estimation processing of block 202 to determine the motion-compensated pixel-to-pixel inter-frame differences for each macroblock in the current frame (block 204). A DCT transform is then applied to each (8×8) block of inter-frame pixel differences in the current frame to generate blocks of DCT coefficients (block 206). Each block of DCT coefficients is then quantized based on selected quantization levels (block 208), and the resulting quantized DCT coefficients are then run-length encoded (block 210) and Huffman encoded (block 212) to generate the current frame's contribution to the encoded video bitstream. Note that the encoding of the motion vectors determined during the motion estimation processing of block 202 is not represented in FIG. 2, but is part of the MPEG video compression processing for P and B frames.
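By way of illustration only, the motion estimation and motion compensation of blocks 202 and 204 can be sketched as an exhaustive block-matching search. The MPEG standards do not mandate a particular search method; the sum-of-absolute-differences (SAD) cost, the small search window, and the names `sad`, `estimate_motion`, and `residual` are assumptions made for this sketch, which also uses a single reference frame and whole-pixel motion vectors.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total

def estimate_motion(cur, ref, cx, cy, size=16, search=2):
    """Full search over +/- `search` pixels; returns ((mx, my), cost)."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (mx, my)
    return best_mv, best_cost

def residual(cur, ref, cx, cy, mv, size=16):
    """Motion-compensated pixel-to-pixel differences for one block."""
    mx, my = mv
    return [[cur[cy + dy][cx + dx] - ref[cy + my + dy][cx + mx + dx]
             for dx in range(size)]
            for dy in range(size)]

# Synthetic frames: the current frame is the reference shifted right by one pixel.
ref_frame = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur_frame = [[(x - 1) ** 2 + 10 * y for x in range(8)] for y in range(8)]
mv, cost = estimate_motion(cur_frame, ref_frame, cx=2, cy=2, size=4, search=2)
diff = residual(cur_frame, ref_frame, 2, 2, mv, size=4)
```

In this synthetic case the search finds an exact match one pixel to the left in the reference frame, so the residual passed on to the DCT of block 206 is all zeros.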
The MPEG standards provide two different quantization scales, a linear quantization scale and a non-linear quantization scale, each of which defines a set of quantization levels available for selection during the quantization processing of either block 104 in the intra-encoding algorithm shown in FIG. 1 or block 208 in the inter-encoding algorithm shown in FIG. 2. Either quantization scale can be used for MPEG-compliant decoding. The linear quantization scale defines a set of 31 quantization levels that range from 2 to 62 in increments of 2, while the non-linear quantization scale defines a set of 31 quantization levels that range from 1 to 112 as follows: 1 to 8 in increments of 1, 8 to 24 in increments of 2, 24 to 56 in increments of 4, and 56 to 112 in increments of 8.
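By way of illustration only, the two sets of quantization levels described above can be enumerated in Python as follows. The constant and function names are hypothetical; the values follow directly from the ranges and increments stated in the preceding paragraph.

```python
# Linear scale: 2 to 62 in increments of 2.
LINEAR_SCALE = list(range(2, 63, 2))

def build_nonlinear_scale():
    """Build the non-linear scale segment by segment."""
    levels = list(range(1, 9))           # 1 to 8 in increments of 1
    levels += list(range(10, 25, 2))     # 8 to 24 in increments of 2
    levels += list(range(28, 57, 4))     # 24 to 56 in increments of 4
    levels += list(range(64, 113, 8))    # 56 to 112 in increments of 8
    return levels

NONLINEAR_SCALE = build_nonlinear_scale()
```

Each list contains exactly 31 levels, which confirms that the segment boundaries (8, 24, and 56) are counted once rather than twice.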
In most MPEG-compliant video compression algorithms, quantization level is the primary encoding parameter used to trade off between bit rate and picture quality of the decoded video sequence during playback of the encoded video bitstream. In general, both bit rate and picture quality decrease as quantization level increases. Lower bit rates can typically be achieved by using higher quantization levels, but at the expense of lower picture quality. On the other hand, higher picture quality can typically be achieved by using lower quantization levels, but at the expense of higher bit rates.
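By way of illustration only, the following Python sketch shows this trade-off on a handful of made-up coefficient values. It assumes a simplified uniform quantizer and uses the number of non-zero quantized values as a rough proxy for bit rate and the worst-case reconstruction error as a rough proxy for picture quality; neither proxy is part of the MPEG standards, and the function names are hypothetical.

```python
def quantize(coeffs, level):
    """Simplified uniform quantizer (assumption): divide and round."""
    return [int(round(c / level)) for c in coeffs]

def dequantize(indices, level):
    """Reconstruct approximate coefficient values from quantizer indices."""
    return [i * level for i in indices]

def summarize(coeffs, level):
    """Return (count of non-zero indices, worst-case reconstruction error)."""
    indices = quantize(coeffs, level)
    recon = dequantize(indices, level)
    nonzero = sum(1 for i in indices if i != 0)
    max_err = max(abs(c - r) for c, r in zip(coeffs, recon))
    return nonzero, max_err

coeffs = [100, 37, -12, 5, 3, -1]   # made-up DCT coefficients
fine = summarize(coeffs, 2)         # low quantization level
coarse = summarize(coeffs, 32)      # high quantization level
```

With the higher level, fewer non-zero values remain to be encoded, but the reconstruction error is larger.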
MPEG-compliant video compression algorithms enable users to select quantization levels that trade off between bit rate and picture quality to meet particular application requirements. In some applications, such as real-time video conferencing over plain old telephone service (POTS) lines, picture quality is often sacrificed in order to achieve low bit rates. In these applications, relatively high quantization levels are typically used. In other applications, such as video compression for non-real-time playback where higher bit rates are acceptable, relatively low quantization levels can be used to achieve high picture quality during video playback.
Before selecting the specific quantization levels to use for different blocks of DCT coefficients, an MPEG encoder must first decide which quantization scale to use. As mentioned earlier, either the linear or the non-linear quantization scale can be chosen for MPEG-compliant decoding. The MPEG standards allow an encoder to change quantization scale from frame to frame during video compression processing. The selection between the linear and non-linear quantization scales can greatly affect the ability of the MPEG encoder to trade off efficiently between bit rate and picture quality to achieve its application-specific performance requirements.
In general, the linear quantization scale allows medium-grain control at 31 equally spaced quantization values. The non-linear quantization scale offers 31 quantization values having a broader range, with finer granularity at the lower end and coarser granularity at the higher end. When the linear scale is used and the required quantization level lies beyond the range provided, serious degradation in the resulting compressed image may occur, because the encoder is forced to throw away information in order to stay within the bit allocation. The non-linear quantization scale offers the encoder more latitude in avoiding this degradation. However, the coarser granularity of the high non-linear quantization levels may introduce image artifacts such as blockiness caused by large quantization discontinuities at macroblock boundaries. Thus, the linear quantization scale is generally better at reducing artifacts within a given range of quantization levels, while the non-linear quantization scale is generally better outside of this range.
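By way of illustration only, the difference in range and granularity between the two scales can be sketched in Python as follows. The helper names `nearest_level` and `largest_step` are hypothetical, and mapping a requested level to the closest available level is an assumption of this sketch, not a rule taken from the MPEG standards.

```python
LINEAR_SCALE = list(range(2, 63, 2))
NONLINEAR_SCALE = (list(range(1, 9)) + list(range(10, 25, 2))
                   + list(range(28, 57, 4)) + list(range(64, 113, 8)))

def nearest_level(scale, requested):
    """Closest available level to the requested one (lower level wins ties)."""
    return min(scale, key=lambda level: (abs(level - requested), level))

def largest_step(scale, lo, hi):
    """Largest gap between adjacent levels that both lie in [lo, hi]."""
    inside = [q for q in scale if lo <= q <= hi]
    return max(b - a for a, b in zip(inside, inside[1:]))

# A requested level of 90 is beyond the linear range and saturates at 62,
# while the non-linear scale can still approximate it.
linear_choice = nearest_level(LINEAR_SCALE, 90)
nonlinear_choice = nearest_level(NONLINEAR_SCALE, 90)

# Adjacent levels differ by at most 2 on the linear scale, but by up to 8
# at the high end of the non-linear scale.
linear_gap = largest_step(LINEAR_SCALE, 2, 62)
nonlinear_gap = largest_step(NONLINEAR_SCALE, 24, 112)
```

The saturation at 62 corresponds to the out-of-range degradation described above, and the larger gap between adjacent high non-linear levels corresponds to the coarser granularity that may produce discontinuities at macroblock boundaries.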
The present invention is directed to a technique for adaptively selecting between different quantization scales during video compression processing. For example, for MPEG encoding, the present invention may be applied to adaptively select between the linear quantization scale and the non-linear quantization scale used during video compression processing to select the specific quantization levels for quantizing DCT coefficients.
According to one embodiment, the present invention is a method for encoding frames of a video sequence, comprising the steps of (a) generating a metric characterizing quantization levels corresponding to a set of image data in the video sequence; (b) comparing the metric to one or more specified thresholds to select a quantization scale for a current frame in the video sequence; and (c) encoding the current frame using the selected quantization scale.
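By way of illustration only, steps (a) through (c) can be sketched in Python as follows. The specific choices here are assumptions of the sketch and are not specified by the text above: the metric is taken to be the mean of a set of quantization levels, two placeholder thresholds (`low` and `high`) bound the range in which the linear scale is selected, and `encode_frame` is a stub standing in for the actual encoding. All names are hypothetical.

```python
from statistics import mean

LINEAR = "linear"
NONLINEAR = "non-linear"

def quantization_metric(levels):
    """Step (a): characterize a set of quantization levels (assumed: mean)."""
    return mean(levels)

def select_scale(metric, low=8.0, high=48.0):
    """Step (b): compare the metric to thresholds (placeholder values).

    The linear scale is chosen inside [low, high]; the non-linear scale
    is chosen outside that range.
    """
    return LINEAR if low <= metric <= high else NONLINEAR

def encode_frame(frame, scale):
    """Step (c): stub that only records which scale would be used."""
    return {"frame": frame, "scale": scale}

def encode_with_adaptive_scale(frame, levels):
    """Run steps (a) through (c) for one frame."""
    metric = quantization_metric(levels)
    scale = select_scale(metric)
    return encode_frame(frame, scale)

# Hypothetical quantization levels for three different frames.
mid_range = encode_with_adaptive_scale("frame0", [20, 24, 30])
very_low = encode_with_adaptive_scale("frame1", [2, 3, 4])
very_high = encode_with_adaptive_scale("frame2", [60, 70, 80])
```

In this sketch, a frame whose metric falls within the thresholds is encoded with the linear scale, while frames whose metrics fall below or above the thresholds are encoded with the non-linear scale.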