1. Field of the Invention
The present invention relates to image processing, and, in particular, to video compression.
2. Description of the Related Art
The goal of video compression processing is to encode image data to reduce the number of bits used to represent a sequence of video images while maintaining an acceptable level of quality in the decoded video sequence. This goal is particularly important in certain applications, such as videophone or video conferencing over POTS (plain old telephone service) or ISDN (integrated services digital network) lines, where the existence of limited transmission bandwidth requires careful control over the bit rate, that is, the number of bits used to encode each image in the video sequence. Furthermore, in order to satisfy the transmission and other processing requirements of a video conferencing system, it is often desirable to have a relatively steady flow of bits in the encoded video bitstream. That is, the variations in bit rate from image to image within a video sequence should be kept as low as practicable.
Achieving a relatively uniform bit rate can be very difficult, especially for video compression algorithms that encode different images within a video sequence using different compression techniques. Depending on the video compression algorithm, images may be designated as the following different types of frames for compression processing:
An intra (I) frame which is encoded using only intra-frame compression techniques,
A predicted (P) frame which is encoded using inter-frame compression techniques based on a previous I or P frame, and which can itself be used as a reference frame to encode one or more other frames,
A bi-directional (B) frame which is encoded using bi-directional inter-frame compression techniques based on a previous I or P frame, a subsequent I or P frame, or a combination of both, and which cannot itself be used to encode another frame, and
A PB frame which corresponds to two images (a P frame and a subsequent B frame) that are encoded as a single frame (as in the H.263 video compression algorithm).
Depending on the actual image data to be encoded, these different types of frames typically require different numbers of bits to encode. For example, I frames typically require the greatest number of bits, while B frames typically require the fewest.
In a typical transform-based video compression algorithm, a block-based transform, such as a discrete cosine transform (DCT), is applied to blocks of image data corresponding either to pixel values or pixel differences generated, for example, based on a motion-compensated inter-frame differencing scheme. The resulting transform coefficients for each block are then quantized for subsequent encoding (e.g., run-length encoding followed by variable-length encoding). The degree to which the transform coefficients are quantized directly affects both the number of bits used to represent the image data and the quality of the resulting decoded image. This degree of quantization is also referred to as the quantization level, which is often represented by a specified quantizer value that is used to quantize all of the transform coefficients. In some video compression algorithms, the quantization level refers to a particular table of quantizer values that are used to quantize the different transform coefficients, where each transform coefficient has its own corresponding quantizer value in the table. In general, higher quantizer values imply more severe quantization and therefore fewer bits in the encoded bitstream at the cost of lower playback quality of the decoded images. As such, the quantizer is often used as the primary variable for controlling the tradeoff between bit rate and image quality.
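The tradeoff described above can be illustrated with a minimal sketch. It assumes a simplified uniform quantizer (division by a single quantizer value, truncating toward zero), which is not the exact quantization rule of any particular standard. The coefficient values and the function names are hypothetical and chosen only for illustration.

```python
def quantize(coeffs, q):
    """Simplified uniform quantizer: divide by q and truncate toward zero."""
    return [int(c / q) for c in coeffs]


def dequantize(levels, q):
    """Reconstruct approximate coefficients from quantized levels."""
    return [level * q for level in levels]


def count_nonzero(levels):
    """Nonzero levels are a rough proxy for the bits needed after
    run-length and variable-length encoding."""
    return sum(1 for level in levels if level != 0)


def max_error(coeffs, q):
    """Largest reconstruction error for a given quantizer value."""
    recon = dequantize(quantize(coeffs, q), q)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


# Hypothetical transform coefficients for one block (illustrative values).
coeffs = [312, -87, 45, -22, 13, -9, 6, -4, 3, -2, 1, 0]

for q in (4, 16, 32):
    levels = quantize(coeffs, q)
    print(f"q={q:2d} nonzero={count_nonzero(levels)} max_error={max_error(coeffs, q)}")
```

In this sketch, a larger quantizer value leaves fewer nonzero levels to encode and increases the reconstruction error, which mirrors the bit rate versus quality tradeoff described in the text.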
Visual quality of video depends not only on global measures (such as peak signal-to-noise ratio (PSNR)), but also on how the error is distributed in space and time. Thus, it is important to maintain spatial smoothness of the quantizer (which is closely related to the local distortion) across the picture. In fact, in many scenes, the ideal quantizer selection is a uniform value across the scene. However, such a scheme will not support the moving of bits to a more-important region from less-important regions, and furthermore, will provide very little control over the bits used to encode the picture. Thus, it cannot be used in constant (or near-constant) bit-rate applications (such as videophone and video conferencing over POTS or ISDN).
The other possibility is to vary the quantizer from macroblock to macroblock within the constraints of the coding standard being used (for example, in H.263, the quantizer value can change by a value of at most 2 in either direction from one macroblock to the next when following a raster scan pattern through the image). Examples of such schemes are given in the H.263+ TMN8 (Test Model Near-Term 8) and TMN9 documents (see, e.g., ITU - Telecommunications Standardization Sector, "Video Codec Test Model, Near-Term, Version 9 (TMN9)", Document Q15-C-15, December 1997). In these schemes, while the frame-level bit target can be accurately met, there are many, possibly large, quantizer changes, both spatially and temporally, which appear in the moving video as objectionable artifacts.
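The effect of the H.263 constraint mentioned above can be sketched as follows. This is a minimal illustration, not the TMN8 or TMN9 rate-control algorithm: it takes a hypothetical list of desired per-macroblock quantizer values in raster-scan order and limits each change to at most 2 relative to the previous macroblock. The quantizer range of 1 to 31 and all names are assumptions made for the example.

```python
MAX_STEP = 2          # largest allowed change between consecutive macroblocks
Q_MIN, Q_MAX = 1, 31  # assumed legal quantizer range


def constrain_quantizers(desired, max_step=MAX_STEP, q_min=Q_MIN, q_max=Q_MAX):
    """Limit macroblock-to-macroblock quantizer changes in raster-scan order."""
    constrained = []
    prev = None
    for q in desired:
        # Keep the value inside the assumed legal range.
        q = min(max(q, q_min), q_max)
        if prev is not None:
            # Allow a change of at most max_step in either direction.
            q = min(max(q, prev - max_step), prev + max_step)
        constrained.append(q)
        prev = q
    return constrained


# Hypothetical desired quantizers for seven consecutive macroblocks.
desired = [10, 10, 18, 18, 6, 6, 6]
actual = constrain_quantizers(desired)
print(actual)
```

Note that the constrained sequence can lag behind the desired one for several macroblocks, so an abrupt change requested by the rate control is spread across neighboring macroblocks.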
The present invention is directed to a technique for assigning quantization levels (e.g., quantizer values) used during video compression processing. According to the present invention, an image is segmented into one or more different regions, and a temporal prediction model is separately applied to each region to assign a quantization level to be used to quantize the transform coefficients for the macroblocks of that region. Because the temporal prediction model is applied separately to each region, a different quantization level may be, but does not have to be, assigned to each different region.
According to one embodiment, the present invention is a method for encoding a current frame in a video sequence, comprising the steps of (a) segmenting the current frame into one or more different regions; (b) generating an encoding complexity measure for each corresponding region of a previously encoded frame in the video sequence; (c) using the encoding complexity measure for each region of the previous frame to select a quantization level for the corresponding region of the current frame; and (d) encoding the current frame using the one or more selected quantization levels.
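Steps (b) and (c) can be sketched under stated assumptions. The text above does not specify the complexity measure or the prediction model, so this example substitutes a common simple rate model: complexity is taken as the bits spent on a region multiplied by the quantizer used, and the new quantizer is that complexity divided by the region's bit target. The region names, bit counts, targets, and function names are all hypothetical. The segmentation of step (a) and the encoding of step (d) are outside the sketch.

```python
Q_MIN, Q_MAX = 1, 31  # assumed legal quantizer range


def region_complexity(bits_used, quantizer):
    """Assumed complexity measure for a region of the previous frame:
    bits spent multiplied by the quantizer that was used."""
    return bits_used * quantizer


def select_quantizer(complexity, target_bits):
    """Assumed model: bits are roughly complexity / quantizer, so the
    quantizer for a bit target is complexity / target_bits."""
    q = round(complexity / target_bits)
    return min(max(q, Q_MIN), Q_MAX)


def select_region_quantizers(prev_stats, targets):
    """Pick one quantizer per region of the current frame from the
    statistics of the corresponding region of the previous frame."""
    return {
        region: select_quantizer(region_complexity(*prev_stats[region]), target)
        for region, target in targets.items()
    }


# Hypothetical (bits_used, quantizer) per region of the previous frame.
prev_stats = {"foreground": (6000, 8), "background": (2000, 12)}
# Hypothetical bit targets per region of the current frame.
targets = {"foreground": 5000, "background": 1500}

quantizers = select_region_quantizers(prev_stats, targets)
print(quantizers)
```

Because each region is handled separately, the two regions may receive different quantizers, while every macroblock within a region shares one value.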