1. Field of the Invention
The present invention relates to video compression processing, and, in particular, to rate control in a video coder.
2. Description of the Related Art
The primary goal in video compression processing is to reduce the number of bits used to represent sequences of video images while still maintaining an acceptable level of image quality during playback of the resulting compressed video bitstream. Another goal in many video compression applications is to maintain a relatively uniform bit rate, for example, to satisfy transmission bandwidth and/or playback processing constraints.
Video compression processing often involves a tradeoff between bit rate and playback quality. This tradeoff typically involves reducing the average number of bits used to encode images in the original video sequence by selectively decreasing the playback quality of each image that is encoded into the compressed video bitstream. Alternatively or in addition, the tradeoff between bit rate and playback quality can involve skipping certain images in the original video sequence, thereby encoding only a subset of those original images into the resulting compressed video bitstream.
Conventional video compression algorithms dictate a regular pattern of image skipping, e.g., skip every other image in the original video sequence. A video encoder may also be able to skip additional images, selected adaptively as needed to satisfy bit rate requirements. The decision to skip one or more images outside of a regular pattern of image skipping should not be made lightly, because of the adverse effects of such non-uniform image skipping on the quality of the video playback. Non-uniform skipping of images can be extremely annoying to a viewer who prefers to see a regular sequence of video images presented at a uniform frame rate.
As described above, the primary approach for controlling bit rate is to decrease selectively the quality of each image in the encoded video bitstream. One known technique for such rate control relies on a process called segmentation in which each encoded image is analyzed to identify two or more different regions having different levels of importance to the overall quality of the video playback, where those different regions within each frame are themselves treated differently during the encoding process.
For example, the videoconferencing paradigm is a "talking head" centered on a relatively constant background, where "constant" may refer to both time (i.e., background not changing significantly from frame to frame) and space (i.e., background not changing significantly from pixel to pixel within a frame). For such applications, segmentation analysis is performed during the encoding process to identify those regions of each frame that correspond to the talking head and those regions corresponding to the background, where the various regions are specified in terms of macroblocks of pixel data (e.g., each macroblock corresponds to a 16×16 block of pixels). When the frame is encoded, the regions corresponding to the talking head are allocated more resources (e.g., more bits per pixel on average) than those regions corresponding to the background, since the talking head is "more important" to the viewer of the video playback.
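The allocation described above can be illustrated with a minimal sketch. It assumes a macroblock-level segmentation map is already available and splits a frame's bit budget in proportion to a per-region weight. The function name `allocate_bits`, the weights, and the sample map are hypothetical and chosen only for illustration; they are not taken from any particular coder.

```python
def allocate_bits(segmentation, frame_bits, head_weight=4.0, background_weight=1.0):
    """Split a frame's bit budget across macroblocks in proportion to region weight.

    segmentation: 2-D list of booleans, True for "talking head" macroblocks.
    The weights are illustrative assumptions, not values from the text.
    """
    weights = [[head_weight if is_head else background_weight for is_head in row]
               for row in segmentation]
    total = sum(sum(row) for row in weights)
    return [[frame_bits * w / total for w in row] for row in weights]


# Hypothetical 3x4 macroblock map: True marks macroblocks covering the talking head.
segmentation = [
    [False, True,  True,  False],
    [False, True,  True,  False],
    [False, False, False, False],
]
budget = allocate_bits(segmentation, frame_bits=8000)
```

With these assumed weights, each talking-head macroblock receives four times the bits of a background macroblock, while the per-macroblock budgets still sum to the frame budget.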
In many video coding schemes, a transform, such as a two-dimensional discrete cosine transform (DCT), is applied to blocks (e.g., four 8×8 blocks per macroblock) of image data (either the pixels themselves or interframe pixel differences corresponding to those pixels). The resulting transform coefficients are then quantized at a selected quantization level where many of the coefficients are typically quantized to a zero value. The quantized coefficients are then run-length and variable-length encoded to generate part of the compressed video bitstream. In general, greater quantization levels result in more DCT coefficients being quantized to zero and fewer bits being required to represent the image data after performing run-length and variable-length encoding.
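The relationship between quantization level and the number of zero coefficients can be sketched as follows. This is a simplified illustration under stated assumptions: it uses an orthonormal DCT-II, a uniform quantizer with a single step size, and a raster-order run-length pass (practical coders typically use a zigzag scan and follow it with variable-length coding, which is omitted here). The test block and step sizes are hypothetical.

```python
import math

N = 8  # block size, matching the 8x8 blocks mentioned above


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def count_zeros(q):
    """Number of coefficients that were quantized to zero."""
    return sum(1 for row in q for c in row if c == 0)


def run_length(q):
    """(zero_run, level) pairs over a raster scan; trailing zeros are dropped."""
    pairs, run = [], 0
    for row in q:
        for c in row:
            if c == 0:
                run += 1
            else:
                pairs.append((run, c))
                run = 0
    return pairs


# Hypothetical test block: a smooth gradient with a small checkerboard texture.
block = [[10 * x + 5 * y + (3 if (x + y) % 2 else 0) for y in range(N)]
         for x in range(N)]
coeffs = dct_2d(block)
fine = quantize(coeffs, 2)     # small step: mild quantization
coarse = quantize(coeffs, 16)  # large step: severe quantization
```

Comparing `count_zeros(fine)` with `count_zeros(coarse)`, and the lengths of the two `run_length` outputs, shows the effect described above: the coarser step never yields fewer zeros or more (run, level) pairs than the finer one.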
In a typical encoding scheme that relies on segmentation, the transform coefficients corresponding to those blocks of image data in the more-important regions are less severely quantized than those coefficients corresponding to the less-important regions. In this way, relatively more data (i.e., information) is preserved for the more-important regions than for the less-important regions.
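As a minimal sketch of this idea, the following assigns a quantizer step to each macroblock from a segmentation map and compares the reconstruction error of one coefficient under each step. The step sizes (4 and 12), the coefficient value, and the function names are hypothetical choices for illustration only.

```python
def quant_map(segmentation, q_important=4, q_background=12):
    """Per-macroblock quantizer step: finer for important regions, coarser elsewhere."""
    return [[q_important if important else q_background for important in row]
            for row in segmentation]


def reconstruction_error(coeff, step):
    """Absolute error after uniform quantization and dequantization of one coefficient."""
    return abs(coeff - round(coeff / step) * step)


# Hypothetical 2x3 segmentation map: True marks a more-important macroblock.
segmentation = [
    [False, True, False],
    [False, True, False],
]
steps = quant_map(segmentation)

# The same coefficient value quantized with each region's step.
err_important = reconstruction_error(41, steps[0][1])
err_background = reconstruction_error(41, steps[0][0])
```

In this example the coefficient is reconstructed more accurately in the more-important region than in the background, which is the sense in which more information is preserved there.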
Some video coding schemes limit the magnitude of change in quantization level from frame to frame in a video sequence. Under such a constraint, conventional video coders, even those that apply different quantization levels to different regions identified by performing segmentation analysis on each image, may be unable to meet imposed bit rate requirements without adaptively dropping additional frames.
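The effect of such a constraint can be sketched under simplifying assumptions. The rate model below (bits roughly equal to a frame complexity divided by the quantization level), the quantizer range of 1 to 31, and the names `choose_quant`, `complexity`, and `max_delta` are all hypothetical and serve only to show how clamping the frame-to-frame change can force a frame skip.

```python
import math


def choose_quant(prev_q, target_bits, complexity, max_delta, q_min=1, q_max=31):
    """Pick a quantization level for the next frame under a frame-to-frame change limit.

    Assumed rate model (illustrative only): bits ~= complexity / q.
    Returns (q, skip), where skip is True when even the largest permitted
    quantization level is predicted to exceed the bit target.
    """
    # Level that would meet the target if there were no change limit.
    desired_q = min(max(math.ceil(complexity / target_bits), q_min), q_max)
    # Window of levels reachable from the previous frame's level.
    lo = max(q_min, prev_q - max_delta)
    hi = min(q_max, prev_q + max_delta)
    q = min(max(desired_q, lo), hi)
    predicted_bits = complexity / q
    return q, predicted_bits > target_bits


# Tight limit: the level can move by at most 2 per frame.
constrained = choose_quant(prev_q=8, target_bits=10000, complexity=300000, max_delta=2)
# Loose limit: the level can move anywhere within the allowed range.
unconstrained = choose_quant(prev_q=8, target_bits=10000, complexity=300000, max_delta=31)
```

Under the tight limit the coder cannot raise the quantization level far enough to meet the target, so the sketch signals a skip; with the limit relaxed, the same frame fits the budget without skipping.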