1. Field of the Invention
The invention relates to the field of data compression, and more particularly to a system capable of performing variable bit rate control for a video encoder.
2. Description of the Related Art
It is practically a cliché to claim that all electronic communication is undergoing a digital revolution. The main advantage of a digital representation of information is the robustness of the bitstream: it can be stored and recovered, transmitted and received, processed and manipulated, all virtually without error. The drawback is the sheer volume of raw data. For example, the NTSC color video image has 29.97 frames per second and approximately 480 visible scan lines per frame, and it requires approximately 480 pixels per scan line in each of the red, green, and blue color components. If each color component is coded using 8 bits, the resulting bitrate is ≈166 Megabits per second (Mbits/s). The raw, uncompressed bitrates for various video formats are therefore very high and are not economical in many applications.
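The raw bitrate is simply the product of frame rate, scan lines, pixels per line, color components, and bits per component. The sketch below does that multiplication for the NTSC figures quoted above; the function name `raw_bitrate` is hypothetical.

```python
def raw_bitrate(frame_rate, lines, pixels_per_line, components=3, bits_per_component=8):
    """Uncompressed video bitrate in bits per second (hypothetical helper)."""
    bits_per_frame = lines * pixels_per_line * components * bits_per_component
    return frame_rate * bits_per_frame

# NTSC example: 29.97 frames/s, ~480 lines, ~480 pixels per line, RGB at 8 bits each.
ntsc = raw_bitrate(29.97, 480, 480)
print(f"{ntsc / 1e6:.1f} Mbit/s")  # 165.7 Mbit/s
```

Each frame alone is 5,529,600 bits, which is why even modest frame rates push uncompressed video well past 100 Mbit/s.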
Digital audio and video signals integrated with computers, telecommunication networks, and consumer products are poised to fuel the information revolution. At the heart of this revolution is the digital compression of audio and video signals. Several of the compression standards involve algorithms based on a common core of compression techniques, e.g., ITU-T (formerly CCITT) Recommendation H.261, ITU-T Recommendation H.263, and the ISO/IEC MPEG-1, MPEG-2 and MPEG-4 standards. The MPEG algorithms were developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The MPEG committee develops standards for the multiplexed, compressed representation of video and associated audio signals. The standards specify the syntax of the compressed bitstream and the method of decoding, but leave considerable latitude for novelty and variety in the algorithm employed in the encoder.
In MPEG, a sequence of video pictures is typically divided into a series of GOPs, where each GOP (Group of Pictures) begins with an Intra-coded picture (I-picture) followed by an arrangement of Forward Predictive-coded pictures (P-pictures) and Bidirectionally Predicted pictures (B-pictures). FIG. 1 illustrates a typical GOP in display order. I-pictures are coded without reference to preceding or upcoming pictures in the sequence. P-pictures are coded with respect to the temporally closest preceding I-picture or P-picture in the sequence. B-pictures are interspersed between the I-pictures and P-pictures in the sequence, and coded with respect to the immediately adjacent I- and P-pictures either preceding, upcoming, or both. Even though several B-pictures may occur in immediate succession, B-pictures may never be used to predict another picture.
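The reference rules above can be expressed as a small function that maps each picture of a GOP, given in display order, to the pictures it is predicted from. This is an illustrative sketch: the pattern string `"IBBPBBPBB"` is an assumed example GOP, and `gop_references` is a hypothetical name. Trailing B-pictures would also reference the next GOP's I-picture, which lies outside the given pattern and is therefore omitted here.

```python
def gop_references(pattern):
    """Map each picture (display order) to the indices of its reference pictures.

    I: no references.
    P: the closest preceding I- or P-picture.
    B: the closest preceding and closest following I- or P-picture present in
       the pattern. B-pictures are never used as references.
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    refs = []
    for i, t in enumerate(pattern):
        prev_anchor = [a for a in anchors if a < i]
        next_anchor = [a for a in anchors if a > i]
        if t == "I":
            refs.append([])
        elif t == "P":
            if not prev_anchor:
                raise ValueError(f"P-picture {i} has no preceding I- or P-picture")
            refs.append([prev_anchor[-1]])
        elif t == "B":
            r = []
            if prev_anchor:
                r.append(prev_anchor[-1])
            if next_anchor:
                r.append(next_anchor[0])
            refs.append(r)
        else:
            raise ValueError(f"unknown picture type {t!r}")
    return refs

print(gop_references("IBBPBBPBB"))
# [[], [0, 3], [0, 3], [0], [3, 6], [3, 6], [3], [6], [6]]
```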
Each picture has three components: luminance (Y), red color difference (Cr), and blue color difference (Cb). In the MPEG-2 4:2:0 format, the Cr and Cb components each have half as many samples as the Y component in both the horizontal and vertical directions. As depicted in FIG. 2, the basic building block of an MPEG picture is the macroblock (MB). For 4:2:0 video, each MB consists of a 16×16 array of luminance samples together with one 8×8 block of samples for each of the two color difference components. The 16×16 luminance array is actually composed of four 8×8 blocks of samples.
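The macroblock arithmetic can be checked with a short sketch: four luminance blocks plus one Cb and one Cr block give six 8×8 blocks (384 samples) per MB, and the MB total must match a full-size Y plane plus two quarter-size chroma planes. The 480×480 picture size reuses the approximate NTSC figures above, and the sketch assumes dimensions that are multiples of 16; `macroblock_layout` is a hypothetical name.

```python
def macroblock_layout(width, height):
    """Macroblock and block counts for one 4:2:0 picture (size assumed a multiple of 16)."""
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16 in this sketch")
    mb_count = (width // 16) * (height // 16)
    luma_samples = 16 * 16        # four 8x8 luminance blocks
    chroma_samples = 2 * 8 * 8    # one 8x8 block each for Cb and Cr
    return {
        "macroblocks": mb_count,
        "blocks_per_mb": 4 + 2,
        "samples_per_mb": luma_samples + chroma_samples,
        "samples_per_picture": mb_count * (luma_samples + chroma_samples),
    }

layout = macroblock_layout(480, 480)
# Cross-check: full-size Y plane plus two half-by-half chroma planes.
assert layout["samples_per_picture"] == 480 * 480 + 2 * (240 * 240)
print(layout)
# {'macroblocks': 900, 'blocks_per_mb': 6, 'samples_per_mb': 384, 'samples_per_picture': 345600}
```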
It is the responsibility of the encoder to decide which picture coding type and which prediction mode is best. In an I-picture, each 8×8 block of pixels in an MB undergoes a discrete cosine transform (DCT) to form an 8×8 array of transform coefficients. The transform coefficients are then quantized using a quantizer matrix. The resulting quantized DCT coefficients are zig-zag scanned to form a one-dimensional sequence of coefficients, which is then encoded using a variable length code (VLC). In a P-picture, a decision is made to code each MB either as an I macroblock or as a P macroblock. An I macroblock is encoded according to the technique described above. For each P macroblock, a prediction of the macroblock is obtained from a preceding picture. The prediction is identified by a motion vector indicating the translation between the macroblock to be coded in the current picture and its prediction in the previous picture. The prediction error between the predicted macroblock and the current macroblock is then coded using the DCT, quantization, zig-zag scanning, and VLC encoding.
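The intra path (DCT, quantization, zig-zag scan) can be sketched in plain Python. The DCT below is the standard 8×8 two-dimensional DCT-II, and the scan follows the conventional zig-zag order. The quantizer is a simplified uniform one (step = matrix weight × scale / 16) chosen for illustration; it is an assumption, not MPEG-2's exact rounding rules, and VLC coding of the resulting sequence is omitted.

```python
import math

N = 8  # DCT block size

def _c(k):
    # Normalization factor C(k) of the DCT-II.
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(block):
    """F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * _c(u) * _c(v) * s
    return out

def quantize(coeffs, matrix, scale):
    """Simplified uniform quantizer (assumption): step = matrix[u][v] * scale / 16."""
    return [[round(coeffs[u][v] / (matrix[u][v] * scale / 16)) for v in range(N)]
            for u in range(N)]

# Zig-zag order: walk the anti-diagonals (u + v), alternating direction on each.
ZIGZAG = sorted(((u, v) for u in range(N) for v in range(N)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(q):
    return [q[u][v] for u, v in ZIGZAG]

flat_block = [[128] * N for _ in range(N)]    # a perfectly flat 8x8 block
flat_matrix = [[16] * N for _ in range(N)]    # hypothetical flat quantizer matrix
seq = zigzag_scan(quantize(dct_8x8(flat_block), flat_matrix, scale=8))
print(seq[:4])  # [128, 0, 0, 0]
```

A flat block puts all of its energy into the DC coefficient, so the scanned sequence is one nonzero value followed by a long run of zeros, which is exactly what makes the subsequent VLC stage efficient.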
In the encoding of a B-picture, a decision has to be made as to the coding mode of each MB. There are four macroblock modes: intra (I) mode, forward (F) mode, backward (B) mode, and interpolative forward-backward (FB) mode. I mode is intra-coding without motion compensation (as in an I macroblock). F mode is unidirectional forward predictive coding, using a previous picture to obtain a motion-compensated prediction (as in a P macroblock). Conversely, B mode is unidirectional backward predictive coding, using a subsequent picture to obtain a motion-compensated prediction. Finally, FB mode is bidirectional predictive coding, in which a motion-compensated prediction is obtained by interpolating a forward motion-compensated prediction and a backward motion-compensated prediction. In the F, B, and FB macroblock modes, the prediction error is encoded using the DCT, quantization, zig-zag scanning, and VLC encoding.
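Because MPEG leaves the mode decision to the encoder, the following is only one possible heuristic, sketched under stated assumptions. Motion estimation is assumed to have already produced the forward and backward predictions; the FB prediction is formed as a rounded average of the two; inter modes are costed by sum of absolute differences (SAD); and intra mode is chosen when the macroblock's own mean absolute deviation is lower than the best inter cost. All function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample arrays."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def interpolate(fwd, bwd):
    """FB prediction as the rounded average of forward and backward predictions."""
    return [[(f + b + 1) // 2 for f, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]

def intra_cost(cur):
    """Mean absolute deviation of the macroblock, a rough proxy for intra cost."""
    n = sum(len(r) for r in cur)
    mean = sum(sum(r) for r in cur) / n
    return sum(abs(x - mean) for r in cur for x in r)

def choose_b_mode(cur, fwd, bwd):
    """Pick I, F, B or FB for one B-picture macroblock (illustrative heuristic)."""
    costs = {"F": sad(cur, fwd), "B": sad(cur, bwd), "FB": sad(cur, interpolate(fwd, bwd))}
    best = min(costs, key=costs.get)
    return "I" if intra_cost(cur) < costs[best] else best

cur = [[10, 10], [10, 10]]
print(choose_b_mode(cur, [[8, 8], [8, 8]], [[12, 12], [12, 12]]))  # FB
```

In the usage line, neither the forward nor the backward prediction matches on its own, but their average reproduces the macroblock exactly, so FB wins.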
The encoder must choose quantization step sizes for an entire picture so as to control visible distortion at a given bitrate. Note that the actual number of bits used to encode a picture with the chosen quantization step sizes is unknown until the picture is actually coded; there is no inverse function that determines the bits a picture will consume from the desired quantization step sizes alone. Therefore, an important aspect of any video encoder is rate control. MPEG imposes an important restriction on the encoder, namely a limit on the variation in bits per picture, especially in the case of constant bitrate operation. This limitation is enforced through a Video Buffer Verifier (VBV). The VBV buffer is a virtual buffer that models the input buffer at the decoder. If the VBV input data rate is the same for each picture, the video is said to be coded at Constant Bitrate (CBR); otherwise, it is said to be coded at Variable Bitrate (VBR). In the case of CBR encoding, the encoder allocates bits to pictures such that the VBV buffer neither overflows nor underflows. For VBR operation, the coded bitstream enters the VBV buffer at a specified maximum bitrate until the buffer is full, at which point no more bits are input. The bitrate entering the VBV buffer is therefore effectively variable, up to the maximum bitrate. In the case of VBR encoding, it is only necessary to prevent VBV underflow.
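The CBR and VBR buffer behavior described above can be illustrated with a simplified VBV simulation. The model is an assumption for illustration, not the normative VBV definition: the buffer starts at a given fullness, each picture's bits are removed instantaneously at its decode time, and bits arrive at `bitrate / frame_rate` per picture interval. In CBR mode a full buffer is an overflow error; in VBR mode input simply stops at the buffer size, so only underflow can occur. All parameter values in the example are hypothetical.

```python
def simulate_vbv(picture_bits, bitrate, frame_rate, buffer_size, initial_fullness, vbr=False):
    """Return ("ok", None), ("underflow", n) or ("overflow", n) for picture index n."""
    per_picture = bitrate / frame_rate   # bits arriving during one picture interval
    fullness = initial_fullness
    for n, bits in enumerate(picture_bits):
        if bits > fullness:              # decoder needs more bits than have arrived
            return ("underflow", n)
        fullness -= bits                 # picture removed instantaneously at decode time
        if vbr:
            # VBR: input stops once the buffer is full, so overflow cannot occur.
            fullness = min(buffer_size, fullness + per_picture)
        else:
            fullness += per_picture      # CBR: bits keep arriving at a constant rate
            if fullness > buffer_size:
                return ("overflow", n)
    return ("ok", None)

# Hypothetical parameters: 3 Mbit/s, 30 pictures/s, 400 kbit buffer, half full at start.
args = dict(bitrate=3_000_000, frame_rate=30, buffer_size=400_000, initial_fullness=200_000)
print(simulate_vbv([10_000] * 5, **args))            # ('overflow', 2)
print(simulate_vbv([10_000] * 5, vbr=True, **args))  # ('ok', None)
```

The same stream of very small pictures overflows under CBR, because bits keep arriving regardless, but is legal under VBR, which is why a VBR encoder only has to guard against underflow.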
The intent of VBR control is to maximize the perceptual quality of the decoded pictures while keeping the output bitrate within permitted bounds. Unlike CBR, VBR has the flexibility to allocate additional bits to pictures with complex scenery and/or high motion. This latitude can be used to equalize the perceptual quality of the reconstructed pictures. In some applications, such as digital cinema, consistent quality from picture to picture takes priority over any requirement for fixed bandwidth. Video encoding for such applications should use a VBR, constant-quality mechanism for rate control. However, conventional schemes for constant-quality rate control are relatively complex, typically requiring multiple passes to accomplish video encoding. Accordingly, what is needed is a novel constant-quality rate control technique for a single-pass, real-time video encoder. It is also desirable to provide a frame-based rate control apparatus for constant-quality video encoding that is suitable for integrated circuit implementation.