It is practically a cliché to claim that all electronic communications are engaged in a digital revolution. The main advantage of a digital representation of information is the robustness of the bitstream: it can be stored and recovered, transmitted and received, processed and manipulated, all virtually without error. However, the raw uncompressed bitrates for various video formats are very high and are not economical in many applications. For example, an NTSC color video image has 29.97 frames per second, approximately 480 visible scan lines per frame, and approximately 480 pixels per scan line in each of the red, green, and blue color components. If each color component is coded using 8 bits, the bitrate produced is ≈168 Megabits per second (Mbits/s).
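The NTSC figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name and its parameters are illustrative and are not taken from any standard. With exactly 480 lines and 480 pixels per line the product comes out near 166 Mbits/s, in the same range as the ≈168 Mbits/s figure, the difference being attributable to the approximate line and pixel counts.

```python
def raw_bitrate_bps(frames_per_second, lines, pixels_per_line,
                    components=3, bits_per_component=8):
    """Uncompressed bitrate in bits per second (hypothetical helper)."""
    return (frames_per_second * lines * pixels_per_line
            * components * bits_per_component)

# Assumed inputs: the approximate NTSC values given in the text.
ntsc_bps = raw_bitrate_bps(29.97, 480, 480)
print(f"{ntsc_bps / 1e6:.1f} Mbits/s")
```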
Digital audio and video signals, integrated with computers, telecommunication networks, and consumer products, are poised to fuel the information revolution. At the heart of this revolution is the digital compression of audio and video signals. Several of the compression standards involve algorithms based on a common core of compression techniques, e.g., ITU-T (formerly CCITT) Recommendation H.261 and ITU-T Recommendation H.263, and the ISO/IEC MPEG-1, MPEG-2 and MPEG-4 standards. The MPEG algorithms were developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The MPEG committee develops standards for the multiplexed, compressed representation of video and associated audio signals. The standards specify the syntax of the compressed bitstream and the method of decoding, but leave considerable latitude for novelty and variety in the algorithms employed in the encoder.
In MPEG, a sequence of video pictures is typically divided into a series of GOPs, where each GOP (Group of Pictures) begins with an Intra-coded picture (I-picture) followed by an arrangement of Forward Predictive-coded pictures (P-pictures) and Bidirectionally Predicted pictures (B-pictures). FIG. 1 illustrates a typical GOP in display order. I-pictures are coded without reference to preceding or upcoming pictures in the sequence. P-pictures are coded with respect to the temporally closest preceding I-picture or P-picture in the sequence. B-pictures are interspersed between the I-pictures and P-pictures in the sequence, and are coded with respect to the immediately adjacent I- or P-picture that precedes them, the one that follows them, or both. Even though several B-pictures may occur in immediate succession, a B-picture may never be used to predict another picture.
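The reference relationships described above can be sketched as a small function that, for a GOP given in display order, lists which pictures each picture may be predicted from. This is an illustrative sketch only: the function name and the string encoding of the GOP are assumptions, and a trailing B-picture that would reference the first picture of the next GOP is given only its preceding reference here.

```python
def reference_pictures(gop):
    """Map each display-order index to the indices of its reference pictures.

    gop is a string of picture types in display order, e.g. "IBBPBBP".
    Hypothetical helper; assumes the GOP begins with an I-picture.
    """
    # Only I- and P-pictures may serve as references (B-pictures never do).
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = {}
    for i, t in enumerate(gop):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = []                        # no reference needed
        elif t == "P":
            refs[i] = before[-1:]               # closest preceding I or P
        elif t == "B":
            refs[i] = before[-1:] + after[:1]   # adjacent anchors on each side
        else:
            raise ValueError(f"unknown picture type {t!r}")
    return refs

refs = reference_pictures("IBBPBBP")
for index, picture_type in enumerate("IBBPBBP"):
    print(index, picture_type, refs[index])
```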
Each picture has three components: luminance (Y), red color difference (Cr), and blue color difference (Cb). In the MPEG-2 4:2:0 format, the Cr and Cb components each have half as many samples as the Y component in both the horizontal and vertical directions. As depicted in FIG. 2, the basic building block of an MPEG picture is the macroblock (MB). For 4:2:0 video, each MB consists of a 16×16 array of luminance samples together with one 8×8 block of samples for each of the two color difference components. The 16×16 array of luminance samples is in turn composed of four 8×8 blocks of samples.
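The 4:2:0 macroblock layout can be illustrated with a short sketch. The helper names below are hypothetical, and the block ordering (top-left, top-right, bottom-left, bottom-right) is an assumption made for the illustration.

```python
def chroma_size_420(luma_width, luma_height):
    """Cr/Cb dimensions for 4:2:0: half the luma size in each direction."""
    return luma_width // 2, luma_height // 2

def split_luma_into_blocks(luma):
    """Split a 16x16 luma array (list of 16 rows) into four 8x8 blocks."""
    if len(luma) != 16 or any(len(row) != 16 for row in luma):
        raise ValueError("expected a 16x16 array")
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8] for row in luma[top:top + 8]])
    return blocks

# Each sample value encodes its own position so the split is easy to inspect.
luma = [[16 * y + x for x in range(16)] for y in range(16)]
blocks = split_luma_into_blocks(luma)

# Six 8x8 blocks per macroblock: four luma, one Cr, one Cb.
samples_per_mb = 16 * 16 + 2 * (8 * 8)
print(len(blocks), samples_per_mb, chroma_size_420(720, 480))
```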
It is the responsibility of an encoder to decide which picture coding type and which prediction mode is best. In an I-picture, each 8×8 block of pixels in a MB undergoes a discrete cosine transform (DCT) to form an 8×8 array of transform coefficients. The transform coefficients are then quantized with a quantizer matrix. The resulting quantized DCT coefficients are zig-zag scanned to form a sequence of DCT coefficients. The sequence of DCT coefficients is then encoded using a variable length code (VLC). In a P-picture, a decision is made to code each MB either as an I macroblock or as a P macroblock. The I macroblock is encoded according to the technique described above. For each P macroblock, a prediction of the macroblock is obtained from a preceding picture. The prediction is identified by a motion vector indicating the translation between the macroblock to be coded in the current picture and its prediction in the previous picture. The prediction error between the predicted macroblock and the current macroblock is then coded using the DCT, quantization, zig-zag scanning, and VLC encoding.
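The transform, quantization, and scanning steps described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the quantizer is a plain divide-and-round by a flat matrix (the actual MPEG quantization rules are more detailed), the function names are hypothetical, and the final VLC stage is omitted.

```python
import math

N = 8  # block size

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an NxN block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coefficients, quantizer_matrix):
    """Simplified quantization: divide by the matrix entry and round."""
    return [[round(coefficients[u][v] / quantizer_matrix[u][v])
             for v in range(N)] for u in range(N)]

def zigzag_order(n=N):
    """(row, column) positions in classic zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order()]

# Assumed test input: a flat block and a flat quantizer matrix.
flat_block = [[100] * N for _ in range(N)]
flat_matrix = [[16] * N for _ in range(N)]
scanned = zigzag_scan(quantize(dct_2d(flat_block), flat_matrix))
print(scanned[:6])
```

For a flat block all of the energy lands in the first (DC) coefficient, so the scanned sequence is a single nonzero value followed by zeros, which is the kind of sequence a variable length code represents compactly.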
In the encoding of a B-picture, a decision has to be made as to the coding of each MB. There are four macroblock modes: intra (I) mode, forward (F) mode, backward (B) mode, and interpolative forward-backward (FB) mode. I mode is intracoding using no motion compensation (as in an I macroblock). F mode is unidirectional forward predictive coding using a previous picture to obtain a motion compensated prediction (as in a P macroblock). Conversely, B mode is unidirectional backward predictive coding using a subsequent picture to obtain a motion compensated prediction. Finally, FB mode is bidirectional predictive coding, wherein a motion compensated prediction is obtained by interpolating a backward motion compensated prediction and a forward motion compensated prediction. In the F, B and FB macroblock modes, the prediction error is encoded using the DCT, quantization, zig-zag scanning, and VLC encoding.
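The three motion compensated modes can be compared with a short sketch. The text does not say how an encoder chooses among the modes, so the criterion used here (smallest sum of absolute differences) is an assumption made for illustration, as are the function names and the rounded integer average used for the interpolation. Intra mode is left out because it involves no prediction to compare.

```python
def interpolate(forward_pred, backward_pred):
    """FB prediction: sample-wise rounded average of the two predictions."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_pred, backward_pred)]

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def choose_inter_mode(current, forward_pred, backward_pred):
    """Pick F, B, or FB by minimum SAD (assumed criterion, not from the text)."""
    candidates = {
        "F": forward_pred,
        "B": backward_pred,
        "FB": interpolate(forward_pred, backward_pred),
    }
    errors = {mode: sad(current, pred) for mode, pred in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors

# Tiny 2x2 blocks keep the example readable; real macroblocks are 16x16.
current = [[10, 10], [10, 10]]
forward_pred = [[8, 8], [8, 8]]
backward_pred = [[12, 12], [12, 12]]
best_mode, mode_errors = choose_inter_mode(current, forward_pred, backward_pred)
print(best_mode, mode_errors)
```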
An important aspect of any video encoder is rate control. The purpose of rate control is to maximize the perceptual quality of the decoded video by intelligently allocating the number of bits used to encode each picture and each MB within a picture. The encoder must choose quantization step sizes for an entire picture so as to control visible distortion at a given bitrate. Note that the number of bits actually used to encode a picture with the chosen quantization step sizes is unknown until the picture has been coded. There is no inverse function that yields the bits used by a picture simply from the desired quantization step sizes. A key feature of MPEG is the use of adaptive (or variable) quantization. This technique permits different regions of each picture to be coded with varying degrees of fidelity, thereby achieving uniform perceptual quality over each picture and from picture to picture. Nevertheless, conventional methods for rate control are relatively complex, typically requiring multiple passes to accomplish video encoding. In addition, the prior art lacks a simple mechanism for assigning initial quantization step sizes for the adaptive quantization so as to keep picture quality more uniform.
Accordingly, what is needed is a novel rate control technique for a single-pass, real-time video encoder. Further, it is desirable to provide a method and apparatus for rate control in moving picture compression using bit allocation with initial quantization step size estimation at the picture level.