The invention relates to video compression, and, more particularly, to Moving Picture Experts Group (MPEG) video compression.
Video communication (television, teleconferencing, and so forth) typically transmits a stream of video frames (pictures) along with audio over a transmission channel for real-time viewing and listening by a receiver. However, transmission channels frequently add corrupting noise and have limited bandwidth (e.g., television channels limited to 6 MHz). Consequently, digital video transmission with compression enjoys widespread use. In particular, various standards for compression of digital video have emerged and include H.261, MPEG-1, and MPEG-2, with more to follow, including H.263 and MPEG-4, which are in development. There are similar audio compression methods such as CELP and MELP.
Tekalp, Digital Video Processing (Prentice Hall 1995), Clarke, Digital Compression of Still Images and Video (Academic Press 1995), and Schafer et al., Digital Video Coding Standards and Their Role in Video Communications, 83 Proc. IEEE 907 (1995), include summaries of various compression methods, including descriptions of the H.261, MPEG-1, and MPEG-2 standards plus the H.263 recommendations and indications of the desired functionalities of MPEG-4. These references and all other references cited are hereby incorporated by reference.
H.261 compression uses interframe prediction to reduce temporal redundancy and discrete cosine transform (DCT) on a block level together with coefficient quantization plus high spatial frequency cutoff to reduce spatial redundancy. H.261 is recommended for use with transmission rates in multiples of 64 Kbps (kilobits per second) to 2 Mbps (megabits per second).
The H.263 recommendation is analogous to H.261 but targets bitrates of about 22 Kbps (compatible with twisted-pair telephone wire). It adds motion estimation at half-pixel accuracy (which eliminates the need for the loop filtering available in H.261), overlapped motion compensation to obtain a denser motion field (set of motion vectors) at the expense of more computation, and adaptive switching between motion compensation with 8-pixel-by-8-pixel blocks and macroblocks (four 8-by-8 luminance blocks plus the spatially associated chroma blocks).
MPEG-1 and MPEG-2 also use temporal prediction followed by two-dimensional DCT transformation on a block level similar to H.261, but they make further use of various combinations of motion-compensated prediction, interpolation, and intraframe coding. MPEG-1 aims at video CDs and works well at rates of about 1-1.5 Mbps for frames of about 360 pixels by 240 lines and 24-30 frames per second. MPEG-1 defines I, P, and B pictures, with I pictures coded intraframe, P pictures coded using motion-compensated prediction from previous I or P pictures, and B pictures coded using motion-compensated bi-directional prediction/interpolation from adjacent I and P pictures. FIG. 2 indicates the prediction within a group of pictures (GOP).
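One consequence of the prediction structure just described is that pictures must be transmitted in coding order rather than display order: a B picture cannot be decoded until both of its anchor (I or P) pictures are available. The following sketch illustrates the reordering for an assumed IBBP-style GOP; the function name and labels are illustrative and not taken from the standard:

```python
def coding_order(gop):
    """Reorder a GOP from display order to coding order: each anchor
    picture (I or P) is coded before the B pictures that are
    bidirectionally predicted from it and the previous anchor."""
    out = []
    pending_b = []
    for pic in gop:
        if pic[0] in "IP":          # anchor picture
            out.append(pic)         # code the anchor first...
            out.extend(pending_b)   # ...then the B pictures it closes
            pending_b = []
        else:                       # B picture: wait for its next anchor
            pending_b.append(pic)
    out.extend(pending_b)
    return out

# Display order I0 B1 B2 P3 B4 B5 P6 becomes I0 P3 B1 B2 P6 B4 B5.
print(coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```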
MPEG-2 aims at digital television (720 pixels by 480 lines) and typically uses bitrates up to about 10 Mbps. It employs MPEG-1-type motion compensation with I, P, and B pictures and adds scalability (a lower bitrate may be extracted to transmit a lower-resolution image).
The number of bits per picture in an MPEG video bitstream is variable, so a rate control strategy is needed in the encoder to regulate the bit rate so that a constant-size buffer in the decoder can receive the bitstream through a fixed-rate channel without overflow or underflow. The basic function of such a rate control scheme is to dynamically regulate the coding parameters so as to match the number of coded bits with the available channel capacity. In MPEG, rate control is achieved through the manipulation of a macroblock-level (MB-level) coding parameter called the quantization scale (Q or MQUANT). This parameter, which takes any integer value in the range 1 to 31 (or an alternative set of integers and half integers from 0.5 to 56), is used for scaling the quantization matrix of the DCT representation of either the MB or its pixel difference with a predicted MB, depending upon the selected MB coding type. FIG. 3a illustrates functional blocks for an encoder and a decoder; these may be implemented in digital signal processors or general processors with sufficient computational power. For 8-bit levels and an 8 by 8 block, the DCT coefficients will have values in the range from -2048 to +2047. FIG. 3b shows the default intra quantization matrix with the dc coefficient in the upper left-hand corner; that is, a DCT coefficient is divided by the corresponding matrix entry and Q and then rounded off to the nearest integer. A high Q will reduce the number of bits needed to code the MB at the expense of degradation in the quality of the reconstructed MB, and vice versa. Hence, the technique used to adjust Q has a direct effect on the resultant quality of the encoded picture. Ideally, the value of Q for each MB should be selected so that good and stable picture quality is attained and the buffer capacity constraint is observed.
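The quantization step just described can be sketched as follows. The matrix values are the well-known MPEG default intra matrix (assumed to match the one in FIG. 3b, which is not reproduced here), and the divide-and-round rule is the simplified form stated in the text, not the exact integer arithmetic of the standard:

```python
# Default MPEG intra quantization matrix (dc entry in the upper left).
INTRA_QUANT = [
    [ 8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]

def quantize_intra(coeffs, q):
    """Quantize an 8x8 block of DCT coefficients: divide each
    coefficient by the corresponding matrix entry times Q, then
    round to the nearest integer (simplified sketch)."""
    return [[round(coeffs[i][j] / (INTRA_QUANT[i][j] * q))
             for j in range(8)] for i in range(8)]

# Doubling Q halves the quantized level, so fewer bits are needed
# at the cost of coarser reconstruction.
block = [[160.0] * 8 for _ in range(8)]
print(quantize_intra(block, 2)[0][0])   # dc: 160 / (8 * 2) = 10
print(quantize_intra(block, 4)[0][0])   # dc: 160 / (8 * 4) = 5
```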
The MPEG test model encoder (TM5) performs rate control via bit assignment at two hierarchical levels (or layers) of a bitstream: the picture level and the MB level. Prior to coding a picture, the encoder uses statistics obtained from coding a previous picture of the same type (i.e., I, P, or B) and the number of bits available for coding the remaining pictures in the current group of pictures (GOP) to determine the target number of bits to be used to code the current picture. Hence, the number of bits to code a picture is set a priori without looking at the content of the picture. At the MB level the value of Q is set on the fly by the product of two parameters: a reference Q which depends upon the fullness of the virtual buffer just prior to coding the MB, and a scaling factor which varies from 0.5 to 2.0 according to the value of the MB's spatial activity (measured by pixel variance) with respect to the average activity of the MBs in the previously encoded picture. Because MBs are coded in scan-line order, the encoder has no knowledge of the complexity of the yet-to-be-coded MBs. Therefore, in order to observe the picture-level bit assignment, each MB is assigned the same number of bits up front, and the aggregate difference between the number of bits assigned and actually used will then determine the virtual buffer fullness (and hence the value of the reference Q).
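The MB-level computation described above can be sketched as follows. The structure follows the usual TM5 description (a reference Q proportional to virtual-buffer fullness, scaled by an activity factor that stays within 0.5 to 2.0), but the function and parameter names are illustrative rather than taken from the test model text:

```python
def mb_quant_scale(buffer_fullness, reaction_param, mb_activity, avg_activity):
    """TM5-style MB-level quantization scale (simplified sketch).

    buffer_fullness : virtual-buffer fullness just before coding this MB
    reaction_param  : TM5 "reaction parameter" (r = 2 * bit_rate / picture_rate)
    mb_activity     : spatial activity (pixel variance) of this MB
    avg_activity    : average MB activity of the previously encoded picture
    """
    # Reference Q grows with virtual-buffer fullness.
    ref_q = 31.0 * buffer_fullness / reaction_param
    # Activity scaling factor; this form is bounded between 0.5 and 2.0.
    n_act = (2.0 * mb_activity + avg_activity) / (mb_activity + 2.0 * avg_activity)
    # Final Q is their product, clipped to the legal integer range 1..31.
    return max(1, min(31, int(round(ref_q * n_act))))

# A busy (high-variance) MB tolerates a coarser Q than a flat MB,
# even at the same buffer fullness.
print(mb_quant_scale(400, 1240, mb_activity=900, avg_activity=300))  # 14
print(mb_quant_scale(400, 1240, mb_activity=50,  avg_activity=300))  # 6
```

Note that the picture content enters only through the activity ratio; the reference Q itself is driven purely by buffer fullness, which is exactly the limitation criticized in the discussion below.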
There are two fundamental problems with this rate control scheme. First, because TM5 assumes almost equal numbers of bits per GOP, when a scene change occurs at a non-intra picture (i.e., a P or B picture) within a GOP, the coded quality will drop substantially, because a scene change normally requires many more bits to code, in order to maintain reasonable quality, than the a priori picture-level bit assignment designates for predictive pictures. Second, because each MB within a picture is assigned an equal number of bits up front and because MBs are processed in scan-line order, the rate control scheme cannot take advantage of the variation in coding complexity within a picture when assigning the value of Q, possibly resulting in large quality variations within a picture. Moreover, the reference Q values are set solely on the basis of the buffer fullness constraint, without regard to the actual picture content in the MB. This is undesirable since it is well known that there is a strong relationship between the occurrence of quantization noise and the features displayed by pixels in a MB. Although TM5 (in its step 3) also utilizes a variance-dependent scaling factor to partially circumvent this shortcoming, its effectiveness is limited.