Methods for encoding moving pictures or video have been developed for efficient transmission and storage. An example of the current art of such encoding methods is found in MPEG2 Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, April 1993, and the disclosure of that document is hereby expressly incorporated herein by reference. In this method, an input video sequence is organized into a sequence layer, groups of pictures, pictures, slices, macroblocks, and finally a block layer. Each picture in a group of pictures is coded according to its determined picture coding type.
The picture coding types used in the MPEG2 Test Model 5 include intra-coded picture (I-picture), predictive-coded picture (P-picture), and bi-directionally predictive-coded picture (B-picture). The I-pictures are used mainly for random access or scene update. The P-pictures use forward motion predictive coding with reference to previously coded I- or P-pictures (anchor pictures), and the B-pictures use both forward and backward motion predictive/interpolative coding with reference to previously coded I- or P-pictures. A group of pictures (GOP) is formed in encoded order starting with an I-picture and ending with the picture before the next I-picture in the sequence.
A picture is partitioned into smaller and non-overlapping blocks of pixel data called macroblocks (MB) before encoding. Each MB from a P- or B-picture is subjected to a motion estimation process in which forward motion vectors, and backward motion vectors for the case of a B-picture MB, are determined using reference pictures from a frame buffer. With the determined motion vectors, motion compensation is performed where the intra- or inter-picture prediction mode of the MB is first determined according to the accuracy of the motion vectors found, followed by generating the necessary predicted MB containing the prediction error.
The predicted MB is then subjected to a discrete cosine transform (DCT), and the DCT coefficients are quantized based on quantization matrices and a quantization step-size. The quantized DCT coefficients of the MB are then run-length encoded with variable length codes (VLC) and multiplexed with additional information such as selected motion vectors, MB coding modes, quantization step-size, and/or picture and sequence information, to form the output bitstream.
Local decoding is performed by inverse quantizing the quantized DCT coefficients, followed by inverse DCT, and motion compensation. Local decoding is performed such that the reference pictures used in the motion compensation are identical to those used by any external decoder.
The quantization step-size (QS) used for quantizing the DCT coefficients of each MB has a direct impact on the number of bits produced at the output of the run-length VLC encoding process, and therefore on the average output bit rate. It also has a direct impact on the encoding quality, which represents the output picture quality at the corresponding decoder. In general, a larger QS yields a lower output bit rate and lower encoding quality. Rate control and quantization control algorithms are used to control the output bit rate and picture quality so that the resulting bitstream satisfies channel bandwidth or storage limitations as well as quality requirements.
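The effect of the step-size on output size can be illustrated with a deliberately simplified sketch. It assumes a plain uniform quantizer (no quantization matrix, truncation toward zero) and uses the count of (run, level) pairs as a rough proxy for the number of bits the VLC stage would emit. The coefficient values and function names are hypothetical and are not taken from MPEG2 Test Model 5.

```python
def quantize(coeffs, qs):
    """Uniform quantization: divide by the step-size and truncate toward zero."""
    return [int(c / qs) for c in coeffs]


def run_level_pairs(levels):
    """Return (zero_run, level) pairs for each non-zero level in scan order."""
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0           # a non-zero level ends the run
    return pairs              # trailing zeros are dropped (end-of-block)


# Hypothetical DCT coefficients of one block, already in scan order.
coeffs = [312, -45, 28, 0, 14, -9, 6, 0, 3, -2, 1, 0]

fine = run_level_pairs(quantize(coeffs, qs=4))     # small step-size
coarse = run_level_pairs(quantize(coeffs, qs=16))  # large step-size
```

With these inputs the larger step-size leaves fewer non-zero levels to encode, which is the mechanism behind the lower bit rate, while the discarded small coefficients account for the loss in quality.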
Some methods for rate control and quantization control can be found in the above-mentioned MPEG2 Test Model 5. These methods generally comprise a bit allocation process, a rate control process, and an adaptive quantization process. In the bit allocation process, a target number of bits is assigned to a new picture to be coded according to a number of previously determined and pre-set parameters. The rate control process then calculates a reference quantization step-size for each MB based on the target bits for the picture and the number of bits already used from that target in encoding MBs of the picture. In the adaptive quantization process, the calculated reference quantization step-size is scaled according to the local activity of the MB and an average MB activity determined from the current or a previously coded picture. This scaling reflects the degree to which human perception masks coding noise in MBs with high or low activity within a picture. A video buffer verifier (VBV) may also be employed to prevent underflow and overflow of the decoder input buffer, as required by the MPEG standard, so that a target bit rate is maintained.
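The rate control and adaptive quantization steps can be sketched as follows. The text above does not give the formulas. The ones used here follow the Test Model 5 formulation as it is commonly described (a virtual buffer fullness, a reaction parameter of 2 × bit rate / picture rate, a quantizer range of 1 to 31, and an activity normalization bounded between 0.5 and 2), and should be checked against that document before being relied on. All function and parameter names are hypothetical.

```python
def reference_qs(d0, bits_used, target_bits, mb_index, mb_count,
                 bit_rate, picture_rate):
    """Reference quantizer for macroblock `mb_index` (0-based).

    The virtual buffer fullness rises when more bits have been spent so far
    than the proportional share of the picture target (assumed TM5 form).
    """
    fullness = d0 + bits_used - target_bits * mb_index / mb_count
    reaction = 2.0 * bit_rate / picture_rate
    return fullness * 31.0 / reaction


def normalized_activity(act, avg_act):
    """Scale factor in [0.5, 2]: above 1 for busy MBs, below 1 for flat MBs."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def mb_quantizer(q_ref, act, avg_act):
    """Scale the reference quantizer by local activity and clip to 1..31."""
    mquant = q_ref * normalized_activity(act, avg_act)
    return max(1, min(31, round(mquant)))


# Hypothetical numbers: 4 Mbit/s at 25 pictures/s, 1320 MBs per picture.
q_ref = reference_qs(d0=100_000, bits_used=50_000, target_bits=400_000,
                     mb_index=330, mb_count=1320,
                     bit_rate=4_000_000, picture_rate=25)
busy_mb = mb_quantizer(q_ref, act=400, avg_act=100)  # coarser quantization
flat_mb = mb_quantizer(q_ref, act=25, avg_act=100)   # finer quantization
```

A busy macroblock receives a larger quantizer than a flat one at the same reference value, reflecting the assumption that coding noise is masked more strongly in high-activity regions.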
It is assumed in the bit allocation process that the visual quality of a coded picture can be quantified with a single number VQ, expressed by the formula:

$$VQ = \frac{K}{Q} \qquad (1)$$

where $Q$ is the average quantization step-size of the coded picture and $K$ is a constant quality factor which depends only on the picture coding type. It is also assumed that the visual qualities of all encoded pictures should be maintained at a similar level within a GOP. Therefore, for all pictures within a GOP, the bit allocation process maintains the following equality:

$$\frac{K_I}{Q_I} \approx \frac{K_P}{Q_P} \approx \frac{K_B}{Q_B} \qquad (2)$$

where $Q_I$, $Q_P$, $Q_B$ are the respective average quantization step-sizes of coded I-, P-, and B-pictures, and similarly $K_I$, $K_P$, $K_B$ are the respective pre-determined quality factors for I-, P-, and B-pictures. Although this equality does not apply to an entire picture sequence, it should be considered valid within a GOP as well as across consecutive GOP boundaries. For simplicity, $K_I$ of equation (2) is normalized to the value of 1.
From the above assumptions, an equation for determining the target bit allocation for a picture to be coded can be derived for each of the picture coding types. The equations for the I-, P-, and B-picture coding types are as follows:

$$T_I = \frac{R}{1 + N_P \left( \dfrac{X_P K_I}{X_I K_P} \right) + N_B \left( \dfrac{X_B K_I}{X_I K_B} \right)} \qquad (3)$$

$$T_P = \frac{R}{N_P + N_B \left( \dfrac{X_B K_P}{X_P K_B} \right)} \qquad (4)$$

$$T_B = \frac{R}{N_B + N_P \left( \dfrac{X_P K_B}{X_B K_P} \right)} \qquad (5)$$

where $X_I = S_I Q_I$, $X_P = S_P Q_P$, $X_B = S_B Q_B$, and
$S_I$, $S_P$, $S_B$ are the numbers of bits used by the previously encoded I-, P-, and B-picture respectively,
$Q_I$, $Q_P$, $Q_B$ are the average quantization step-sizes used by the previously encoded I-, P-, and B-picture respectively,
$N_P$ and $N_B$ are the numbers of P- and B-pictures remaining in the current GOP with respect to the current picture to be coded,
$R$ is the remaining number of bits assigned to the GOP according to a target bit rate, and
$T_I$, $T_P$, $T_B$ are the calculated target bit allocations for a new I-, P-, and B-picture to be coded respectively.
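The step from the equal-quality assumption to the I-picture target can be sketched as follows. The sketch assumes that equation (2) has the ratio form K/Q, that the bits spent on a picture equal its complexity X divided by the step-size used (consistent with X = SQ above), that the complexity of upcoming pictures equals that of the most recently coded picture of the same type, and that R is spent on one I-picture plus the remaining P- and B-pictures. Here the step-sizes denote those to be used for the upcoming pictures.

```latex
\begin{align*}
R &= T_I + N_P T_P + N_B T_B
   = \frac{X_I}{Q_I} + N_P \frac{X_P}{Q_P} + N_B \frac{X_B}{Q_B} \\[4pt]
Q_P &= \frac{K_P}{K_I}\, Q_I, \qquad
Q_B = \frac{K_B}{K_I}\, Q_I
   \qquad \text{(from equation (2))} \\[4pt]
R &= \frac{X_I}{Q_I}
   \left[ 1 + N_P \frac{X_P K_I}{X_I K_P} + N_B \frac{X_B K_I}{X_I K_B} \right] \\[4pt]
T_I &= \frac{X_I}{Q_I}
   = \frac{R}{1 + N_P \dfrac{X_P K_I}{X_I K_P} + N_B \dfrac{X_B K_I}{X_I K_B}}
\end{align*}
```

The P- and B-picture targets follow from the same substitution with no I-picture term in the budget, factoring out the P- or B-picture share instead.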
An optional lower limit may be applied to the determined target bit allocation as given in the MPEG2 Test Model 5, hence:

$$T_I,\ T_P,\ \text{or}\ T_B \;\geq\; \frac{\text{Bit\_Rate}}{K_1 \times \text{Picture\_Rate}} \qquad (6)$$

where Bit_Rate is the target bit rate,
Picture_Rate is the number of pictures coded per second, and
$K_1$ is a constant (e.g. 8).
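The target bit allocation and its optional lower limit can be sketched in a few lines. This is a minimal illustration of equations (3) to (6), with the B-picture term written to mirror equation (4). The function name, the parameter names, and the default quality factors k_p = 1.0 and k_b = 1.4 are assumptions made for illustration, and the numeric inputs are hypothetical.

```python
def target_bits(picture_type, r, n_p, n_b, x_i, x_p, x_b,
                k_i=1.0, k_p=1.0, k_b=1.4,
                bit_rate=None, picture_rate=None, k_1=8.0):
    """Target bits for the next picture of type 'I', 'P' or 'B'.

    r        : bits remaining for the current GOP
    n_p, n_b : P- and B-pictures remaining in the GOP
    x_*      : complexity (bits used x average step-size) of the last
               coded picture of each type
    k_*      : quality factors (default values are illustrative assumptions)
    """
    if picture_type == "I":
        denom = (1.0
                 + n_p * (x_p * k_i) / (x_i * k_p)
                 + n_b * (x_b * k_i) / (x_i * k_b))
    elif picture_type == "P":
        denom = n_p + n_b * (x_b * k_p) / (x_p * k_b)
    elif picture_type == "B":
        denom = n_b + n_p * (x_p * k_b) / (x_b * k_p)
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")

    target = r / denom

    # Optional lower limit of equation (6), applied only when rates are given.
    if bit_rate is not None and picture_rate is not None:
        target = max(target, bit_rate / (k_1 * picture_rate))
    return target


# Hypothetical GOP state: 550,000 bits left, 4 P- and 10 B-pictures remaining.
state = dict(r=550_000, n_p=4, n_b=10, x_i=160.0, x_p=80.0, x_b=56.0)
t_i = target_bits("I", **state)
t_p = target_bits("P", **state)
t_b = target_bits("B", **state)

# With little budget left, the lower limit takes over (4 Mbit/s, 25 pictures/s).
t_b_floor = target_bits("B", r=100_000, n_p=4, n_b=10,
                        x_i=160.0, x_p=80.0, x_b=56.0,
                        bit_rate=4_000_000, picture_rate=25)
```

Because the quality factors enter only as fixed ratios, the relative share of bits given to each picture type changes only through the measured complexities, never through the factors themselves.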
A typical video encoder system is designed to code picture sequences with various characteristics and complexities. In particular, sequences with few motion updates and sequences with complex motion scenes create different requirements for coding pictures of different picture coding types.
For example, a sequence with few motion updates may be best coded with a higher ratio of bits allocated to the anchor pictures (I-pictures and P-pictures) to improve visual quality. On the other hand, a sequence with complex motion scenes may be best coded with a relatively even distribution of bits among pictures of all picture coding types to improve motion detail, and hence with a higher ratio of bits allocated to the B-pictures. Present systems based on fixed visual quality ratios, for example according to equation (2), do not adequately address these changes in sequence characteristics.
In other words, the pre-determined and fixed quality factors $K_I$, $K_P$, and $K_B$ largely determine the ratios between the average quantization step-sizes $Q_I$, $Q_P$, and $Q_B$ used for coding pictures of different picture coding types. This relationship limits how far the bit allocation among picture coding types can adapt to different motion characteristics.
Furthermore, the adaptivity of bit allocation should depend on the visual quality itself. When higher visual quality is achieved for the anchor pictures, a feature which re-distributes the bits to B-pictures is desired such that both visual quality and motion details can be balanced.