Because raw digital video data (image sequences) are very large, they must be compressed before they can be transmitted or stored. Many important video compression standards exist, including the ISO/IEC MPEG-1, MPEG-2 and MPEG-4 standards and the ITU-T H.261, H.263 and H.264 standards. The ISO/IEC MPEG-1/2/4 standards are used extensively by the entertainment industry to distribute movies and digital video broadcasts, including video compact disc or VCD (MPEG-1), digital video disc or digital versatile disc or DVD (MPEG-2), recordable DVD (MPEG-2), digital video broadcasting or DVB (MPEG-2), video-on-demand or VOD (MPEG-2), high definition television or HDTV in the US (MPEG-2), etc. The later MPEG-4 is more advanced than MPEG-2 and can achieve high quality video at a lower bit rate, making it well suited to video streaming over the internet, digital wireless networks (e.g. 3G networks), the multimedia messaging service (MMS standard from 3GPP), etc. MPEG-4 has been accepted into the next generation high definition DVD (HD-DVD) standard and the MMS standard. The ITU-T H.261/3/4 standards are designed for low-delay video phone and video conferencing systems. The early H.261 was designed to operate at bit rates of p×64 kbit/s, with p=1, 2, . . . , 30. The later H.263 is very successful and is widely used in modern video conferencing systems and in video streaming over broadband and wireless networks, including the multimedia messaging service (MMS) in 2.5G and 3G networks and beyond. The latest standard, H.264, is currently the state-of-the-art video compression standard. It was developed jointly by MPEG and ITU-T in the framework of the Joint Video Team (JVT). The standard is called H.264 in ITU-T and MPEG-4 Advanced Video Coding (MPEG-4 AVC), or MPEG-4 Part 10, in ISO/IEC. H.264 is used in the HD-DVD standard, the Digital Video Broadcasting (DVB) standard and probably the MMS standard.
Based on H.264, a related standard called the Audio Video coding Standard (AVS) is currently under development in China. AVS 1.0 is designed for high definition television (HDTV). AVS-M is designed for mobile applications. Other related standards may be under development. H.264 has superior objective and subjective video quality compared with MPEG-1/2/4 and H.261/3. The basic encoding algorithm of H.264 is similar to that of H.263 or MPEG-4, except that an integer 4×4 discrete cosine transform (DCT) is used instead of the traditional 8×8 DCT, and there are additional features including intra-prediction modes for I-frames, multiple block sizes and multiple reference frames for motion estimation/compensation, quarter-pixel accuracy for motion estimation, an in-loop deblocking filter, context-adaptive binary arithmetic coding, etc. See Test Model 5, ISO-IEC/JTC1/SC29/WG11, April 1993, Document AVC 491b, Document 2, which is herein incorporated by reference in its entirety.
These coding algorithms are a hybrid of inter-picture prediction, which exploits temporal redundancy, and transform coding of the remaining signal, which reduces spatial redundancy. The transformed signal is then coded using entropy coding methods. Because of the nature of these coding algorithms, the resulting video data has a variable bit-rate (VBR). If the encoding parameters are kept constant during the encoding process, the number of bits in each encoded frame is likely to vary widely from frame to frame. This causes serious problems in transmission, since most practical networks cannot cope with a large variation in bit-rate.
Typically, rate control of video encoding or transcoding can be described as a constrained optimization problem. The goal is to find the optimal quantization parameters that minimize distortion subject to the target bit budget:
$$Q_1^*, Q_2^*, \ldots, Q_N^*, \lambda^* = \arg\min_{Q_1, Q_2, \ldots, Q_N, \lambda} \sum_{i=1}^{N} D_i + \lambda \left( \sum_{i=1}^{N} B_i - B \right)$$

where $Q_1, Q_2, \ldots, Q_N$ and $Q_1^*, Q_2^*, \ldots, Q_N^*$ are the quantization parameters (QPs) and their optimal values, $\lambda$ and $\lambda^*$ are the Lagrange multiplier and its optimal value, $D_i$ and $B_i$ are the distortion and rate of the ith macroblock, and $B$ is the target bit budget. In order to determine the optimal quantization parameters and achieve the target rate accurately, many R-Q and D-Q models have been proposed. For encoding, the TM5, TMN-5, TMN-8 and JM rate control schemes have been proposed. See Test Model 5 referenced above, and J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low delay communications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, pp. 172-185, February 1999, which is herein incorporated by reference in its entirety. TMN-8 outperforms the other schemes in terms of PSNR and, at the same time, maintains a low processing delay. For transcoding, on the other hand, additional information is available from the encoded bitstream, so simplified rate control schemes have been proposed that reduce complexity by re-using this information in different ways, for example in the macroblock complexity measurement and in the quantization parameter determination. For example, see Z. Lei and N. D. Georganas, “Accurate bit allocation and rate control for DCT domain video transcoding,” in IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, May 2002, pp. 968-973, and K.-D. Seo, S.-H. Lee, J.-K. Kim and J.-S. Koh, “Rate control algorithm for fast bit-rate conversion transcoding,” IEEE Transactions on Consumer Electronics, vol. 46, no. 4, pp. 1128-1136, November 2000, which are both herein incorporated by reference in their entirety.
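The constrained optimization above can be illustrated with a minimal sketch. It assumes each macroblock offers a small table of (QP, distortion, bits) operating points and searches for the Lagrange multiplier by bisection; the names `toy_points`, `pick_qps`, `total_bits` and `allocate`, and the toy rate and distortion formulas, are hypothetical and are not taken from any of the cited schemes.

```python
from typing import List, Sequence, Tuple

# One operating point of a macroblock: (qp, distortion, bits).
Point = Tuple[int, float, float]


def toy_points(sigma: float) -> List[Point]:
    """Hypothetical R-D table: D grows with q^2, bits shrink with q^2 (illustration only)."""
    return [(q, q * q / 12.0, 256.0 * sigma * sigma / (q * q)) for q in (2, 4, 8, 16)]


def pick_qps(mbs: Sequence[Sequence[Point]], lam: float) -> List[Point]:
    """For a fixed multiplier, each macroblock independently minimizes D + lam * B."""
    return [min(points, key=lambda p: p[1] + lam * p[2]) for points in mbs]


def total_bits(choice: Sequence[Point]) -> float:
    return sum(p[2] for p in choice)


def allocate(mbs: Sequence[Sequence[Point]], budget: float, iters: int = 50):
    """Bisect on the multiplier until the chosen points fit the bit budget."""
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the budget is met (or give up at a large value,
    # in which case the budget is infeasible and the result may exceed it).
    while total_bits(pick_qps(mbs, hi)) > budget and hi < 1e9:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(pick_qps(mbs, mid)) > budget:
            lo = mid  # too many bits: penalize rate more strongly
        else:
            hi = mid  # budget met: try a smaller penalty
    return pick_qps(mbs, hi), hi


if __name__ == "__main__":
    macroblocks = [toy_points(4.0), toy_points(8.0)]
    choice, lam = allocate(macroblocks, budget=1000.0)
    print([p[0] for p in choice], total_bits(choice), lam)
```

Practical rate control schemes avoid this kind of search by using closed-form R-Q and D-Q models, which is why the sketch should be read only as an illustration of the objective.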
However, neither of these algorithms considers the characteristics of the macroblocks after quantization or re-quantization during bit allocation and QP determination. If all quantized coefficients in a macroblock, in both the luminance and the chrominance blocks, are zero, the number of bits allocated to that macroblock is in general larger than the number of bits actually needed to code it, which can distort the bit allocation for the other macroblocks in the frame.
The TMN-8 rate control algorithm seeks to minimize the mean square error (MSE) distortion subject to the rate constraints by using Lagrange optimization techniques. See the J. Ribas-Corbera et al. reference above. It achieves the target bit-rate accurately while delivering high quality and keeping the buffer delay low. Because of its excellent performance, it was adopted in a test model of H.263+. See ITU-T/SG15, Video codec test model, TMN-8, Portland, June 1997, which is hereby incorporated by reference in its entirety. TMN-8 consists of two parts: frame layer bit allocation and macroblock layer rate control. In the frame layer bit allocation, the number of bits allocated to the current frame is determined from the bit-rate and the current buffer fullness. If the buffer fullness exceeds a certain level, several frames are skipped to maintain a steady buffer occupancy. In the macroblock layer rate control, the algorithm calculates the complexity of the current frame and of each macroblock in terms of standard deviation. Then, the optimal quantization step size for the ith macroblock is obtained by the following equation:
$$Q_i^* = \sqrt{\frac{256K}{B - 256NC} \, \frac{\sigma_i}{\alpha_i} \, S_i}, \quad i = 1, 2, \ldots, N \tag{1}$$

where $K$ is the model parameter, which is updated after encoding each macroblock, $C$ is the average number of bits used to encode the overhead information, such as header, motion information, etc., $B$ is the number of remaining bits for the current frame, $\sigma_i$ is the standard deviation of the ith macroblock, $\alpha_i$ is a weighting for the ith macroblock, which is used as a parameter for controlling the quantization overhead at low bit-rate, and

$$S_i = \sum_{k=i}^{N} \alpha_k \sigma_k$$

can be viewed as a complexity measurement of the remaining macroblocks in a frame. The model parameters $K$ and $C$ are updated after encoding each macroblock by using a weighted sum.
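As a rough illustration of equation (1), the sketch below evaluates the step size for every macroblock of a frame. It assumes the square-root form of the equation given in the cited TMN-8 paper and, for simplicity, holds K, C and B fixed for the whole frame, whereas the actual algorithm updates them after each macroblock. The function name `tmn8_step_sizes` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def tmn8_step_sizes(sigma: Sequence[float], alpha: Sequence[float],
                    K: float, C: float, B: float, A: int = 256) -> List[float]:
    """Evaluate equation (1) for each macroblock with K, C and B frozen.

    A = 256 is the number of luminance pixels in a 16x16 macroblock.
    Every alpha[i] must be positive.
    """
    n = len(sigma)
    denom = B - A * n * C
    if denom <= 0:
        raise ValueError("bit budget does not cover the overhead bits")

    # suffix[i] holds S_i = sum over k >= i of alpha_k * sigma_k.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + alpha[i] * sigma[i]

    return [math.sqrt(A * K / denom * sigma[i] / alpha[i] * suffix[i])
            for i in range(n)]


if __name__ == "__main__":
    # Two equally complex macroblocks, no overhead bits, budget of 256 bits.
    print(tmn8_step_sizes([4.0, 4.0], [1.0, 1.0], K=0.5, C=0.0, B=256.0))
```

Because the complexity term shrinks as fewer macroblocks remain, the frozen-parameter sketch yields smaller step sizes toward the end of the frame; in the real algorithm the remaining bit count B shrinks as well and offsets this effect.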
In hybrid video coding, the rate control estimates the number of bits needed for each macroblock based on its complexity and the rate constraints, and then determines the quantization parameter for that macroblock. The model parameters are updated after encoding each macroblock to adapt to the statistics of the video content. However, at low bit-rates, the transformed and quantized residue coefficients of many macroblocks tend to be very small or even all zero. For these macroblocks, the estimated number of bits tends to be larger than the number of bits actually needed. This introduces an error into the rate control algorithm, and the feedback mechanism tries to correct the error by adjusting the model parameters accordingly. The adjusted parameters then have an undesirable effect when the rate control algorithm allocates bits to a macroblock that still has substantial energy left after quantization.
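The feedback problem described above can be made concrete with a small numerical sketch. It assumes a rate model of the form bits = 256 (K σ²/Q² + C) and a fixed-weight blend for updating K; both the model form and the weight are assumptions made for illustration and are not stated in the text. All function names are hypothetical.

```python
def predicted_bits(K: float, C: float, sigma: float, q: float, A: int = 256) -> float:
    """Assumed rate model: texture bits plus overhead bits for one macroblock."""
    return A * (K * sigma ** 2 / q ** 2 + C)


def observed_K(bits: float, C: float, sigma: float, q: float, A: int = 256) -> float:
    """Invert the assumed model to get the K implied by the actual bit count."""
    return max(bits / A - C, 0.0) * q ** 2 / sigma ** 2


def update_K(K: float, K_obs: float, w: float) -> float:
    """Weighted-sum update; the fixed weight w is an illustrative assumption."""
    return (1.0 - w) * K + w * K_obs


def drift_after_zero_macroblocks(K: float, C: float, count: int,
                                 sigma: float = 3.0, q: float = 10.0,
                                 actual_bits: float = 1.0, w: float = 0.25) -> float:
    """Feed `count` all-zero macroblocks (about 1 bit each) into the update."""
    for _ in range(count):
        K = update_K(K, observed_K(actual_bits, C, sigma, q), w)
    return K


if __name__ == "__main__":
    K0, C = 0.5, 0.02
    K1 = drift_after_zero_macroblocks(K0, C, count=4)
    # Estimate for a later high-energy macroblock, before and after the drift.
    print(predicted_bits(K0, C, sigma=20.0, q=10.0))
    print(predicted_bits(K1, C, sigma=20.0, q=10.0))
```

Under these assumptions, a run of all-zero macroblocks pulls K downward, so the bit estimate for a later macroblock with substantial residual energy comes out too low.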