A Joint Video Team (JVT) video encoder needs rate control to hold a particular constant bit rate in fixed channel bandwidth applications with limited buffer sizes. Avoiding buffer overflow and underflow is more challenging for video content whose sections differ in complexity, for example, sections with scene changes and dissolves.
Rate control has been studied for previous video compression standards. TMN8 was proposed for H.263+. The TMN8 rate control uses a frame-layer rate control to select the target number of bits for the current frame and a macroblock-layer rate control to select the value of the quantization parameter (QP) for the macroblocks.
In the frame-layer rate control, the target number of bits for the current frame is determined by
$$B = R/F - \Delta, \tag{1}$$

$$\Delta = \begin{cases} W/F, & W > Z \cdot M \\ W - Z \cdot M, & \text{otherwise} \end{cases} \tag{2}$$

$$W = \max\left(W_{\text{prev}} + B' - R/F,\ 0\right) \tag{3}$$

where $B$ is the target number of bits for a frame, $R$ is the channel rate in bits per second, $F$ is the frame rate in frames per second, $W$ is the number of bits in the encoder buffer, $M$ is the maximum buffer size, $W_{\text{prev}}$ is the previous number of bits in the buffer, $B'$ is the actual number of bits used for encoding the previous frame, and $Z = 0.1$ is set by default to achieve low delay.
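The frame-layer computation in equations (1) through (3) can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical, chosen only for this illustration, and the numeric values in the usage lines are made up rather than taken from any encoder configuration.

```python
def update_buffer_fullness(w_prev, bits_prev_frame, channel_rate, frame_rate):
    """Equation (3): W = max(W_prev + B' - R/F, 0)."""
    return max(w_prev + bits_prev_frame - channel_rate / frame_rate, 0.0)


def frame_target_bits(w, channel_rate, frame_rate, max_buffer, z=0.1):
    """Equations (1) and (2): B = R/F - delta, with Z = 0.1 by default."""
    threshold = z * max_buffer
    if w > threshold:
        # Buffer is above Z*M: reduce the target by W/F.
        delta = w / frame_rate
    else:
        # Buffer is at or below Z*M: delta is non-positive, so the target grows.
        delta = w - threshold
    return channel_rate / frame_rate - delta


# Illustrative numbers only (assumptions, not from the text):
# 64 kbit/s channel, 10 frames/s, maximum buffer size of 6400 bits.
R, F, M = 64000.0, 10.0, 6400.0
W = update_buffer_fullness(0.0, 8000.0, R, F)  # previous frame used more than R/F
B = frame_target_bits(W, R, F, M)
print(W, B)
```

In this sketch, a previous frame that overshoots R/F raises W, which in turn lowers the target B for the current frame; a nearly empty buffer has the opposite effect.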
The macroblock-layer rate control selects the value of the quantization step size for all the macroblocks in a frame, so that the sum of the macroblock bits is close to the frame target number of bits $B$. The optimal quantization step size $Q_i^*$ for macroblock $i$ in a frame can be determined by
$$Q_i^* = \sqrt{\frac{A K}{\beta_i - A N_i C}\,\frac{\sigma_i}{\alpha_i}\sum_{k=1}^{N} \alpha_k \sigma_k}, \tag{4}$$

where $K$ is the model parameter, $A$ is the number of pixels in a macroblock, $N_i$ is the number of macroblocks that remain to be encoded in the frame, $\sigma_i$ is the standard deviation of the residue in the $i$th macroblock, $\alpha_i$ is the distortion weight of the $i$th macroblock, $C$ is the overhead rate, and $\beta_i$ is the number of bits left for encoding the frame, with $\beta_1 = B$ set at the initialization stage.
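As a rough illustration of equation (4), the following Python sketch evaluates the step size for one macroblock. It assumes the square-root form of the TMN8 model, takes N to be the number of macroblocks in the frame, sums over all macroblocks as written in equation (4), and defaults A to 256 pixels (a 16x16 luminance macroblock). The function name, parameter names, and sample values are hypothetical.

```python
import math


def optimal_step_size(i, sigma, alpha, bits_left, k_model, overhead,
                      pixels_per_mb=256):
    """Evaluate equation (4) for the macroblock at zero-based index i.

    sigma, alpha : per-macroblock residue standard deviations and
                   distortion weights for the whole frame (length N).
    bits_left    : beta_i, the bits left for encoding the frame.
    k_model      : model parameter K.
    overhead     : overhead rate C.
    """
    n_total = len(sigma)
    n_remaining = n_total - i  # N_i, counting the current macroblock (assumption)
    # Sum of alpha_k * sigma_k over k = 1..N, as written in equation (4).
    weighted_sum = sum(a * s for a, s in zip(alpha, sigma))
    denom = bits_left - pixels_per_mb * n_remaining * overhead
    if denom <= 0:
        raise ValueError("not enough bits left to cover the overhead")
    return math.sqrt(
        pixels_per_mb * k_model / denom * sigma[i] / alpha[i] * weighted_sum
    )


# Made-up values for a two-macroblock "frame", for illustration only.
sigma = [4.0, 12.0]
alpha = [1.0, 1.0]
q0 = optimal_step_size(0, sigma, alpha, bits_left=512.0, k_model=1.0, overhead=0.5)
print(q0)
```

The guard on the denominator is a choice made for this sketch: when the remaining bits do not exceed the overhead term, equation (4) has no real-valued solution, so the sketch raises an error rather than returning a value.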
The TMN8 scheme is simple and is known to achieve both high quality and an accurate bit rate, but it is not well suited to H.264. Rate-distortion optimization (RDO) (e.g., rate-constrained motion estimation and mode decision) is a widely accepted approach in H.264, and the quantization parameter (QP), which is used to decide λ in the Lagrangian optimization, must be decided before RDO is performed. The TMN8 model, however, requires the statistics of the prediction error signal (residue) to estimate the QP, which means that motion estimation and mode decision need to be performed before the QP is determined. The result is a circular dependency: the QP is needed to perform RDO, yet the residue produced by RDO is needed to estimate the QP.
To overcome this dilemma, a method (hereinafter the “first conventional method”) proposed for H.264 rate control and incorporated into the JVT JM reference software release JM7.4 uses the residue of the collocated macroblock in the most recently coded picture of the same type to predict the residue of the current macroblock. Another method (hereinafter the “second conventional method”) proposed for H.264 rate control addresses the same dilemma with a two-step encoding, in which the QP of the previous picture (QPprev) is first used to generate the residue, and the QP of the current macroblock is then estimated based on that residue. The former approach (i.e., the first conventional method) is simple, but it lacks precision. The latter approach (i.e., the second conventional method) is more accurate, but it requires multiple encoding passes, which adds considerable complexity.