The present invention relates to processing of video data, and more particularly to control of bit rates in encoding video data.
Controlling the bit rate for encoding of video data is important and affects the quality of the video data. An effective bit rate control technique (alternatively referred to hereinbelow as rate control) allocates more bits for encoding more complex regions of a video frame. For a discrete cosine transform (DCT) based system, the bit rate control typically adjusts the quantization scale of the DCT coefficients to regulate the bit rate. When a lower rate is desired, the quantization scale is increased. Conversely, when a higher rate is desired, the quantization scale is decreased.
A well-known rate control algorithm called Test Model 5 (TM5) has been developed in connection with the MPEG-2 standard (Test Model Editing Committee: “Test Model 5”; ISO/IEC JTC1/SC29/WG11/N400, April 1993). The TM5 algorithm maintains a constant bit rate for each group of pictures (GOP). A GOP often includes three types of frames, namely I, P and B frames.
Bit rate control in TM5 involves three steps. During the first step, commonly referred to as the frame level rate control step, the number of bits available to code the next frame is estimated. During the second step, commonly referred to as the macro block level rate control step, the reference value of the quantization parameter for each macro block is set by means of a virtual buffer. During the third step, commonly referred to as the adaptive quantization step, the reference value of the quantization parameter set during the second step is modified in accordance with the spatial activity measurement of the macro block. The modified reference value of the quantization parameter is subsequently used in the quantization process. A more detailed description of each of the above three steps follows.
Frame Level Rate Control
Frame level rate control is adapted to allocate bits among frames according to their complexity. In accordance with the frame level rate control, more bits are used to encode more complex frames while at the same time the average number of bits per second is maintained close to a given bit rate. Assume that bit_rate and picture_rate respectively represent the bit rate in bits/sec and the frame rate in frames/sec for encoding a video sequence. For simplicity, bit allocation is performed for a GOP. Assume further that there are N pictures in the GOP, i.e., the period of I frames in the video sequence equals N. The total number of bits for encoding this GOP is equal to the product of N and the average number of bits used for encoding each frame, i.e., N×bit_rate/picture_rate.
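As a minimal sketch of the GOP budget described above, the following Python snippet multiplies the number of pictures N by the average number of bits per frame. The function name and the numeric settings are hypothetical and chosen only for illustration.

```python
def gop_bit_budget(bit_rate, picture_rate, n_frames):
    """Total bits for one GOP: N times the average number of bits per frame."""
    avg_bits_per_frame = bit_rate / picture_rate
    return n_frames * avg_bits_per_frame


# Hypothetical settings: 4 Mbit/s, 25 frames/s, 15 pictures per GOP.
budget = gop_bit_budget(4_000_000, 25, 15)
print(budget)  # 2400000.0
```

At a fixed bit rate, a longer GOP or a lower frame rate enlarges the budget that the frame level rate control has to distribute.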
In accordance with the frame level rate control, this total number of bits is distributed among the frames of the GOP such that all the bits are consumed for encoding of the GOP. For simplicity, this distribution is adaptively adjusted after encoding a frame. More specifically, after a frame is encoded, a target number of bits for encoding the next frame of the same frame type is determined by taking into account the complexity of the last encoded frame and using the remaining unused bits.
Complexity Estimation
Assume Xi, Xp, and Xb represent global complexity measurement for I, P and B frame types respectively. After a frame is encoded, the corresponding global complexity measurements are computed as shown below:

$$
\begin{aligned}
X_i &= S_i \times \mathit{avg\_Q}_i \\
X_p &= S_p \times \mathit{avg\_Q}_p \\
X_b &= S_b \times \mathit{avg\_Q}_b
\end{aligned}
\tag{1.1}
$$

where Si, Sp or Sb respectively represent the number of bits generated for encoding of I, P and B frames that were last encoded. The average quantization parameters avg_Qi, avg_Qp and avg_Qb are the average of the quantization parameter QP for all the macro blocks (rounded to an integer) used during encoding the I, P or B frames respectively; this average also includes the quantization parameter for skipped macro blocks. The initial values of Xi, Xp, and Xb are set as follows:

$$
\begin{aligned}
X_i &= (160 \times \mathit{bit\_rate})/115 + 0.5 \\
X_p &= (60 \times \mathit{bit\_rate})/115 + 0.5 \\
X_b &= (42 \times \mathit{bit\_rate})/115 + 0.5
\end{aligned}
\tag{1.2}
$$

Target Number of Bits
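In the TM5 standard, the target number of bits for the next I, P or B frame of a GOP, respectively represented as Titm5, Tptm5, or Tbtm5, are computed as shown below:

$$
\begin{aligned}
T_i^{\mathrm{tm5}} &= \max\left\{ \frac{R}{1 + \dfrac{N_p \times X_p}{K_p \times X_i} + \dfrac{N_b \times X_b}{K_b \times X_i}} + 0.5,\; T_{\min} \right\} \\
T_p^{\mathrm{tm5}} &= \max\left\{ \frac{R}{N_p + \dfrac{N_b \times K_p \times X_b}{K_b \times X_p}} + 0.5,\; T_{\min} \right\} \\
T_b^{\mathrm{tm5}} &= \max\left\{ \frac{R}{N_b + \dfrac{N_p \times K_b \times X_p}{K_p \times X_b}} + 0.5,\; T_{\min} \right\}
\end{aligned}
\tag{1.3}
$$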
Variables Np and Nb are respectively the number of P and B frames of a current GOP that are yet to be encoded (hereinafter alternatively referred to as the remaining frames). The minimum target number of bits Tmin is defined as:

$$
T_{\min} = \mathit{bit\_rate}/(8 \times \mathit{picture\_rate}) + 0.5 \tag{1.4}
$$

In expressions (1.3), Kp and Kb are universal constants that depend on the quantization matrices. For the matrices specified in MPEG-4, Kp=1.0 and Kb=1.4. R is the remaining number of bits assigned to the GOP and is updated after a frame is encoded in accordance with the following expression:

$$
R = R - S_{i,p,b} \tag{1.5}
$$
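Expressions (1.1) through (1.5) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the "+0.5" terms are assumed to mean rounding to the nearest integer by truncation, and the numeric inputs are invented for the example.

```python
def initial_complexities(bit_rate):
    """Initial X_i, X_p, X_b from expression (1.2)."""
    # Assumption: "+ 0.5" followed by truncation rounds to the nearest integer.
    return tuple(int(c * bit_rate / 115 + 0.5) for c in (160, 60, 42))


def global_complexity(bits_used, avg_q):
    """Expression (1.1): bits of the just encoded frame times its average QP."""
    return bits_used * avg_q


def min_target(bit_rate, picture_rate):
    """Expression (1.4): lower bound on the target number of bits."""
    return int(bit_rate / (8 * picture_rate) + 0.5)


def target_bits(frame_type, R, Np, Nb, X, t_min, Kp=1.0, Kb=1.4):
    """Expression (1.3): target bits for the next frame of the given type."""
    Xi, Xp, Xb = X
    if frame_type == "I":
        denom = 1 + Np * Xp / (Kp * Xi) + Nb * Xb / (Kb * Xi)
    elif frame_type == "P":
        denom = Np + Nb * Kp * Xb / (Kb * Xp)
    elif frame_type == "B":
        denom = Nb + Np * Kb * Xp / (Kp * Xb)
    else:
        raise ValueError(f"unknown frame type: {frame_type!r}")
    return max(int(R / denom + 0.5), t_min)


# Hypothetical numbers: 1.15 Mbit/s, 25 frames/s, R = 690000 bits available,
# with 4 P frames and 10 B frames of the GOP still to be encoded.
X = initial_complexities(1_150_000)
t_min = min_target(1_150_000, 25)
T_i = target_bits("I", 690_000, Np=4, Nb=10, X=X, t_min=t_min)

# Expression (1.5), supposing the I frame actually consumed 150000 bits.
R = 690_000 - 150_000
```

With these inputs the I frame receives a target several times larger than the target a P frame would receive from the remaining budget, which reflects the higher initial complexity assigned to I frames.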
In the above expression (1.5), Si,p,b is the number of bits used to encode the last frame (hereinafter referred to alternatively as the just encoded frame). Furthermore, the R on the left side is the number of bits remaining immediately after the last frame is encoded, and the R on the right is the number of bits remaining immediately before the last frame is encoded. R is set to 0 at the beginning of a new video sequence. However, prior to encoding the first frame, i.e., the I-frame of a new GOP, R is set to the following value:

$$
R = R + \mathit{bit\_rate} \times N / \mathit{picture\_rate} + 0.5 \tag{1.6}
$$

Macro Block Level Rate Control
During the macro block level rate control step, the reference value of the quantization parameter for each macro block is determined by means of a virtual buffer. TM5 uses virtual buffer parameters di, dp, and db for the corresponding frame types to regulate bit rates. Upon allocation of the target number of bits Ti, Tp, and Tb for each frame, the virtual buffer fullness is updated for each macro block. Next, a quantization parameter for the macro block is computed. In TM5, the quantization parameter can be modified according to an activity measure of the macro block to achieve better picture quality. Consequently, TM5 uses the quantization parameter directly to control the bit rate. This quantization parameter is derived from the bit rate allocated to each corresponding frame. Thus, changing the target bit rate allocated to a frame will impact the quantization parameter and consequently manifest itself in a picture quality variation.
Virtual Buffers
For each I, P or B frame, the rate control at the macro block level is achieved by changing a quantization parameter according to the fullness of a corresponding virtual buffer. Assume Qj is the reference quantization parameter of the j-th macro block, and dj is a measure of the fullness of the virtual buffer. The reference quantization parameter is computed as shown below:

$$
Q_j = d_j \times (31/r) \tag{2.1}
$$

Note that Qj is further modified according to the macro block activity (described further below) before it is applied to quantization. In the above equation (2.1), r is a reaction parameter defined by:

$$
r = 2 \times \mathit{bit\_rate}/\mathit{picture\_rate} \tag{2.2}
$$
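Equations (2.1) and (2.2) amount to a linear mapping from buffer fullness to the reference quantization parameter. A minimal Python sketch follows; the function names and sample values are hypothetical.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Equation (2.2): r = 2 * bit_rate / picture_rate."""
    return 2 * bit_rate / picture_rate


def reference_qp(d_j, r):
    """Equation (2.1): Q_j = d_j * (31 / r)."""
    return d_j * 31 / r


# Hypothetical values: 1.15 Mbit/s at 25 frames/s gives r = 92000.
r = reaction_parameter(1_150_000, 25)
print(reference_qp(46_000, r))  # 15.5
```

A fuller buffer yields a larger Qj and therefore coarser quantization, and Qj reaches 31 when the fullness equals r.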
Assume dij, dpj and dbj represent the buffer fullness for frame types I, P and B respectively. Assume further that MB_cnt represents the number of macro blocks in a frame. Consequently, before encoding the j-th macro block (j is greater than or equal to zero), the corresponding buffer fullness parameters are calculated as shown below:

$$
\begin{aligned}
d_j^i &= d_0^i + B_{j-1} - j \times TM_i \\
d_j^p &= d_0^p + B_{j-1} - j \times TM_p \\
d_j^b &= d_0^b + B_{j-1} - j \times TM_b
\end{aligned}
\tag{2.3}
$$

In equations (2.3), Bj is the number of bits used in encoding all the macro blocks in the frame up to the j-th macro block, and B−1 is set to 0. If the values of dij, dpj, or dbj in equations (2.3) are negative, they are set to zero.
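In TM5, the target number of bits per macro block TMi, TMp, and TMb is defined as:

$$
\begin{aligned}
TM_i &= T_i / \mathit{MB\_cnt} \\
TM_p &= T_p / \mathit{MB\_cnt} \\
TM_b &= T_b / \mathit{MB\_cnt}
\end{aligned}
\tag{2.4}
$$

Furthermore, in equations (2.3), di0, dp0 and db0 represent the initial buffer fullness for each of the I, P and B frame types respectively. The default values of di0, dp0, and db0 are:

$$
\begin{aligned}
d_0^i &= 15 \times r/31 \\
d_0^p &= K_p \times d_0^i \\
d_0^b &= K_b \times d_0^i
\end{aligned}
\tag{2.5}
$$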
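The buffer bookkeeping of equations (2.3) through (2.5) can be sketched as follows. The function names, the per-macro-block bit counts, and the frame target are hypothetical values chosen for illustration; the sketch is not taken from TM5 reference code.

```python
def initial_fullness(r, Kp=1.0, Kb=1.4):
    """Default initial fullness per frame type, equations (2.5)."""
    d_i0 = 15 * r / 31
    return {"I": d_i0, "P": Kp * d_i0, "B": Kb * d_i0}


def buffer_fullness(d0, bits_before_j, j, frame_target, mb_cnt):
    """Equations (2.3): fullness before encoding macro block j (j >= 0).

    bits_before_j stands for B_{j-1}, the bits spent on macro blocks
    0..j-1 (zero when j == 0). Negative results are set to zero.
    """
    per_mb_target = frame_target / mb_cnt  # equations (2.4)
    return max(d0 + bits_before_j - j * per_mb_target, 0.0)


# Hypothetical frame: 396 macro blocks, target of 39600 bits (100 per block),
# initial fullness 1000, and the first macro blocks cost 150, 150, 40 bits.
spent = [150, 150, 40]
fullness, bits = [], 0
for j, cost in enumerate(spent):
    fullness.append(buffer_fullness(1000.0, bits, j, 39_600, 396))
    bits += cost
print(fullness)  # [1000.0, 1050.0, 1100.0]
```

Each macro block that costs more than its per-block target raises the fullness, which in turn raises the reference quantization parameter for the macro blocks that follow.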
At the beginning of encoding a video sequence, the initial buffer fullness parameters are set to their corresponding default values in accordance with their frame types. After encoding the last macro block of a frame, the initial values are updated as shown below:

$$
d_0^{i,p,b} = d_0^{i,p,b} + S_{i,p,b} - T_{i,p,b} \tag{2.6}
$$

where Si,p,b is the total number of bits used in encoding of the frame, and Ti,p,b is the target number of bits for the corresponding frame type (see equations (1.3)). The updated initial buffer fullness values are used to encode the next frame of the same frame type. If di,p,b0 as determined by expression (2.6) yields a negative value, the default initial values as shown in expressions (2.5) are used.

Adaptive Quantization
During the adaptive quantization step, the reference value of the quantization parameter set during the macro block level rate control step is modified in accordance with the level of the activity of the macro blocks. Assume actj is the measure of spatial activity for the j-th macro block of a frame. As seen below, actj is calculated from four luminance frame-organized sub-blocks (n = 1, ..., 4) and four luminance field-organized sub-blocks (n = 5, ..., 8) using the original intra pixel values:

$$
\mathit{act}_j = 1 + \min\{\mathit{vblk}_1, \mathit{vblk}_2, \ldots, \mathit{vblk}_8\} \tag{3.1}
$$

where the approximated variance is defined as:

$$
\mathit{vblk}_n = \frac{1}{64} \times \sum_{k=1}^{64} \left| P_k^n - \mathit{P\_mean}_n \right| \tag{3.2}
$$

and where:

$$
\mathit{P\_mean}_n = \frac{1}{64} \times \sum_{k=1}^{64} P_k^n \tag{3.3}
$$

In equation (3.3), Pnk is the intensity value of the k-th pixel in the n-th 8×8 block.
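The activity measure of equations (3.1) through (3.3) can be illustrated with the Python sketch below. The layout of the eight sub-blocks is an assumption not spelled out in the text: the four frame-organized blocks are taken as the quadrants of the 16×16 luminance macro block, and the four field-organized blocks as the left and right halves of the even-line and odd-line fields. All function names are hypothetical.

```python
def block_deviation(block):
    """Equations (3.2)-(3.3): mean absolute deviation of 64 pixel values."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / 64
    return sum(abs(p - mean) for p in pixels) / 64


def sub_blocks(mb):
    """Split a 16x16 luminance macro block (list of 16 rows) into 8 blocks.

    Assumed layout: four frame-organized quadrants, then the left and
    right halves of the even-line field and of the odd-line field.
    """
    frame = [[row[c:c + 8] for row in mb[r:r + 8]]
             for r in (0, 8) for c in (0, 8)]
    fields = (mb[0::2], mb[1::2])  # 8 rows each
    field = [[row[c:c + 8] for row in f] for f in fields for c in (0, 8)]
    return frame + field


def activity(mb):
    """Equation (3.1): one plus the smallest sub-block deviation."""
    return 1 + min(block_deviation(b) for b in sub_blocks(mb))


# Alternating lines of 0 and 100 look busy in frame order, but each field
# is flat, so the minimum deviation is 0 and the activity is 1.
interlaced = [[100 * (r % 2)] * 16 for r in range(16)]
# A pixel-level checkerboard is busy in every sub-block.
checker = [[100 * ((r + c) % 2) for c in range(16)] for r in range(16)]
print(activity(interlaced), activity(checker))  # 1.0 51.0
```

Taking the minimum over the frame-organized and field-organized blocks keeps interlaced content with flat fields from being classified as highly active.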
Assume avg_act is the average value of actj for all macro blocks in the just encoded frame. For the first frame, variable avg_act is typically set to 400. The normalized activity measure is defined as:

$$
\mathit{N\_act}_j = \frac{2 \times \mathit{act}_j + \mathit{avg\_act}}{\mathit{act}_j + 2 \times \mathit{avg\_act}} \tag{3.4}
$$

The quantization parameter for the j-th macro block, mquantj, is obtained by multiplying the reference quantization parameter Qj with the normalized activity measure N_actj, as shown below:

$$
\mathit{mquant}_j = Q_j \times \mathit{N\_act}_j + 0.5 \tag{3.5}
$$

The value of mquantj is clipped to the range 1 through 31, as known in the art.
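A short Python sketch of equations (3.4) and (3.5) follows. The function names are hypothetical, and the "+0.5" term is assumed to mean rounding to the nearest integer by truncation.

```python
def normalized_activity(act_j, avg_act):
    """Equation (3.4): ranges from 0.5 (flat block) toward 2 (busy block)."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def mquant(q_ref, act_j, avg_act=400.0):
    """Equation (3.5), followed by clipping to the range 1 through 31."""
    # Assumption: "+ 0.5" followed by truncation rounds to the nearest integer.
    value = int(q_ref * normalized_activity(act_j, avg_act) + 0.5)
    return min(max(value, 1), 31)


print(mquant(20, 1600))  # busy macro block: 20 * 1.5 rounds to 30
print(mquant(20, 400))   # average activity leaves the reference value at 20
print(mquant(31, 1600))  # 31 * 1.5 exceeds the range and is clipped to 31
```

Because the normalized activity stays between 0.5 and 2, a macro block can at most halve or double its reference quantization parameter before clipping.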
While the TM5 algorithm works reasonably well in general, it does not offer any special treatment for the case of a scene change, in which the bit rate required for the frame is larger than the allocated bit rate. In particular, the predictive coding mode might not be suitable for the scene change frame, since the motion-compensated frame differences can consume more bits than the original frame.
U.S. Pat. No. 5,032,905, entitled “Accurate Detection of a Drastic Change between Successive Pictures”, suggests an intra-mode for the scene change frame. Whether this scene change frame is coded in inter-mode or intra-mode, it will consume many more bits than a typical predictive frame. However, the normal TM5 algorithm cannot afford this bit rate budget because the target bit rate allocated to a P-frame is much less than that allocated to an I-frame. With such a bit rate constraint, the picture quality of this scene change P-frame will suffer badly. Consequently, all remaining frames in the GOP will be subject to poor quality. If the scene change occurs in the early portion of a GOP, the quality degradation will be very noticeable.
U.S. Pat. No. 5,617,150, entitled “Video Bit Rate Control Method,” describes a method that attempts to alleviate the problem by “borrowing” bits from previous frames for a scene change P-frame. The encoding is performed in a delayed fashion. The target bit rate assigned to a current sub-GOP (an I-frame or a P-frame and its preceding B-frames) is delayed until the scene change detection for the following sub-GOP is completed. If a scene change is detected in the following sub-GOP, the target bit rate for the current sub-GOP is reduced so that more bits can be used for the scene change frame. Otherwise, the target bit rate for the current sub-GOP is not modified. The “borrowing” of bits is limited to the sub-GOP immediately preceding the current sub-GOP, and the amount that can be borrowed is very limited. The quality improvement is thus also limited. Furthermore, this method introduces an additional delay that is proportional to the number of frames in a sub-GOP.
Thus, there is a need for an improved method of encoding a sequence of video frames that is adapted to reduce picture quality degradation when the bit rate allocated for encoding of the sequence of frames falls below a predicted value.