The present invention relates to digital video signal processing, and more particularly to devices and methods for video coding.
There are multiple applications for digital video communication and storage, and multiple international standards for video coding have been and are continuing to be developed. Low bit rate communications, such as video telephony and conferencing, led to the H.261 standard with bit rates in multiples of 64 kbps, and the MPEG-1 standard provides picture quality comparable to that of VHS videotape. Subsequently, the H.263, MPEG-2, and MPEG-4 standards have been promulgated.
H.264/AVC is a recent video coding standard that makes use of several advanced video coding tools to provide better compression performance than existing video coding standards. At the core of all of these standards is the hybrid video coding technique of block motion compensation (prediction) plus transform coding of prediction error. Block motion compensation is used to remove temporal redundancy between successive pictures (frames or fields) by prediction from prior pictures, whereas transform coding is used to remove spatial redundancy within each block of both temporal and spatial prediction errors. FIGS. 2a-2b illustrate H.264/AVC functions which include a deblocking filter within the motion compensation loop to limit artifacts created at block edges.
Traditional block motion compensation schemes basically assume that between successive pictures an object in a scene undergoes a displacement in the x- and y-directions, and these displacements define the components of a motion vector. Thus an object in one picture can be predicted from the object in a prior picture by using the object's motion vector. Block motion compensation simply partitions a picture into blocks, treats each block as an object, and then finds the motion vector which locates the most-similar block in a prior picture (motion estimation). This simple assumption works satisfactorily in most practical cases, and thus block motion compensation has become the most widely used technique for temporal redundancy removal in video coding standards. Further, pictures coded without motion compensation are inserted periodically to avoid error propagation; blocks encoded without motion compensation are called intra-coded, and blocks encoded with motion compensation are called inter-coded.
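The motion estimation described above can be illustrated with a minimal sketch of an exhaustive (full) search. The matching criterion (sum of absolute differences, SAD), the block size, the search range, and the function names `sad` and `full_search` are assumptions chosen for illustration; they are not prescribed by the text above, and practical encoders typically use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, search=2):
    """Return (motion_vector, cost) of the best match within +/- `search`
    pixels; the motion vector points from the current block into `ref`."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy pictures: the current picture is the reference shifted right by one pixel.
ref = [[(3 * x + 7 * y) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
mv, cost = full_search(cur, ref, bx=2, by=2)
```

In this toy case the block at (2, 2) is found one pixel to the left in the reference picture, so the prediction is exact and the residual for the block is zero.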
Block motion compensation methods typically decompose a picture into macroblocks where each macroblock contains four 8×8 luminance (Y) blocks plus two 8×8 chrominance (Cb and Cr or U and V) blocks, although other block sizes, such as 4×4, are also used in H.264/AVC. The residual (prediction error) block can then be encoded (i.e., block transformation, transform coefficient quantization, entropy encoding). The transform of a block converts the pixel values of a block from the spatial domain into a frequency domain for quantization; this takes advantage of the decorrelation and energy compaction of transforms such as the two-dimensional discrete cosine transform (DCT) or an integer transform approximating a DCT. For example, in MPEG and H.263, 8×8 blocks of DCT coefficients are quantized, scanned into a one-dimensional sequence, and coded by using variable length coding (VLC). H.264/AVC uses an integer approximation to a 4×4 DCT for each of sixteen 4×4 Y blocks and eight 4×4 chrominance blocks per macroblock. Thus an inter-coded block is encoded as motion vector(s) plus a quantized, transformed residual block.
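The transform, quantization, and scan stages for a residual block can be sketched as follows. This is a simplified illustration only: it assumes a floating-point orthonormal DCT, a single uniform quantization step for all coefficients, and a zigzag scan, and it omits the quantization matrices, the integer transform of H.264/AVC, and the entropy coder. The function names are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block (list of rows)."""
    n = len(block)
    basis = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
              * math.cos((2 * i + 1) * k * math.pi / (2 * n))
              for i in range(n)] for k in range(n)]
    # Transform the columns first, then the rows.
    tmp = [[sum(basis[u][y] * block[y][x] for y in range(n)) for x in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][x] * basis[v][x] for x in range(n)) for v in range(n)]
            for u in range(n)]


def quantize(coeffs, step):
    """Uniform quantization with rounding to the nearest level."""
    def q(c):
        level = int(abs(c) / step + 0.5)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]


def zigzag_order(n):
    """(row, column) positions of an n x n block in zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))


def encode_residual(block, step):
    """Transform, quantize, and scan a residual block into a 1-D sequence."""
    levels = quantize(dct_2d(block), step)
    return [levels[r][c] for r, c in zigzag_order(len(block))]


# A residual with a purely horizontal ramp: every row is the same.
residual = [[-6, -2, 2, 6] for _ in range(4)]
scanned = encode_residual(residual, step=2)
```

Because every row of this residual is identical, only coefficients with zero vertical frequency survive quantization, which illustrates the energy compaction that the subsequent variable length coding exploits.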
Similarly, intra-coded pictures may still have spatial prediction for blocks by extrapolation from already encoded portions of the picture. Typically, pictures are encoded in raster scan order of blocks, so pixels of blocks above and to the left of a current block can be used for prediction. Again, transformation of the prediction errors for a block can remove spatial correlations and enhance coding efficiency.
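Spatial prediction by extrapolation from neighboring pixels can be sketched as below. The three modes shown (vertical, horizontal, and DC) and the SAD-based mode choice are illustrative assumptions; the text above does not enumerate specific modes, and the names `predict` and `best_mode` are hypothetical.

```python
def predict(above, left, mode):
    """Predict an n x n block from the row of pixels above it and the
    column of pixels to its left."""
    n = len(above)
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(n)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":            # rounded mean of all neighboring pixels
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unknown mode: " + mode)


def best_mode(block, above, left):
    """Return (mode, sad) for the mode with the smallest prediction error."""
    costs = {}
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict(above, left, mode)
        costs[mode] = sum(abs(b - p)
                          for brow, prow in zip(block, pred)
                          for b, p in zip(brow, prow))
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


above = [10, 20, 30, 40]            # already-encoded pixels above the block
left = [12, 12, 12, 12]             # already-encoded pixels to the left
block = [[10, 21, 30, 39] for _ in range(4)]
mode, cost = best_mode(block, above, left)
```

Here the block closely continues the columns above it, so vertical prediction leaves the smallest residual to be transformed and quantized.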
The rate-control unit in FIG. 2a is responsible for generating the quantization step (qp) by adapting to a target transmission bit-rate and the output buffer-fullness. Video streams are generally provided with a designated bit-rate for the compressed bit-stream; the bit-rate varies depending on the desired image quality, the capacity of the storage/communication channel, etc. In order to generate compressed video streams of the specified bit-rate, a rate controller is implemented in practical video encoding systems. In recent video coding standards, the bit-rate can be controlled through the quantization step size, which is used to quantize the transform coefficients and thereby determines how much spatial detail is retained. When the quantization step size is very small, the bit-rate is high and almost all of the picture detail is preserved. As the quantization step size is increased, the bit-rate decreases at the cost of some loss of quality. The goal of rate control is to achieve the target bit-rate by adjusting the quantization step size while minimizing the total loss of quality. The choice of rate control algorithm may therefore greatly affect the overall image quality even at a given bit-rate.
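The trade-off between quantization step size, bit-rate, and quality can be illustrated with a small sketch. The coefficient values are made up for illustration, and the count of nonzero quantized levels is used only as a rough stand-in for the number of bits an entropy coder would spend; neither is taken from the text above.

```python
def quantize(coeffs, step):
    """Uniform quantization with rounding to the nearest level."""
    levels = []
    for c in coeffs:
        level = int(abs(c) / step + 0.5)
        levels.append(level if c >= 0 else -level)
    return levels


def reconstruct(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [level * step for level in levels]


def rate_and_distortion(coeffs, step):
    """Return (nonzero level count, mean squared error) for one step size."""
    levels = quantize(coeffs, step)
    recon = reconstruct(levels, step)
    nonzero = sum(1 for level in levels if level != 0)
    mse = sum((c - r) ** 2 for c, r in zip(coeffs, recon)) / len(coeffs)
    return nonzero, mse


# Hypothetical transform coefficients of one block.
coeffs = [52.0, -23.5, 11.0, -6.2, 3.1, -1.4, 0.8, 0.3]
results = {step: rate_and_distortion(coeffs, step) for step in (1, 8, 32)}
```

For these values, increasing the step size from 1 to 8 to 32 reduces the number of nonzero levels from 7 to 4 to 2 while the mean squared error grows, which is the behavior a rate controller exploits when it adjusts the quantization step.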
MPEG-2 Test Model 5 (TM5) rate control has achieved widespread familiarity as a constant bit rate (CBR), one-pass rate control algorithm. One-pass rate control algorithms are suitable for real-time encoding systems because the encoding process is performed only once for each picture. However, the quantization step size must be determined prior to the encoding process. The TM5 rate control algorithm determines the quantization step size in the following three steps: (1) bit allocation, (2) rate control, and (3) adaptive quantization. In short, step 1 assigns a budget of bits to the current picture based on the statistics obtained from previously encoded pictures. Then, to achieve the assigned budget, step 2 adjusts the quantization step size during the encoding process using a feedback loop. While steps 1 and 2 are included to achieve higher compression efficiency, step 3 is included to improve subjective image quality.
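The three TM5 steps can be sketched as follows. The formulas and constants (Kp = 1.0, Kb = 1.4, the reaction parameter 2 × bit-rate / picture-rate, the initial complexities, and the 1 to 31 quantizer range) follow the commonly cited TM5 description and are not stated in the text above, so they should be read as assumptions to be checked against the TM5 document. The function names and the example GOP structure are hypothetical.

```python
def target_bits_i(remaining, n_p, n_b, x_i, x_p, x_b,
                  bit_rate, picture_rate, k_p=1.0, k_b=1.4):
    """Step 1 (bit allocation): bit budget for an I picture, given the bits
    remaining in the GOP, the numbers of P and B pictures still to code,
    and the complexity estimates of each picture type."""
    target = remaining / (1 + n_p * x_p / (x_i * k_p) + n_b * x_b / (x_i * k_b))
    return max(target, bit_rate / (8 * picture_rate))


def reference_quantizer(d0, bits_so_far, target, mb_index, mb_count, reaction):
    """Step 2 (rate control): virtual-buffer feedback. `mb_index` counts the
    macroblocks already coded in the picture (0 for the first macroblock)."""
    fullness = d0 + bits_so_far - target * mb_index / mb_count
    return fullness * 31 / reaction


def adaptive_quantizer(q_ref, activity, avg_activity):
    """Step 3 (adaptive quantization): modulate by spatial activity so that
    flat macroblocks get finer steps and busy macroblocks coarser steps."""
    n_act = (2 * activity + avg_activity) / (activity + 2 * avg_activity)
    return min(31, max(1, int(q_ref * n_act + 0.5)))


bit_rate, picture_rate = 4_000_000, 25
reaction = 2 * bit_rate / picture_rate
# Initial complexity estimates used before any picture has been coded.
x_i, x_p, x_b = (160 * bit_rate / 115, 60 * bit_rate / 115, 42 * bit_rate / 115)

# Hypothetical 12-picture GOP: one I picture, then 3 P and 8 B pictures.
remaining = bit_rate * 12 / picture_rate
t_i = target_bits_i(remaining, 3, 8, x_i, x_p, x_b, bit_rate, picture_rate)

d0 = 10 * reaction / 31                      # initial virtual buffer fullness
q_first = reference_quantizer(d0, 0, t_i, 0, 1350, reaction)
q_flat = adaptive_quantizer(q_first, activity=100, avg_activity=400)
q_busy = adaptive_quantizer(q_first, activity=1600, avg_activity=400)
```

With these example values the I picture receives roughly 530 kbits, the first macroblock starts from a reference quantizer of about 10, and the activity term moves that value down for a flat macroblock and up for a busy one. Because the budget in step 1 is derived from previously encoded pictures, the sketch also makes visible why a picture whose statistics differ sharply from its predecessors can be budgeted poorly.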
However, the known rate control methods have problems with scene changes in the video sequence, and the quantization step may vary, leading to unpleasant visual effects.