The present invention relates to digital video signal processing, and more particularly to devices and methods for video coding.
There are multiple applications for digital video communication and storage, and multiple international standards have been and are continuing to be developed. Low bit rate communications, such as video telephony and conferencing, led to the H.261 standard with bit rates as multiples of 64 kbps, and the MPEG-1 standard provides picture quality comparable to that of VHS videotape. Further developments led to better compression performance in video coding standards such as MPEG-2, MPEG-4, H.263, and H.264/AVC. At the core of all of these standards is the hybrid video coding technique of block motion compensation plus transform coding. Block motion compensation is used to remove temporal redundancy between successive pictures (frames or top/bottom fields), whereas transform coding is used to remove spatial redundancy within each picture. FIGS. 2a-2b illustrate H.264/AVC functions which include a deblocking filter within the motion compensation loop to limit artifacts created at block edges.
Traditional block motion compensation schemes basically assume that between successive frames an object in a scene undergoes a displacement in the x- and y-directions and these displacements define the components of a motion vector. Thus an object in one frame can be predicted from the object in a prior frame by using the object's motion vector. Block motion compensation simply partitions a frame into blocks, treats each block as an object, and then finds (motion estimation) a motion vector which locates the most-similar block in the prior frame. The most-similar block is a prediction of the current block, and this prediction block can be encoded simply by the motion vector. This simple block motion assumption works satisfactorily in most practical cases and can be easily extended for use with interlaced fields. Consequently, block motion compensation has become the most widely used technique for temporal redundancy removal in video coding standards.
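The motion estimation step described above can be illustrated with a minimal full-search sketch. It assumes a sum-of-absolute-differences (SAD) similarity measure, a small search window, and frames held as lists of rows; the names `sad` and `motion_search` and the default block size and search range are hypothetical choices for illustration, not taken from any standard or reference encoder.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    the current frame and the block displaced by (dx, dy) in the prior frame."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, n=4, search=2):
    """Exhaustive search over displacements in [-search, +search].
    Returns (dx, dy, cost) for the most-similar block in the prior frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the prior frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Illustrative data: the current frame is the prior frame shifted one pixel
# right and two pixels down, so the best match lies at displacement (-1, -2).
ref = [[x * x + 10 * y * y for x in range(12)] for y in range(12)]
cur = [[ref[max(y - 2, 0)][max(x - 1, 0)] for x in range(12)] for y in range(12)]
mv = motion_search(cur, ref, 4, 4)
```

Practical encoders usually replace the exhaustive search with faster strategies and sub-pixel refinement; the sketch only shows the basic matching idea.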
Typically, a frame is partitioned into macroblocks where each macroblock contains four 8×8 luminance (Y) blocks plus two 8×8 chrominance (Cb and Cr or U and V) blocks, although other block sizes, such as 4×4, are also used in H.264/AVC. The frame can be encoded either with motion compensation or without motion compensation. An I-frame is encoded without motion compensation (“intra-coded”) by simply applying the transform, quantization, and variable-length coding to each macroblock (or prediction error block using adjacent-pixel prediction). In contrast, a P-frame is encoded (“inter-coded”) with motion compensation and a macroblock is encoded by its motion vector plus the transform, quantization, and variable-length coding of its residual block (prediction error block from the motion vector located block). The transform of a block converts the pixel values of a block from the spatial domain into a frequency domain; this takes advantage of decorrelation and energy compaction of transforms such as the two-dimensional discrete cosine transform (DCT) to make the quantization more effective. For example, in MPEG-2 and H.263, 8×8 blocks of DCT-coefficients are quantized, scanned into a one-dimensional sequence, and the sequence coded by using variable length coding. H.264/AVC uses an integer approximation to a 4×4 DCT.
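The energy compaction mentioned above can be seen in a small sketch. It assumes an orthonormal floating-point two-dimensional DCT-II and a single uniform quantization step for all coefficients; the function names `dct_2d` and `quantize_block` and the step value are hypothetical, and real codecs use fast or integer transforms with more elaborate quantizers.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize_block(coeffs, step):
    """Uniform quantization: divide by the step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]


# A flat 8x8 block: all of its energy lands in the single DC coefficient.
flat = [[100] * 8 for _ in range(8)]
levels = quantize_block(dct_2d(flat), 16)
nonzero = sum(1 for row in levels for v in row if v != 0)
```

For the flat block only one quantized coefficient is nonzero, so the subsequent scan and variable-length coding have very little to encode.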
The rate-control unit in FIG. 2a is responsible for generating the quantization parameter by adapting to a target transmission bit-rate in view of the current fullness of the output buffer; a larger quantization parameter implies more vanishing and/or smaller quantized transform coefficients which means fewer and/or shorter variable-length codewords and consequent lower bit rates and smaller files. Of course, a larger quantization parameter also means more distortion in the frame decoded after transmission. Typically, the quantization step sizes vary linearly or exponentially with the quantization parameter (QP); e.g., quantization step size could be proportional to QP or to 2^(QP/6). Also, the predictive power of motion compensation generally leads to many more bits used encoding an I-frame than encoding a P-frame with comparable distortion.
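The relation between QP, step size, and the number of vanishing coefficients can be sketched as follows. The proportionality constants (`scale`, `base`) and the sample coefficient values are assumptions chosen for illustration; the text above only states that the step size is proportional to QP or to 2^(QP/6).

```python
def step_linear(qp, scale=2.0):
    """Step size proportional to QP (scale is an assumed constant)."""
    return scale * qp


def step_exponential(qp, base=0.625):
    """Step size proportional to 2^(QP/6): it doubles every 6 QP values
    (base is an assumed constant)."""
    return base * 2.0 ** (qp / 6.0)


def count_zeros(coeffs, step):
    """Number of transform coefficients that quantize to zero."""
    return sum(1 for c in coeffs if round(c / step) == 0)


# Hypothetical transform coefficients of one block.
coeffs = [120, -45, 18, -7, 3, 1]
zeros_low_qp = count_zeros(coeffs, step_linear(4))    # step size 8
zeros_high_qp = count_zeros(coeffs, step_linear(12))  # step size 24
```

With these sample values the larger QP zeroes out more coefficients, which is the mechanism by which the rate-control unit trades distortion for bit rate.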
Further, some low-rate applications may have an input at the typical 30 frames per second but only output 10 or 15 encoded frames per second. Thus input frames are skipped to adapt to the output encoded frame rate.
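One simple way to pick which input frames to keep is an accumulator that tracks how much output time each input frame contributes. This is only a sketch under the assumption of a fixed input rate; the function name `select_frames` is hypothetical.

```python
def select_frames(num_input, input_fps, output_fps):
    """Return indices of input frames to encode so that roughly
    output_fps frames are kept out of every input_fps frames."""
    kept = []
    # Start with a full accumulator so that the first frame is kept.
    acc = input_fps - output_fps
    for i in range(num_input):
        acc += output_fps
        if acc >= input_fps:
            acc -= input_fps
            kept.append(i)   # encode this frame
        # otherwise the frame is skipped
    return kept


# One second of 30 frames-per-second input reduced to 10 and 15 encoded frames.
kept_10 = select_frames(30, 30, 10)
kept_15 = select_frames(30, 30, 15)
```

At 10 encoded frames per second every third input frame is kept; at 15, every second frame.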
Telenor (Norwegian telecom) made an encoding implementation for H.263, denoted Test Model Near-term 5 or TMN5, publicly available; and this implementation has been widely adopted including use for MPEG-4. The TMN5 rate control includes the function updateQuantizer( ) which generates a new quantization parameter (qp or QP) value based on the bits used up to the current macroblock in a frame and the bits used for encoding the prior frame. The function should be called at the beginning of each row of macroblocks (i.e., each slice), but it can be called for any macroblock.
However, the TMN5 encoder has limitations: it was designed for low-delay, low-bit-rate applications, and it uses a fixed quantization parameter for I-frames, varying the quantization parameter only for P-frames.
For rate control methods using a fixed quantization parameter for I-frames (QPI), when the frame generates too many bits, the encoder may be forced to abort the frame to avoid either overflowing the video output buffer (vbv) or exceeding limits on instantaneous bit rate. Also, several frames may need to be skipped afterwards to reduce delay for low-delay applications. This is particularly a problem for the first I-frame of a sequence when the initialization value for QPI is set too low.
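The abort condition can be illustrated with a simplified buffer model. The sketch assumes a leaky-bucket output buffer drained at a constant bit rate and a fixed frame rate; the function name `encode_with_abort` and all numeric values are hypothetical, and an actual encoder would detect the overflow while coding the frame rather than from a known frame size.

```python
def encode_with_abort(frame_bits, buffer_size, bit_rate, frame_rate):
    """Simulate an output buffer: a frame whose bits would overflow the
    buffer is aborted (contributes no bits); otherwise it is coded.
    Returns the per-frame outcome and the final buffer fullness."""
    drain_per_frame = bit_rate / frame_rate   # bits sent per frame interval
    fullness = 0.0
    outcomes = []
    for bits in frame_bits:
        if fullness + bits > buffer_size:
            outcomes.append("aborted")
        else:
            fullness += bits
            outcomes.append("coded")
        # The channel drains the buffer during the frame interval.
        fullness = max(0.0, fullness - drain_per_frame)
    return outcomes, fullness


# A first I-frame coded with too low a QPI produces 60000 bits, which
# exceeds the 40000-bit buffer; the following P-frames fit.
outcomes, final_fullness = encode_with_abort(
    [60000, 8000, 8000], buffer_size=40000, bit_rate=64000, frame_rate=10)
```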
Another problem is that some encoder systems produce a variable frame rate. For instance, an encoder may skip a frame due to insufficient MIPS, and some sensors change the integration time (exposure) based on the light available. These sensors may have a target frame rate to feed the encoder, but the actual capture time may be delayed, particularly in poor lighting conditions, or the capture may occur early. Rate control methods generally assume a fixed input frame rate, but with a variable frame rate, the buffer level may differ from what is assumed for a fixed frame rate. Without compensating for these variations, the encoder may overflow the vbv buffer, or abort a frame unnecessarily, due to incorrect modeling of the vbv buffer.
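The modeling error can be shown by comparing the buffer level computed from nominal frame times with the level computed from actual capture times. This is a sketch of the problem only, not of any particular compensation method; the function name `vbv_levels`, the timestamps, and the bit counts are assumptions for illustration.

```python
def vbv_levels(frame_bits, capture_times, bit_rate):
    """Buffer level after each frame, draining at bit_rate (bits/second)
    for the time elapsed between successive capture times (seconds)."""
    levels = []
    fullness = 0.0
    prev_time = capture_times[0]
    for bits, t in zip(frame_bits, capture_times):
        # Drain for the real elapsed time, then add the new frame's bits.
        fullness = max(0.0, fullness - bit_rate * (t - prev_time))
        fullness += bits
        levels.append(fullness)
        prev_time = t
    return levels


bits = [10000, 10000, 10000]
# Nominal timing at 8 frames per second versus a third frame captured early.
assumed = vbv_levels(bits, [0.0, 0.125, 0.25], bit_rate=64000)
actual = vbv_levels(bits, [0.0, 0.125, 0.1875], bit_rate=64000)
```

Because the third frame arrives early, less data has drained and the actual level (18000 bits) is higher than the level a fixed-rate model assumes (14000 bits); a late capture produces the opposite error.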