This invention pertains generally to digital video encoders, and more particularly to frame-adaptive bit rate control of such encoders.
Uncompressed video requires a relatively high transmission bandwidth. Almost all human-viewable video sequences, however, contain a large amount of redundant and/or visually unimportant information. Digital video allows the use of complex algorithms that remove redundant and relatively unimportant information from a digital video bitstream. With this information removed, video transmission bandwidth may be reduced to acceptable levels. A system that implements video compression algorithms is known as a digital video encoder.
FIG. 1 shows a digital video encoder 36 that employs motion compensation to reduce bandwidth. An image sequence 20, consisting of M frames F1, F2, . . . Fj, . . . FM, provides the input to encoder 36. Motion compensator 24 is often crucial for effective bandwidth reduction. Motion compensator 24 produces a prediction frame F̃j for each input frame Fj by comparing regions of Fj to regions from previously encoded frames for similarity. More concretely, given a region in frame Fj, the encoder searches previously encoded frames for regions that match it well and combines these matches to form the region's prediction. Thus F̃j represents an approximation of Fj based on combinations and/or translations of regions of previously encoded frames. This approximated image can be encoded by merely sending the combination and/or translation instructions, or motion vectors MVj, for frame j.
Before an image Fj is encoded, approximation F̃j is subtracted from it in image summer 22. The remaining motion-compensated frame, i.e., the difference image Fj − F̃j, represents the portion of the input frame Fj that cannot be easily predicted from previous frames.
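The region search and subtraction described above can be illustrated with a minimal Python sketch. It assumes square blocks, a single reference frame, pure translation, and a sum-of-absolute-differences match criterion; none of these choices is specified in the text, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search] (assumed strategy)."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def residual_block(cur, ref, bx, by, mv, n):
    """Difference between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]


# Toy frames: the current frame is the reference shifted right by one pixel.
ref = [[x * x + 10 * y * y for x in range(6)] for y in range(6)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(6)] for y in range(6)]
mv, cost = best_motion_vector(cur, ref, bx=2, by=2, n=2, search=1)
residual = residual_block(cur, ref, 2, 2, mv, 2)
```

In this toy case the block is predicted exactly, so the residual is all zeros and only the motion vector would need to be sent.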
The motion-compensated frame is compressed to a target size of approximately T bits by image encoder 26. The motion vectors are likewise compressed in motion encoder 28. When the image sequence is to be transmitted directly to a viewer through channel or interface 34 without intermediate storage (e.g., storage to a disk or other storage media), a buffer 30 is often used to allow the “bursty” encoder outputs to be smoothly transmitted as bitstream 32.
At the receiving end of interface 34, bitstream 32 is received by a video decoder 38. Video decoder 38 often places bitstream 32 in a decoder buffer 47 and then parses bitstream 32 to a motion decoder 42 and an image decoder 40. Image summer 44 combines the output of decoders 40 and 42 to produce an output image sequence 46. Depending on whether image encoder 26 is lossless or lossy, output image sequence 46 may or may not be an exact representation of image sequence 20.
In a digital video encoder 36, and more specifically its embedded image encoder 26, it is often preferable to vary the target size T from image frame to image frame. A process that varies T is known as a “frame-layer rate control.” A large body of work on frame-layer rate control has been reported in the patent and academic literature [1-14, Appendix A]. Typically, these methods decide the target number of bits for a frame using some formula that depends on the energy in that frame, the number of bits used for encoding previous frames, and the current fullness of the encoder buffer 30 (or that of the decoder buffer 47, which is equivalent since both buffers are related).
For example, the frame-layer rate control described in [1] uses a formula for the frame target that depends on the energy of the pixels in the frame and the number of bits used in a previous frame with similar energy. In [13], the energy and previous bits are combined in a different formula to select the target, and in [14] the formula depends on the product of quantization values and bits used for a previous frame of the same type. In [3], the frame target depends on a formula that increases in inverse proportion to the fullness of the encoder buffer. Finally, the method in [12] assigns a fixed target number of bits per frame (equal to the channel rate divided by the frame rate) and skips frames when the encoder buffer is close to overflow.
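The fixed-target approach attributed to [12] can be sketched roughly as follows. Only the target formula (channel rate divided by frame rate) and the idea of skipping frames near overflow come from the text; the buffer model, the skip threshold skip_fraction, and the function name are assumptions made for illustration.

```python
def simulate_fixed_target(frame_bits, channel_rate, frame_rate,
                          buffer_size, skip_fraction=0.5):
    """Simulate an encoder buffer with a fixed per-frame target.

    frame_bits: bits actually produced for each frame (hypothetical input).
    skip_fraction: assumed threshold; a frame is skipped when the buffer is
    fuller than this fraction of buffer_size.
    """
    target = channel_rate / frame_rate  # fixed target, as described for [12]
    fullness = 0.0
    skipped = []
    for j, bits in enumerate(frame_bits):
        if fullness > skip_fraction * buffer_size:
            skipped.append(j)           # frame j is skipped; no bits are added
        else:
            fullness += bits
        # The channel drains one frame interval's worth of bits.
        fullness = max(0.0, fullness - target)
    return target, skipped, fullness


target, skipped, fullness = simulate_fixed_target(
    [6400, 16000, 6400, 6400],
    channel_rate=64000, frame_rate=10, buffer_size=16000,
)
```

Here a single oversized frame (16000 bits against a 6400-bit target) pushes the buffer past the assumed threshold, so the following frame is skipped.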
Even though all frame-layer rate control techniques in the prior art measure similar parameters to determine the target number of bits for a frame [1-14], the formula or method chosen for combining these measurements is key to an effective bit allocation. Typically, these formulas are ad hoc and are not optimized in a rate-distortion sense; as a result, they do not minimize image distortion (i.e., maximize image quality) for the available bit rate. Additionally, the desired communication delay, which increases with the size of the encoder buffer, is not taken directly into account when deciding the frame target. At low delay, this omission produces large fluctuations of the fullness level in the encoder buffer that lead to undesired buffer overflow, underflow, and frame skipping.
The present invention provides a frame-layer rate control mechanism that is based on a rate-distortion optimization. The present invention further teaches modifications to this basic mechanism that allow a digital video encoder rate controller to respond to differences in communication delay. In addition to the benefits of the rate-distortion optimal solution provided by the basic mechanism, these modifications add robustness to a rate controller, such that one controller can be used in a range of delay situations (or even a varying delay situation).
A digital video encoder is disclosed herein. This encoder comprises a frame-layer rate controller that bases a target bit assignment Tj on an energy estimate for frame j, an average energy estimate for a group of frames, and the desired overall bit rate. If the encoder uses motion compensation, motion bits used for frame j and an average motion bits estimate for a group of frames may also be used by the frame-layer rate controller. Preferably, the desired delay in the system affects how average estimates are computed by the encoder.
The rate controller of the video encoder above may also use buffer protection logic during target bit assignment. This logic corrects an initial target bit assignment Tj, based on the transmission bit rate, desired delay, and current buffer fullness.
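One possible form of such buffer protection is sketched below. The text names only the inputs (transmission bit rate, desired delay, current buffer fullness); the correction rule used here, clamping the target so that the buffer neither overflows nor underflows, and the assumption that the buffer holds bit_rate times delay bits, are illustrative choices rather than the disclosed logic.

```python
def protect_target(initial_target, bit_rate, frame_rate, delay_s, fullness):
    """Clamp an initial frame target to keep a modeled buffer within bounds.

    Assumes the buffer holds bit_rate * delay_s bits and that the channel
    removes bit_rate / frame_rate bits per frame interval (both assumptions).
    """
    buffer_size = bit_rate * delay_s
    drain = bit_rate / frame_rate
    upper = buffer_size - fullness       # adding more would overflow the buffer
    lower = max(0.0, drain - fullness)   # adding less would underflow after draining
    return max(lower, min(initial_target, upper))


near_full = protect_target(8000, 64000, 10, 0.5, fullness=30000)
near_empty = protect_target(2000, 64000, 10, 0.5, fullness=1000)
unchanged = protect_target(8000, 64000, 10, 0.5, fullness=10000)
```

With a 64 kbit/s channel and a 0.5 s delay, the modeled buffer holds 32000 bits: a nearly full buffer cuts the target, a nearly empty one raises it, and a mid-range fullness leaves it unchanged.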
In a further aspect of the invention, methods for combining the parameters and estimates described above are also disclosed. For example, an initial target bit assignment can be computed by multiplying the desired overall average bit assignment by the ratio of the frame energy estimate to the average energy estimate. Preferably, the average energy estimate is calculated by filtering frame energy estimates with a filter having a time constant dependent on the desired system delay.
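The ratio-based initial assignment described above can be illustrated with a short Python sketch. The multiplication of the average bit assignment by the energy ratio follows the text; the first-order recursive filter, the choice of delay times frame rate as its time constant in frames, and all names are assumptions.

```python
def initial_targets(energies, bit_rate, frame_rate, delay_s):
    """Initial per-frame targets: average bits scaled by energy / average energy."""
    avg_bits = bit_rate / frame_rate
    # Assumed time constant: the number of frames spanned by the desired delay.
    window = max(1.0, delay_s * frame_rate)
    alpha = 1.0 / window
    avg_energy = float(energies[0])
    targets = []
    for e in energies:
        # First-order recursive (exponential) average of the frame energies.
        avg_energy += alpha * (e - avg_energy)
        ratio = e / avg_energy if avg_energy > 0 else 1.0
        targets.append(avg_bits * ratio)
    return targets


targets = initial_targets([100, 100, 400], bit_rate=64000,
                          frame_rate=10, delay_s=0.2)
```

In this example the third frame has four times the energy of its predecessors, and because the short filter has already moved the average to 250, the frame receives 1.6 times the average assignment. Under this assumed filter, a longer delay would slow the average and let such a frame claim a larger share.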