The present invention relates to a method and apparatus for encoding moving pictures. In particular, the present invention relates to a method and apparatus for performing variable bit rate control for a digital video encoder.
Digital compression has been applied to moving pictures for the purposes of transmission bandwidth or storage size reduction. One current art of such compression techniques can be derived from the ISO/IEC MPEG Standards, the ISO/IEC 11172-3 (MPEG-1), the ISO/IEC 13818-2 (MPEG-2) and the MPEG-2 TM5 (test model 5), developed by the Moving Picture Experts Group of the International Organisation for Standardization. The disclosures of those standards documents are hereby expressly incorporated into this specification by reference.
In a standard MPEG compliant video encoder, a sequence of moving pictures (e.g. video) is input to the encoder where it is compressed with a user defined target bitrate. The target bitrate is set according to the communication channel bandwidth in which the compressed video is to be transmitted, or the storage media capacity in which the compressed video sequence is to be stored.
Several different forms of coding can be employed depending upon the character of the input pictures, referred to as I-pictures, P-pictures, or B-pictures. The I-pictures are intra-coded pictures used mainly for random access or scene update. The P-pictures use forward motion predictive coding with reference to previously coded I- or P-pictures (anchor pictures), and the B-pictures use both forward and backward motion predictive/interpolative coding with reference to previously coded I- or P-pictures. Furthermore, a group of pictures (GOP) is formed in encoded order starting with an I-picture and ending with the picture before the next I-picture in the sequence.
The pictures are partitioned into smaller and non-overlapping blocks of pixel data called Macroblocks (MBs) before encoding. Each MB from a P- or B-picture is subjected to a motion estimation process in which forward motion vectors, and backward motion vectors in the case of a B-picture MB, are determined using reference pictures from a frame buffer. With the determined motion vectors, motion compensation is performed where the intra- or inter-picture prediction mode of the MB is first determined according to the accuracy of the motion vectors found, followed by generating the necessary predicted MB.
The predicted MB is then subjected to discrete cosine transform (DCT) and DCT coefficient quantization based on quantization matrices (QM) and quantization stepsize (QS). The quantized DCT coefficients of the MB are then run-length encoded with variable length codes (VLC) and multiplexed with additional information such as selected motion vectors, MB coding modes, quantization stepsize, and/or picture and sequence information, to form the output bitstream.
Local decoding is performed by inverse quantizing the quantized DCT coefficients, followed by inverse DCT, and motion compensation. Local decoding is performed such that the reference pictures used in the motion compensation are identical to those to be used by an external decoder.
The quantization stepsize (QS) used for quantizing the DCT coefficients of each MB has a direct impact on the number of bits produced at the output of the VLC encoding process, and therefore on the average output bit rate. It also has a direct impact on the encoding quality, that is, the output picture quality at the corresponding decoder. In general, a larger QS generates a lower output bit rate and lower encoding quality. In order to control output bit rate and picture quality so that the resulting bitstream can satisfy channel bandwidth or storage limitations as well as quality requirements, rate control and quantization control algorithms are used.
Some methods for rate control and quantization control can be found in the abovementioned MPEG-2 TM5 (Test Model 5). These methods comprise generally a bit allocation process, a rate control process, and an adaptive quantization process. In the bit allocation process, a target number of bits is assigned for a new picture to be coded according to a number of previously determined and present parameters. The rate control step then calculates a reference quantization stepsize QSref for each MB based on the target bits for the picture, the number of bits already used from the target bits in encoding MBs from that picture, and a virtual buffer model as given in MPEG-2 TM5. In the adaptive quantization process, the calculated QSref is then scaled according to local activities of the MB, and an average MB activity determined from the previously coded picture. This scaling is done according to a level of masking effects of coding noise by human perception for MB with high or low activities within a picture. An example of an adaptive quantization technique is disclosed in U.S. Pat. No. 5,650,860, entitled “Adaptive Quantization”. A video buffer verifier (VBV) may also be employed in such a way that underflow and overflow are prevented as required by the MPEG standard to ensure the target bit rate is maintained. Techniques for underflow detection and protection are also disclosed in U.S. Pat. No. 5,650,860.
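By way of illustration only, the rate control and adaptive quantization steps outlined above may be sketched in Python as follows. The sketch follows the formulas published in MPEG-2 TM5 for the virtual buffer fullness, the reaction parameter and the normalised activity. The function and parameter names are assumptions introduced for this example, and the activity values are assumed to be positive spatial activity measures supplied by the caller.

```python
def tm5_reference_qs(d0, bits_used, target_bits, mb_index, mb_count,
                     bit_rate, picture_rate):
    """Reference quantization stepsize (QSref) for one macroblock.

    d0          -- virtual buffer fullness at the start of the picture
    bits_used   -- bits already produced for earlier MBs of this picture
    target_bits -- bits allocated to the whole picture
    mb_index    -- 0-based index of the MB about to be coded
    """
    # Virtual buffer fullness before coding this macroblock.
    fullness = d0 + bits_used - target_bits * mb_index / mb_count
    # TM5 reaction parameter.
    reaction = 2.0 * bit_rate / picture_rate
    return fullness * 31.0 / reaction


def tm5_adaptive_qs(qs_ref, mb_activity, avg_activity):
    """Scale QSref by the normalised activity of the macroblock.

    The normalised activity lies between 0.5 (flat MB) and 2 (busy MB),
    so coding noise is pushed towards areas where it is masked.
    Assumes mb_activity + 2 * avg_activity is greater than zero.
    """
    n_act = (2.0 * mb_activity + avg_activity) / (mb_activity + 2.0 * avg_activity)
    # The final stepsize is clipped to the range 1..31.
    return min(31, max(1, round(qs_ref * n_act)))
```

For instance, with a 4 Mbit/s target at 25 pictures per second and the TM5 initial fullness of 10 * r / 31 for an I-picture, the first MB receives a reference stepsize of about 10, which the adaptive step leaves unchanged for an MB of average activity and halves for a completely flat MB.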
It is apparent that the fixed target bitrate in the process outlined above has little or no relationship to the actual or varying complexity of the video scenes contained in the input picture sequence. The target bitrate is actually defined by the communication channel bandwidth, or by the target storage capacity for the picture sequence, but the perceptual quality of the resulting pictures when decoded may vary from good to annoying from scene to scene according to scene complexity.
For applications where picture sequences are compressed for storage and retrieval, for example DVD (Digital Video Disc or Digital Versatile Disc), variable bit rate (VBR) may be applied on individual segments of the picture sequences depending on their scene complexity to maximize bit rate allocation and encoded picture quality. Data bits may be reduced for less complex scenes to save storage space and increase potential recording duration of the medium, or the resulting storage saving can be used for coding of more complex scenes.
Similarly, VBR can also be applied to other applications such as a multi-channel video broadcasting network. Such channel bandwidth may be dynamically allocated to individual video sequences to be multiplexed together so that a higher percentage of the bandwidth is used adaptively by sequences with complex scenes.
Existing VBR control algorithms such as that disclosed in U.S. Pat. No. 5,650,860 require multiple encoding passes to properly distribute data bits. In the first coding pass, the bit utilization information is determined for each scene or each picture in the input picture sequence. This may be done by fixing the reference quantization stepsize and disabling the VBV control.
The determined bit utilization information is then used to generate a bit budget for each scene or picture such that an overall target number of bits to code the sequence is fixed, and so that a maximum bit rate is not violated. To accomplish this, the bit budget for each picture is modified so that the VBV buffer does not underflow. In cases where the initial bit utilization information obtained is unrealistic for generating the bit budget, steps from the first coding pass must be repeated with an adjusted reference quantization stepsize. The input sequence is coded in a final pass using the generated bit budget information to achieve the target bits or bit rate. This form of multiple-pass VBR encoder requires very large storage memory for storing the intermediate bit utilization information, and large computational capacity for the additional passes and the bit budget generation. Furthermore, a VBR technique such as this cannot process the input sequence in real-time.
An object of the present invention is to provide a one-pass variable bit rate control technique for coding of moving pictures.
In accordance with the invention, there is provided a method for variable bit rate control in a single pass moving pictures encoder, comprising:
selecting a target picture encoding quality;
selecting upper and lower bit rate limits;
encoding at least one picture based on a target bit rate within the upper and lower bit rate limits;
predicting a current bit rate and an encoding quality based on the result of the encoding step;
comparing the encoding quality of the at least one encoded picture with the target picture encoding quality;
adjusting the target bit rate within the upper and lower bit rate limits according to the result of said comparison and the predicted current bit rate, for encoding subsequent pictures; and
repeating, for each picture in a sequence of pictures, said encoding, predicting, comparing and adjusting steps.
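The steps recited above may be sketched, purely as an illustration, by the following Python control loop. The proportional update rule, the gain value and the toy encoder model are assumptions made for this example and are not taken from the specification. All names are hypothetical, and quality is assumed to be a figure in decibels where a higher value is better.

```python
import math


def toy_encode(complexity, bit_budget):
    """Hypothetical stand-in for a real encoder.

    Spends exactly the bit budget and reports a quality figure (in dB)
    that rises with bits spent and falls with scene complexity.
    """
    bits = bit_budget
    quality = 30.0 + 10.0 * math.log10(bits / complexity)
    return bits, quality


def one_pass_vbr(pictures, encode_picture, target_quality,
                 min_rate, max_rate, picture_rate, gain=0.05):
    """Encode each picture once, adjusting the target bit rate as we go."""
    # Start in the middle of the permitted range (an arbitrary choice).
    target_rate = (min_rate + max_rate) / 2.0
    history = []
    for picture in pictures:
        # Encoding step: code the picture at the current target bit rate.
        bits, quality = encode_picture(picture, target_rate / picture_rate)
        # Predicting step: estimate the current bit rate from the result.
        predicted_rate = bits * picture_rate
        # Comparing step: positive error means quality is below target.
        error = target_quality - quality
        # Adjusting step: assumed proportional rule, then clamp to limits.
        target_rate = predicted_rate * (1.0 + gain * error)
        target_rate = min(max_rate, max(min_rate, target_rate))
        history.append(target_rate)
    return history
```

With this toy model, a run of simple pictures lets the target bit rate fall to the lower limit, while a run of complex pictures drives it up to the upper limit, and it never leaves the permitted range.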
In one form of the invention, the picture encoding quality is based on a mean square error. Alternatively, the picture encoding quality may be based on a signal-to-noise ratio. Preferably the comparing step includes measuring a difference in picture encoding quality between corresponding input and locally decoded pictures.
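As a minimal sketch of the two quality measures mentioned above, the following Python functions compute a mean square error and a peak signal-to-noise ratio between an input picture and its locally decoded counterpart. The choice of the peak form of the signal-to-noise ratio, the flat-sequence representation of pixel data and the function names are assumptions made for this example.

```python
import math


def mean_square_error(original, decoded):
    """Mean square error between two equal-length sequences of pixel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("pictures must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr_db(original, decoded, peak=255):
    """Peak signal-to-noise ratio in decibels (peak=255 assumes 8-bit samples)."""
    mse = mean_square_error(original, decoded)
    if mse == 0:
        return math.inf  # identical pictures: no coding noise at all
    return 10.0 * math.log10(peak * peak / mse)
```

For the pixel values 10, 20, 30, 40 decoded as 12, 18, 30, 44, the mean square error is 6.0 and the peak signal-to-noise ratio is roughly 40.3 dB.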
In a particular form of the invention, the pictures to be encoded are arranged in groups of pictures comprising an I-picture and at least one P-picture and/or B-picture, and wherein the target bit allocation is adjusted for each picture or plurality of pictures in each group of pictures. Preferably the target bit allocation is adjusted to achieve a target bit rate, determined on the basis of said comparison, for each picture or plurality of pictures in the group of pictures.
The target picture encoding quality may comprise a target encoding quantization step-size, wherein different target quantization step-sizes are selected for I-, P- and B-pictures.
In one form of the invention, the method may include measuring an average quantization step-size for at least one previously encoded picture, predicting a bit rate for a previously encoded I-, P-, and B-picture, and determining said target bit rate based on said predicted bit rate and a difference between the target encoding quantization step-size and the measured average quantization step-size.
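One possible way to realise this form is sketched below in Python. It assumes, in the manner of the TM5 complexity measure (bits multiplied by average quantization stepsize), that the bits produced are roughly inversely proportional to the stepsize. That model, the function names and the group-of-pictures parameters are assumptions made for this example rather than the method of the specification.

```python
def predict_bit_rate(bits_i, bits_p, bits_b, n_p, n_b, picture_rate):
    """Predict the bit rate of a group of pictures with one I-picture.

    bits_i, bits_p, bits_b -- bits used by the latest I-, P- and B-picture
    n_p, n_b               -- number of P- and B-pictures in the group
    """
    gop_bits = bits_i + n_p * bits_p + n_b * bits_b
    gop_length = 1 + n_p + n_b
    return gop_bits * picture_rate / gop_length


def retarget_bit_rate(predicted_rate, avg_qs, target_qs, min_rate, max_rate):
    """Scale the predicted rate towards the target quantization stepsize.

    If the measured average stepsize is coarser than the target, the rate is
    raised; if it is finer, the rate is lowered. The result is clamped to the
    upper and lower bit rate limits.
    """
    scaled = predicted_rate * avg_qs / target_qs
    return min(max_rate, max(min_rate, scaled))
```

For example, a group with one I-picture of 400,000 bits, three P-pictures of 200,000 bits and eight B-pictures of 80,000 bits at 30 pictures per second is predicted at 4.1 Mbit/s. If the measured average stepsize is 12 against a target of 8, the new target bit rate becomes 6.15 Mbit/s, provided this lies within the limits.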
The method of the invention may include measuring an average picture activity for the moving pictures, and modifying the measured difference in picture encoding quality on the basis of the average picture activity.
The present invention also provides a control apparatus for a single pass moving pictures encoder, comprising:
an input for receiving a target picture encoding quality;
an input for receiving upper and lower bit rate limits; and
a controller for controlling the encoder so as to encode at least one picture according to a target bit rate within the upper and lower bit rate limits, predict a current bit rate and an encoding quality based on the result of the encoding, compare the encoding quality of the at least one encoded picture with the target picture encoding quality, and adjust the target bit rate according to the result of said comparison and the predicted current bit rate for encoding subsequent pictures.
Preferably the moving picture encoder includes a frequency transform coefficient quantizer for quantization of the encoded picture data, and wherein the controller comprises a bit rate controller coupled to control the quantization step size of the quantizer, a quantization step size comparator for comparing, as a measure of encoding quality, an actual quantization step size with a target quantization step size based on the target picture encoding quality, a bit allocation processor coupled to control the bit rate controller according to a number of bits remaining for encoding a group of pictures, and a target bit rate estimator coupled to receive the upper and lower bit rate limits and coupled to the bit allocation processor and the bit rate controller for controlling the quantization so that the required bit rate for the quantized picture data is within the upper and lower bit rate limits.
The present invention further provides a single pass variable bit rate video picture encoder comprising:
a picture input for receiving data for a plurality of moving pictures;
a target quality input for receiving a target quality measure for encoded pictures;
an encoder output for supplying encoded picture data;
a bit rate limit input for receiving upper and lower bit rate limits for the encoded picture data;
a bit rate predictor for predicting a current bit rate;
a frequency transform processor for frequency transform encoding picture data from the picture input;
a coefficient quantizer for quantizing the encoded picture data according to a quantization step size;
an encoding quality estimator for measuring an encoding quality of quantized encoded pictures; and
a bit rate controller for dynamically controlling the quantization step size of the coefficient quantizer based on the predicted current bit rate and a comparison of the target quality and the measured quality so that the encoder output remains within the upper and lower bit rate limits.
The video picture encoder may further include a frame bit counter for counting a number of remaining bits available for encoding a group of pictures, and a bit allocation processor for controlling the bit rate controller according to the remaining available bits.
The video picture encoder may further include a quality comparator for comparing the target quality with a measured encoding quality, and wherein the bit rate controller controls the quantization step size based on a difference between the target and measured qualities.
In one form of the encoder, the measured encoding quality comprises the quantization step size.
The video picture encoder may further include a local decoder for decoding the quantized encoded picture data, and a quality measurement processor for determining a quality difference between corresponding input and locally decoded pictures.
In another form of the encoder, the quality measurement processor determines a difference in signal-to-noise ratio. Alternatively, the quality measurement processor may determine a mean square error.
It is possible in most applications that only a maximum bit rate and optionally a minimum bit rate are specified. Such applications may include a randomly accessible recording medium or a packetized communication network with a variable instantaneous bit rate but also a maximum bandwidth specification. In addition, such applications may also require that the input be compressed in real-time; for example, live broadcasting or live recording cannot make use of multi-pass encoding. Therefore, it is also an object of the present invention to provide an encoder which can be operated in real time in a variable bit rate mode within the maximum and minimum bitrate boundary of the target application.
While encoding an input moving picture sequence, the present invention continuously measures the resulting encoding picture quality, compares it to a defined target quality, and adjusts the encoding bit rate accordingly. By varying the target bit rate of the encoder within a defined maximum bit rate and a minimum bit rate according to a defined target encoded picture quality and the scene complexity, the encoder ensures consistent picture quality when possible and also that the maximum and minimum bit rate of the target application are not violated.