In a preferred embodiment of the invention, the video encoder is an MPEG-2 compliant encoder. The encoder receives a sequence of frames from a video source. The sequence of frames may be progressive or interlaced. Illustratively, the progressive sequence comprises 30 frames per second. In the case of an interlaced sequence, each frame comprises two fields. A top field comprises the even numbered rows and a bottom field comprises the odd numbered rows. Thus, in the case of an interlaced sequence, there are 60 fields per second.
The video source may be any source of a digital video signal such as a video camera or a telecine machine. A telecine machine converts a film comprising 24 frames per second into a 60 field per second digital video signal using 3:2 pull down. The 3:2 pull down technique provides for generating two video fields and three video fields for alternating film frames. For a film frame which is converted into three video fields, the third field is a repeat of the first field.
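The 3:2 pull down cadence described above can be illustrated with a short sketch. The function name `pulldown_3_2` and the choice to start the cadence with a three-field frame are assumptions made for illustration only; the text does not specify which film frame comes first.

```python
def pulldown_3_2(film_frames):
    """Map film frames to (frame, parity) video fields using a 3:2 cadence.

    Assumption: the first film frame yields three fields, the next two, and so on.
    """
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field; parity alternates in the output
    for index, frame in enumerate(film_frames):
        count = 3 if index % 2 == 0 else 2  # alternate three fields and two fields
        for _ in range(count):
            fields.append((frame, "top" if parity == 0 else "bottom"))
            parity ^= 1
    return fields


fields = pulldown_3_2(["A", "B", "C", "D"])
one_second = pulldown_3_2(list(range(24)))  # 24 film frames per second
```

In this sketch, four film frames become ten fields, so 24 film frames become 60 fields, and the third field of a three-field frame has the same source frame and parity as its first field.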
The video encoder utilizes a compression algorithm to generate an MPEG-2 compliant bit stream from the input sequence of frames. (See ISO/IEC 13818-2.)
The MPEG-2 bit stream has six layers of syntax: a sequence layer (random access unit, context), a Group of Pictures layer (random access unit, video coding), a picture layer (primary coding layer), a slice layer (resynchronization unit), a macroblock layer (motion compensation unit), and a block layer (DCT unit). A group of pictures (GOP) is a set of frames which starts with an I-frame and includes a certain number of P and B frames. The number of frames in a GOP may be fixed or may be variable. Each frame is divided into macroblocks. Illustratively, a macroblock comprises four luminance blocks and two chrominance blocks. Each block is 8×8 pixels.
The encoder distinguishes between three kinds of frames (or pictures): I, P, and B. Typically, the coding of I frames results in the most bits. In an I-frame, each macroblock is coded as follows. Each 8×8 block of pixels in a macroblock undergoes a DCT (discrete cosine transform) to form an 8×8 array of transform coefficients. The transform coefficients are then quantized with a variable quantizer matrix. Quantization involves dividing each DCT coefficient F[v][u] by a quantizer step size. The quantizer step size for each AC DCT coefficient is determined by the product of a weighting matrix element W[v][u] and a quantization scale factor (also known as mquant). As is explained below, in some cases the quantization scale factor Q_n for a macroblock n is a product of a rate control quantization scale factor Q_n^R and a masking activity quantization scale factor QS_n. However, this factorization of the quantization scale factor Q_n is optional. The use of a quantization scale factor permits the quantization step size for each AC DCT coefficient to be modified at the cost of only a few bits. The quantization scale factor is selected for each macroblock.
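The quantization step described above can be sketched as follows. This is a simplified illustration that follows the wording of the text (step size equal to W[v][u] times mquant for AC coefficients); the normative arithmetic in ISO/IEC 13818-2 includes additional scaling and rounding rules that are not reproduced here. The function name `quantize_block` and the separate `dc_step` parameter are assumptions introduced for the example.

```python
def quantize_block(coeffs, weights, mquant, dc_step=8):
    """Quantize an 8x8 block of DCT coefficients F[v][u].

    Simplified sketch: each AC coefficient is divided by W[v][u] * mquant.
    The DC coefficient is divided by a fixed dc_step (hypothetical parameter).
    """
    out = []
    for v in range(8):
        row = []
        for u in range(8):
            if v == 0 and u == 0:
                step = dc_step                     # DC term is not scaled by mquant here
            else:
                step = weights[v][u] * mquant      # AC quantizer step size
            row.append(int(round(coeffs[v][u] / step)))
        out.append(row)
    return out


weights = [[16] * 8 for _ in range(8)]             # flat weighting matrix (illustrative)
coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0] = 80, 64, -40

fine = quantize_block(coeffs, weights, mquant=2)     # small step size
coarse = quantize_block(coeffs, weights, mquant=10)  # large step size
```

With the larger quantization scale factor, the two AC coefficients in this example quantize to zero, which is the mechanism by which a larger step size reduces the number of coded bits.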
The resulting quantized DCT coefficients are scanned (e.g., using zig-zag scanning) to form a sequence of DCT coefficients. The DCT coefficients are then organized into run-level pairs. The run-level pairs are then encoded using a variable length code (VLC). In an I-frame, each macroblock is encoded according to this technique.
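The scanning and run-level steps can be sketched as below. The helper names `zigzag_order` and `run_level_pairs` are assumptions for illustration, and the sketch treats all 64 coefficients uniformly; in an actual MPEG-2 bit stream the intra DC coefficient is coded separately and trailing zeros are signaled by an end-of-block code.

```python
def zigzag_order(n=8):
    """Return (v, u) positions of an n x n block in zig-zag scan order."""
    def key(pos):
        v, u = pos
        d = v + u
        # Odd anti-diagonals run with v increasing, even ones with v decreasing.
        return (d, v if d % 2 else -v)
    return sorted(((v, u) for v in range(n) for u in range(n)), key=key)


def run_level_pairs(block):
    """Convert a quantized block into (run, level) pairs.

    run is the number of zero coefficients preceding each non-zero level
    in zig-zag order; trailing zeros produce no pair.
    """
    pairs, run = [], 0
    for v, u in zigzag_order(len(block)):
        level = block[v][u]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[0][2] = 10, 3, -1
pairs = run_level_pairs(block)
```

Each resulting pair would then be mapped to a variable length code.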
In a P-frame, a decision is made to code each macroblock as an I macroblock, which macroblock is then encoded according to the technique described above, or to code the macroblock as a P macroblock. For each P macroblock, a prediction of the macroblock is obtained from a previous video frame. The prediction is identified by a motion vector which indicates the translation between the macroblock to be coded in the current frame and its prediction in the previous frame. (A variety of block matching algorithms can be used to find the particular macroblock in the previous frame which is the best match with the macroblock to be coded in the current frame. This "best match" macroblock becomes the prediction for the current macroblock.) The predictive error between the predictive macroblock and the current macroblock is then coded using the DCT, quantization, zig-zag scanning, run-level pair encoding, and VLC encoding.
In the coding of a B-frame, a decision has to be made as to the coding of each macroblock. The choices are (a) intracoding (as in an I macroblock), (b) unidirectional forward predictive coding using a previous frame to obtain a motion compensated prediction, (c) unidirectional backward predictive coding using a subsequent frame to obtain a motion compensated prediction, and (d) bidirectional predictive coding, wherein a motion compensated prediction is obtained by interpolating a backward motion compensated prediction and a forward motion compensated prediction. In the cases of forward, backward, and bidirectional motion compensated prediction, the predictive error is encoded using DCT, quantization, zig-zag scanning, run-level pair encoding, and VLC encoding.
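One of the many possible block matching algorithms is an exhaustive search that minimizes the sum of absolute differences (SAD). The sketch below is only an example of that general approach, not the search procedure of the present encoder; the names `sad` and `full_search`, the block size, and the search range are assumptions chosen for illustration.

```python
import random


def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, cy, cx, size=8, search=4):
    """Return (cost, dy, dx) of the best match within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cy, cx, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Synthetic test frames: the current frame is the reference frame shifted
# so that the block at (8, 8) matches the reference block at (9, 6).
rng = random.Random(0)
H = W = 32
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y + 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

cost, dy, dx = full_search(cur, ref, cy=8, cx=8)
```

Here `(dy, dx)` plays the role of the motion vector, and the residual between the current block and the matched block is the predictive error that is subsequently transformed and quantized.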
The P frame may be predicted from an I frame or another P frame. The B frame may also be predicted from an I frame or a P frame. No predictions are made from B frames.
B frames have the smallest number of bits when encoded, then P frames, with I frames having the most bits when encoded. Thus, the greatest degree of compression is achieved for B frames. For each of the I, B, and P frames, the number of bits resulting from the encoding process can be controlled by controlling the quantizer step size (adaptive quantization) used to code each macroblock. A macroblock of pixels or pixel errors which is coded using a large quantizer step size results in fewer bits than if a smaller quantizer step size is used.
After encoding by the video encoder, the bit stream is stored in an encoder output buffer. Then, the encoded bits are transmitted via a channel to a decoder, where the encoded bits are received in a buffer of the decoder, or the encoded bits may be stored in a storage medium.
The order of the frames in the encoded bit stream is the order in which the frames are decoded by the decoder. This may be different from the order in which the frames arrived at the encoder. The reason for this is that the coded bit stream contains B frames. In particular, it is necessary to code the I and P frames used to anchor a B frame before coding the B frame itself.
Consider the following sequence of frames received at the input of a video encoder and the indicated coding type (I, P or B) to be used to code each frame:
Input order:
1 2 3 4 5 6 7 8 9 10 11 12 13
I B B P B B P B B I B B P
Reordered sequence in the coded bit stream:
1 4 2 3 7 5 6 10 8 9 13 11 12
I P B B P B B I B B P B B
(1) A bit budget BB_i is established for each frame i by allocating the total available coding rate R_eff to each frame i based on the number of bits used to code the previous frame of the same type and the average quantization scale factor used to code the previous frame of the same type relative to the bits used and average quantization scale factor for the previous frames of the other types and the relative frequency of each frame type.
(2) The bit budget for each frame is allocated to the individual sections of the frame coded by the individual master or slave units based on a complexity measure for each section.
(3) The bit budget for each section is then allocated to each macroblock in the section based on a total activity measure for the macroblock. (A description of the total activity measure is provided below.)
(4) Virtual buffers v_I, v_P, and v_B, corresponding to frame types I, P, and B, provide rate control feedback by adjusting the quantization scale factor. A rate control quantization scale factor Q_n^R for a macroblock n in frame i is determined as a function of a ratio of virtual buffer fullness to virtual buffer size.
(5) A masking activity is determined for each macroblock which measures the amount of visual local masking in the macroblock. The rate control quantization scale factor determined from virtual buffer fullness is multiplied by a masking activity quantization scale factor which is dependent on the macroblock masking activity to obtain a total quantization scale factor.
(6) The bit budget BB_i for a current frame i is increased or decreased based on the VBV buffer occupancy level to prevent VBV buffer overflow or underflow.
(7) The rate control may initiate a panic mode. A panic mode arises when a scene is encountered which generates too many bits, even when the quantization scale factor is set to the maximum size. In this case the encoder is in danger of generating too many bits for the channel to transfer to the decoder, thereby causing a "VBV underflow" bit stream error. In this case, the encoder enters the panic mode in which quality is sacrificed to guarantee a legal bit stream.
(8) The rate control algorithm takes into account changes in the effective coding rate R_eff. For a CBR encoder, the rate R_eff may change because a particular encoder may be sharing a channel with a number of other encoders. A statistical multiplexing controller may change the fraction of the channel bandwidth allocated to the particular encoder. For a VBR encoder, the effective encoding rate R_eff will change at various points in the bit stream. The changes in rate are accounted for in VBV buffer enforcement.
(9) The rate control algorithm also accounts for inverse telecine processing by the encoder when allocating bit budgets to particular frames. Inverse telecine processing involves detecting and skipping repeated fields in a field sequence outputted by a telecine machine to the encoder. In particular, the effective frame rate f_eff is given by
f_eff = (2/T_i) f
where T_i is the average number of fields in a frame, and f is the nominal frame rate (as specified in a sequence header).
(10) The encoder can detect scene changes. The rate control algorithm is modified as a result of scene changes. In particular, a new GOP is started when a scene change is detected. Default values are used to allocate bits to the first I, P, and B frames in the new scene. The default value for the I frame depends on bit rate and VBV fullness and frame activity. The default values for the P and B frames are determined from the I frame default value. In addition, the initial quantization scale factor used in the first macroblock of the first frame of each type in the new scene is a function of the bit budget for the frame and the total activity of the particular frame. The total activity for a frame is the sum of the total activities for the macroblocks in the frame. The initial rate control quantizer scale factor for a frame of each type (I, P, B) is used to determine the initial occupancies of the corresponding virtual buffers v_I, v_P, v_B. These occupancies are then updated to obtain subsequent rate control quantization scale factors.
(11) The encoder can detect fades (fade to black or fade to white) and account for a fade in the rate control algorithm.
The MPEG-2 compliant encoding technique has several other important features useful for generating an MPEG-2 compliant bit stream.
(1) An inter/intra decision is made for each macroblock in a P or B frame. An intra-bias (IB) used in the decision takes into account the quantization scale factor for the macroblock.
(2) A motion vector is selected for each macroblock to be inter-coded. In the case of an interlaced sequence, it is desirable to pick between a frame-based motion vector and a plurality of field-based motion vectors. A three stage hierarchical procedure is provided to obtain a motion vector for each macroblock to be inter-coded.
(3) For each macroblock in a frame which utilizes frame based encoding, a decision is made whether to use field- or frame-based encoding (DCT, quantization, etc.). The present invention makes the field/frame encoding decision for each macroblock based on comparing (a) the total activity of the frame macroblock and (b) the sum of the total activities of the two field macroblocks. (A macroblock in an interlaced frame may be viewed as comprising macroblocks in each of the fields which comprise the frame. Each such field macroblock contributes half the rows to the frame macroblock.) The smaller of (a) and (b) determines which mode to use.
(1) In a first coding pass, VBV enforcement is disabled. In addition, in the first pass, the rate control quantization scale factor is maintained as fixed. However, the masking activity quantization scale factor is allowed to vary for different macroblocks.
(2) From step (1) a number of bits used to encode each frame in the input sequence in the first encoding pass is determined. Then, a bit budget for each frame in the sequence is determined from the number of bits used to encode each frame in the first pass such that (a) an overall target for the number of bits used to code the entire frame sequence is not exceeded, and (b) R_max, a maximum channel bit rate, is not violated. To accomplish this, the bit budget for each frame is modified so that the VBV buffer does not underflow. It is not necessary to worry about VBV overflow for a VBR encoder.
(3) The input sequence is then coded again in a second pass using the bit budgets determined in step (2). There is no VBV enforcement during the second encoding pass, as any possible VBV underflow has been accounted for as indicated in step (2). Instead, the cumulative coding budget deviation (CE_i) is maintained. This means that there is accumulated over the successive frames that are coded the difference between the bit budget, BB_i, for each frame and BU_i, the actual number of bits used to code the frame. Therefore, for frame i, CE_i = CE_{i-1} + BU_i - BB_i. The budget BB_{i+1} for frame i+1 is modified by an amount proportional to the cumulative budget deviation CE_i. In the second pass, the rate control quantization scale factor is not necessarily fixed and may vary in response to virtual buffer fullness.
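The cumulative budget deviation bookkeeping of the second pass, described in step (3), can be sketched as follows. The text states only that the next budget is modified by an amount proportional to the cumulative deviation; the sign of the correction, the value of `gain`, the use of the unmodified budget in the deviation update, and the callback `encode_frame` are all assumptions made for this illustration.

```python
def second_pass(budgets, encode_frame, gain=0.5):
    """Track the cumulative deviation CE between bits used (BU) and budgets (BB).

    budgets      -- per-frame bit budgets BB_i derived from the first pass
    encode_frame -- hypothetical callback: target bits -> bits actually used
    gain         -- assumed proportionality constant for the correction
    """
    ce = 0.0
    log = []
    for bb in budgets:
        target = bb - gain * ce      # assumed form: spend less after overshooting
        used = encode_frame(target)  # BU_i
        ce = ce + used - bb          # CE_i = CE_(i-1) + BU_i - BB_i
        log.append((target, used, ce))
    return log


# Toy encoder that always overshoots its target by 10 percent.
log = second_pass([100.0, 100.0, 100.0], lambda target: 1.1 * target)
```

In this toy run, the targets handed to the encoder shrink from frame to frame as the accumulated overshoot grows, which is the feedback behavior the cumulative deviation is intended to provide.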
For this example there are two B-frames between successive coded P-frames and also two B-frames between successive coded I- and P-frames. Frame "1I" is used to form a prediction for frame "4P", and frames "1I" and "4P" are both used to form predictions for frames "2B" and "3B". Therefore, the order of coded frames in the coded sequence shall be "1I", "4P", "2B", "3B".
Thus, at the encoder output, in the coded bit stream, and at the decoder input, the frames are reordered as follows: 1I, 4P, 2B, 3B, 7P, 5B, 6B, 10I, 8B, 9B, 13P, 11B, 12B.
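The reordering rule (each anchor I or P frame is emitted before the B frames that precede it in input order) can be sketched as below. The function name `coded_order` is an assumption for illustration, and the handling of trailing B frames with no following anchor is a simplification.

```python
def coded_order(frame_types):
    """Reorder frames from input order to coded order.

    frame_types -- string of coding types in input order, e.g. "IBBPBBP"
    Returns a list of (frame_number, type) tuples in coded order.
    """
    out, pending_b = [], []
    for number, ftype in enumerate(frame_types, start=1):
        if ftype == "B":
            pending_b.append((number, ftype))  # wait for the next anchor frame
        else:
            out.append((number, ftype))        # anchor (I or P) is coded first
            out.extend(pending_b)              # then the B frames that depend on it
            pending_b = []
    out.extend(pending_b)  # simplification: flush B frames left without an anchor
    return out


order = coded_order("IBBPBBPBBIBBP")
numbers = [n for n, _ in order]
types = "".join(t for _, t in order)
```

For the thirteen-frame example sequence, `numbers` is 1, 4, 2, 3, 7, 5, 6, 10, 8, 9, 13, 11, 12 and `types` is IPBBPBBIBBPBB.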
In the case of interlaced video the following applies. Each frame of interlaced video consists of two fields. The MPEG-2 specification allows the frame to be encoded as a frame picture or the two fields to be encoded as two field pictures. Frame encoding or field encoding can be adaptively selected on a frame-by-frame basis. Frame encoding is typically preferred when the video scene contains significant detail with limited motion. Field encoding, in which the second field can be predicted from the first, works better when there is fast movement.
For field prediction, predictions are made independently for the macroblocks of each field by using data from one or more previous fields (P field) or previous and subsequent fields (B field). For frame prediction, predictions are made for the macroblocks in a frame from a previous frame (P frame) or from a previous and subsequent frame (B frame). Within a field picture, all predictions are field predictions. However, in a frame picture either field prediction or frame prediction may be selected on a macroblock by macroblock basis.
An important aspect of any video encoder is rate control. The purpose of rate control is to maximize the perceived quality of the encoded video when it is decoded at a decoder by intelligently allocating the number of bits used to encode each frame and each macroblock within a frame. Note that the encoder may be a constant bit rate (CBR) encoder or a variable bit rate (VBR) encoder. In the case of a constant bit rate encoder, the sequence of bit allocations to successive frames ensures that an assigned channel bit rate is maintained and that decoder buffer exceptions (overflow or underflow of the decoder buffer) are avoided. In the case of a VBR encoder, the constraints are reduced. It may only be necessary to ensure that a maximum channel rate is not exceeded so as to avoid decoder buffer underflow.
In order to prevent a decoder buffer exception, the encoder maintains a model of the decoder buffer. This model maintained by the encoder is known as the video buffer verifier (VBV) buffer. The VBV buffer models the decoder buffer occupancy. Depending on the VBV occupancy level, the number of bits which may be budgeted for a particular frame may be increased or decreased to avoid a decoder buffer exception.
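A decoder buffer model of this kind can be sketched, for the constant bit rate case, as a simple occupancy simulation. This is a deliberately simplified illustration and not the normative VBV definition of ISO/IEC 13818-2; the function name `simulate_vbv`, the per-frame granularity, and the order of the removal and fill steps are assumptions.

```python
def simulate_vbv(frame_bits, bit_rate, frame_rate, vbv_size, initial_fullness):
    """Report decoder buffer exceptions for a sequence of coded frame sizes.

    Each frame period, the decoder removes one coded frame from the buffer
    and the channel delivers bit_rate / frame_rate new bits.
    """
    fullness = initial_fullness
    per_frame = bit_rate / frame_rate
    events = []
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            events.append((i, "underflow"))  # frame not fully received in time
        fullness = max(fullness - bits, 0)   # decoder removes the frame
        fullness += per_frame                # channel refills the buffer
        if fullness > vbv_size:
            events.append((i, "overflow"))   # more bits arrived than fit
            fullness = vbv_size
    return events


steady = simulate_vbv([100, 100, 100], 3000, 30, 500, 300)
too_big = simulate_vbv([400], 3000, 30, 500, 300)
too_small = simulate_vbv([0, 0, 0], 3000, 30, 500, 300)
```

An encoder that runs such a model alongside encoding can shrink the budget of an upcoming frame when the modeled occupancy is low, or enlarge it when the occupancy approaches the buffer size.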
It is an object of the present invention to provide a rate control technique for an MPEG-2 compliant encoder.
Specifically, it is an object of the invention to provide a rate control technique for a constant bit rate, real time MPEG-2 compliant encoder.
It is also an object of the invention to provide a rate control technique for a variable bit rate, non-real time MPEG-2 compliant encoder.