Video and image compression are essential technologies in a video-oriented world. Video compression is needed in a wide range of applications to reduce the bandwidth required to transmit video and the storage capacity required to store it.
Several standard compression algorithms have been proposed. The Joint Photographic Experts Group (JPEG) standard specifies a method for compressing images. The primary mechanisms of compression are frequency-dependent quantization of DCT coefficients and the subsequent Huffman coding of the non-zero quantized coefficients. The CCITT H.261 standard proposed for video telephony is similar to JPEG in that DCT coefficients are quantized and coded with a Huffman coding algorithm. The primary difference between JPEG and H.261 is the use of motion-compensated temporal prediction: instead of coding the image itself, H.261 codes the temporal prediction errors. For video storage, the Moving Picture Experts Group (MPEG) standard has been proposed. MPEG and H.261 are similar, but MPEG provides greater flexibility and the ability to achieve greater compression at the same quality. The present invention is applicable to encoders using all of these standards. For purposes of clarity, the invention will be explained with reference to an H.261 encoder.
The H.261 encoder produces a bit stream at p×64 kbits/sec, where p is an integer. As indicated above, the H.261 encoder uses a hybrid video compression technique including predictive coding, transform coding, and entropy coding to reduce the output data rate. The H.261 encoder is required to output a bit stream at a constant rate for real time communication applications.
The H.261 encoder performs operations on blocks or macroblocks of pixels. Each block is 8×8 pixels and each macroblock comprises four luminance blocks and two chrominance blocks.
The H.261 encoder has two modes of operation. The intraframe mode realizes compression within one frame in the spatial dimension only. Compression in the temporal direction is disabled. This mode is used for coding the first frame in a scene and for resetting the prediction loop. The interframe mode, in addition to compression within one frame, also realizes compression between consecutive frames. The previous frame is motion compensated and then used as a prediction for the current frame.
A conventional H.261 encoder 10 is illustrated in FIG. 1. A frame of video to be encoded is stored in the input frame buffer 12. If the frame is an intraframe, coding is as follows. Each block in the frame buffer 12 is transmitted to the Discrete Cosine Transform (DCT) circuit 14 (no subtraction takes place in the subtractor 16 as the input to the subtractor 16 on line 17 is zero when intramode coding is used). The DCT coefficients outputted by the DCT circuit 14 are then quantized by the quantizer 18. The block of quantized transform coefficients outputted by the quantizer 18 is then zig-zag scanned using the scanning pattern shown in FIG. 2. The quantized transform coefficients then undergo run-level conversion using the run-level converter 20. The resulting run-level pairs are then coded by the variable length coder 22 and stored in the output buffer 24.
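The zig-zag scan and run-level conversion steps can be illustrated with a minimal Python sketch. It assumes the conventional diagonal zig-zag order (the exact pattern of FIG. 2 is not reproduced here), and the names `zigzag_order`, `run_level_pairs`, and `demo_block` are hypothetical. Each pair records the number of zero coefficients skipped and the non-zero quantized level that follows.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in the conventional zig-zag order.

    Assumption: this is the usual diagonal scan; FIG. 2 may differ in detail.
    """
    # Positions on the same anti-diagonal share row + col. The traversal
    # direction alternates between consecutive diagonals.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_pairs(block):
    """Convert a block of quantized coefficients into (run, level) pairs."""
    pairs = []
    run = 0  # zeros seen since the last non-zero coefficient
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    # Trailing zeros produce no pair; a real coder signals them with an
    # end-of-block code, which is not modeled here.
    return pairs


# Hypothetical quantized block with three non-zero coefficients.
demo_block = [[0] * 8 for _ in range(8)]
demo_block[0][0] = 12
demo_block[0][1] = -3
demo_block[2][0] = 5

print(run_level_pairs(demo_block))  # [(0, 12), (0, -3), (1, 5)]
```

A block whose quantized coefficients are all zero yields an empty list, which is why such a block contributes no run-level pairs to the variable length coder.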
The output buffer 24 is used because the bit rate output of the variable length coder 22 is inherently variable. Using a control mechanism described below, the output buffer outputs a constant rate bit stream on the channel 26 for real time communications.
The quantization step size of the quantizer 18 is the same for all transform coefficients in a macroblock. However, the quantization step size can be changed from one macroblock to the next. The quantization step size is normally controlled by the amount of space left in the output buffer 24. The output buffer controller 28 senses the amount of space left in the buffer 24 and sends a feedback signal to the quantizer 18 (as well as the inverse quantizer 30) to control the step size. When the buffer 24 has excess capacity, the quantization step size can be decreased in order to increase the number of code bits and obtain a better quality reconstructed image.
On the other hand, when the buffer 24 is nearly full, the quantization step size can be increased to reduce the number of code bits at the expense of picture quality.
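The buffer feedback described above can be sketched as a mapping from buffer fullness to step size. This is only an illustration: the linear rule, the function name `next_quantizer_step`, and the default range of 1 to 31 are assumptions, not the control law of any particular encoder.

```python
def next_quantizer_step(buffer_fill_bits, buffer_size_bits, min_step=1, max_step=31):
    """Pick a quantization step size from the output-buffer fullness.

    Assumption: a simple linear rule. An emptier buffer gives a smaller step
    (more bits, better quality); a fuller buffer gives a larger step.
    """
    # Clamp fullness to [0, 1] so out-of-range inputs cannot produce an invalid step.
    fullness = min(max(buffer_fill_bits / buffer_size_bits, 0.0), 1.0)
    return min_step + round(fullness * (max_step - min_step))


# Hypothetical buffer of 1000 bits at three fill levels.
for fill in (0, 500, 1000):
    print(fill, next_quantizer_step(fill, 1000))
```

The step chosen for a macroblock depends on bits already written to the buffer, so the rule always reacts to past output rather than to the block currently being coded.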
When the frame to be coded is an interframe, coding proceeds as follows. For each macroblock in the interframe, a decision is made whether to perform intra or intermode processing. The decision is made by the inter/intra decision circuit 40 on the basis of the energies of the luminance prediction error and the original luminance signal. The original luminance signal is transmitted to the decision circuit 40 from the input frame buffer 12. The luminance prediction is transmitted to the decision circuit 40 from the loop filter 42. In general, intramode coding is used if the original luminance signal has less AC energy than the luminance prediction error has total energy. However, if the prediction error has sufficiently small energy, then intermode coding is used for the macroblock. If the decision circuit 40 decides to use intramode processing for a macroblock, the multiplexer 44 outputs a zero to the input 17 of the subtractor 16 and the coding proceeds as described above for a block of an intraframe (i.e., DCT, quantization, run-level conversion, variable length coding). If the decision circuit 40 decides to use intermode processing, a prediction is outputted by the multiplexer 44 to the input 17 of the subtractor 16. The prediction is subtracted from the original signal using the subtractor 16 and the residues are then coded using the DCT circuit 14, quantizer 18, run-level converter 20, and variable length coder 22. The coded residue transform coefficients are stored in the output buffer 24, whose contents are used to control the quantization step size using the feedback mechanism described above.
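The inter/intra decision rule described above can be sketched as follows. The function name `choose_mode`, the use of mean energies over flat lists of luminance samples, and the value of `small_error_threshold` are all assumptions made for illustration; the text does not specify the threshold.

```python
def choose_mode(original, prediction, small_error_threshold=64.0):
    """Return "intra" or "inter" for one macroblock of luminance samples.

    original, prediction: equal-length flat lists of luminance values.
    small_error_threshold: hypothetical bound below which the prediction
    error is considered small enough to always use intermode coding.
    """
    n = len(original)
    mean = sum(original) / n
    # AC energy of the original: energy remaining after removing the mean (DC).
    ac_energy = sum((x - mean) ** 2 for x in original) / n
    # Total energy of the prediction error.
    error_energy = sum((x - p) ** 2 for x, p in zip(original, prediction)) / n

    if error_energy <= small_error_threshold:
        return "inter"
    return "intra" if ac_energy < error_energy else "inter"


# Hypothetical 8x8 examples: a flat block with a poor prediction, and a
# textured block with a nearly perfect prediction.
flat = [128] * 64
textured = [(i * 37) % 256 for i in range(64)]
print(choose_mode(flat, [0] * 64))                     # intra
print(choose_mode(textured, [v + 1 for v in textured]))  # inter
```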
The prediction used for the inter/intra decision and for intermode coding is a motion compensated prediction. This prediction is obtained as follows. The previous frame is stored in the previous frame memory 50. The motion estimation circuit 52 receives a block of pixels of the current frame from the input frame buffer 12. The motion estimation circuit 52 also receives a corresponding search area in the previous frame from the previous frame memory 50. The displacement of the current block in the search area which results in the best match is outputted by the motion estimation circuit 52 as the motion vector. When intermode coding is used, the motion vector is transmitted to the variable length coder 22 for coding and transmission via the channel 26.
The motion vector is also transmitted to the motion compensation circuit 54, which retrieves a motion compensated prediction for the current block from the previous frame memory 50. The loop filter 42, which also receives the motion vector, removes artifacts associated with the motion compensation.
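The motion estimation and compensation steps can be sketched as an exhaustive block-matching search. The text only says that the best match is selected; the sum of absolute differences (SAD) used here as the matching criterion, the function names, and the default block size and search range are assumptions.

```python
def sad(current, previous, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and a displaced block."""
    return sum(
        abs(current[top + i][left + j] - previous[top + dy + i][left + dx + j])
        for i in range(size)
        for j in range(size)
    )


def estimate_motion(current, previous, top, left, size=16, search=7):
    """Return the (dy, dx) displacement in the previous frame with the lowest SAD."""
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(current, previous, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def motion_compensated_block(previous, top, left, motion_vector, size=16):
    """Fetch the prediction block that the motion vector points to."""
    dy, dx = motion_vector
    return [row[left + dx:left + dx + size] for row in previous[top + dy:top + dy + size]]
```

For a current frame that is a pure translation of the previous one, the search returns that translation and the fetched prediction equals the current block, leaving a zero residual. The loop filter is not modeled in this sketch.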
The encoder 10 of FIG. 1 includes a decoder 31 for generating the pixel values stored in the previous frame memory 50. The decoder 31 includes the inverse quantizer circuit 30 and the inverse DCT circuit 32. When intramode processing is used, the quantized transform coefficients generated by the quantizer 18 are inverse quantized by the inverse quantizer 30, and the inverse DCT circuit 32 is then used to reconstruct the original pixel values. In this mode, the multiplexer 44 outputs a zero, so there is a zero at the input 33 of the adder 34, and the reconstructed original pixel values are transferred directly to the previous frame memory 50. When intermode processing is used, the quantized residual transform coefficients outputted by the quantizer 18 are processed by the inverse quantizer 30 and the inverse DCT circuit 32 to generate reconstructed residual pixel values. The reconstructed residual pixel values are added to the motion compensated prediction using the adder 34 to obtain reconstructed original pixel values, which are stored in the previous frame memory 50. The loop delay 34 is provided to compensate for the delays of the coding/decoding loop.
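The local decoding loop can be sketched in simplified form. As a deliberate simplification, the DCT and inverse DCT are omitted so that quantization is applied directly to pixel or residual values, and a plain uniform quantizer is assumed; the function names are hypothetical. The point of the sketch is that the previous frame memory receives reconstructed values, not the original input.

```python
def quantize(value, step):
    """Uniform quantizer (assumption): nearest multiple of the step size."""
    return int(round(value / step))


def dequantize(level, step):
    """Inverse quantizer matching quantize()."""
    return level * step


def encode_and_reconstruct(original, prediction, step):
    """Return (levels, reconstructed) for one block.

    For intramode coding pass a prediction of all zeros, which mirrors the
    multiplexer feeding a zero to both the subtractor and the adder.
    """
    residual = [o - p for o, p in zip(original, prediction)]
    levels = [quantize(r, step) for r in residual]
    # The decoder side adds the dequantized residual back onto the prediction.
    reconstructed = [p + dequantize(l, step) for p, l in zip(prediction, levels)]
    return levels, reconstructed


# Hypothetical four-sample block, step size 8.
original = [100, 104, 97, 130]
print(encode_and_reconstruct(original, [0, 0, 0, 0], 8))        # intramode
print(encode_and_reconstruct(original, [98, 100, 99, 128], 8))  # intermode
```

In the intermode example every residual quantizes to zero, so the reconstruction equals the prediction and no coefficients need to be sent for the block.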
The H.261 decoder is described in detail in Ruetz et al., "High-Performance Full-Motion Video Compression Chip Set", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 2, No. 2, June 1992, pp. 111-121; Fujiwara et al., "An All-ASIC Implementation of a Low Bit Rate Video Codec", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 2, No. 2, pp. 123-133; and CCITT Video Compression Chipset Technical Note, LSI Logic Corporation, 1991. The contents of these references are incorporated herein by reference.
In the H.261 encoder discussed above, run-level conversion is performed after quantization. The predictive residual data or original pixel data is first transformed by the DCT circuit 14, and the DCT coefficients are then quantized by a quantizer which is dynamic. That is, the quantization step size of the quantizer 18 (and inverse quantizer 30) is dynamically controlled by the output buffer controller 28. The output buffer controller 28 updates the quantization step size according to the status of the output buffer 24. Thus, there is a feedback loop for dynamic quantization and output buffer control.
However, this feedback mechanism exhibits certain disadvantages. First, there is a latency in the adaptation of the quantization step size due to the feedback loop. The buffer status is only known with the delay of the run-level conversion and variable length coding. Thus, a more accurate buffer control mechanism should estimate the future buffer status rather than obtain the current buffer status.
In addition, most of the quantization is redundant. As the output rate of the H.261 encoder is limited by the communication channel, most of the DCT coefficients need not be transmitted and will be abandoned. However, the quantization is accomplished in advance so that each of the DCT coefficients is quantized no matter what the buffer status is. Most of these quantized coefficients are redundant and not transmitted at all.
Specifically, a typical image sequence in QCIF format comprising 30 frames was encoded using a simulation of the encoder of FIG. 1 to achieve an output bit rate of 384 kbps. The 30 frames contain a total of 17,820 blocks. The busiest block has 21 run-level pairs, while 7,905 blocks generate no run-level pairs at all. Thus, about forty-four percent of the blocks need not be quantized at all. There are a total of only 40,429 run-level pairs for the 17,820 blocks, an average of about 2.27 run-level pairs per block. Each block has 64 DCT coefficients, but only 2-3 coefficients on average really need to be quantized. Thus, the quantization of 61-62 coefficients per block is redundant if the quantization is done before run-level conversion. This is about 95% redundancy.
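The ratios in this paragraph can be recomputed from the quoted counts. The sketch below assumes the standard QCIF dimensions (176×144 luminance with two 88×72 chrominance components), which the text does not state explicitly; the variable names are illustrative.

```python
# Blocks per QCIF frame: 8x8 luminance blocks plus two chrominance components.
luma_blocks = (176 // 8) * (144 // 8)        # 396
chroma_blocks = 2 * (88 // 8) * (72 // 8)    # 198
frames = 30
total_blocks = frames * (luma_blocks + chroma_blocks)

# Counts quoted in the text for the simulated sequence.
empty_blocks = 7905
total_run_level_pairs = 40429
coefficients_per_block = 64

empty_fraction = empty_blocks / total_blocks
pairs_per_block = total_run_level_pairs / total_blocks
redundant_fraction = 1 - pairs_per_block / coefficients_per_block

print(f"total blocks:            {total_blocks}")
print(f"blocks with no pairs:    {empty_fraction:.1%}")
print(f"run-level pairs/block:   {pairs_per_block:.2f}")
print(f"redundant quantizations: {redundant_fraction:.1%}")
```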
Moreover, the conventional encoder, such as the H.261 encoder of FIG. 1, usually requires separate hardware for the quantizer 18 and inverse quantizer 30. Because every DCT coefficient is quantized and inverse quantized, these circuits must operate at the pixel rate so that speed becomes critical. For example, 30 frames/second CIF format video requires 4.56M quantization operations per second.
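The quoted operation rate follows from the frame size. The sketch below assumes the standard CIF dimensions (352×288 luminance, with each of the two chrominance components subsampled by two in both directions), which are not stated in the text; the function name is hypothetical.

```python
def coefficients_per_frame(luma_width, luma_height):
    """Count samples (and hence DCT coefficients) in one frame.

    Assumption: two chrominance components, each at half the luminance
    resolution horizontally and vertically.
    """
    luma = luma_width * luma_height
    chroma = 2 * (luma_width // 2) * (luma_height // 2)
    return luma + chroma


frames_per_second = 30
ops_per_second = frames_per_second * coefficients_per_frame(352, 288)
print(ops_per_second)  # 4561920, i.e. about 4.56M quantizations per second
```

Inverse quantization in the local decoding loop requires the same number of operations again.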
Accordingly, it is an object of the invention to provide an encoder which overcomes the shortcomings of the conventional (e.g. H.261) encoder. Specifically, it is an object of the invention to provide an encoder which eliminates or reduces the need for the feedback loop used to control the quantization step size. It is also an object of the invention to provide an encoder which eliminates the redundant quantization, thereby reducing the speed requirements on the quantizer and inverse quantizer, and thereby permitting a software implementation.