The present invention relates generally to techniques for performing integer arithmetic, and, more particularly, for performing quantization and prediction calculations in video encoders and decoders.
In video communication (e.g., television, video conferencing, streaming media, etc.), a stream of video frames is transmitted over a transmission channel to a receiver. Depending on the particular application, audio information associated with the video may also be transmitted. Video data is generally voluminous. For example, typical television images have a spatial resolution of approximately 720×480 pixels per frame. If 8 bits are used to digitally represent a pixel, and if the video is to be transmitted at 30 frames per second, then a data rate of approximately 83 Mbits per second would be required. However, the bandwidth of transmission channels is typically limited. Thus, the transmission of raw digital video data in real time is generally not feasible. Similarly, the storage of raw digital video data is often prohibitive because the amount of memory available for storage is typically limited.
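The data rate quoted above follows directly from the frame dimensions, the bit depth, and the frame rate. The following is a minimal sketch of that arithmetic; the function name `raw_video_bitrate` is chosen here for illustration only.

```python
def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Figures taken from the example in the text: 720x480 pixels, 8 bits, 30 frames/s.
rate = raw_video_bitrate(720, 480, 8, 30)
print(rate)              # 82944000 bits per second
print(rate / 1_000_000)  # 82.944, i.e. approximately 83 Mbits per second
```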
Consequently, video data is generally compressed prior to transmission and/or storage. Various standards for video compression have emerged, including H.261, MPEG-1, MPEG-2, MPEG-4, H.263, and the like. Compression techniques generally exploit the redundancy of information, both within each picture of a stream of video and between pictures in the stream. For example, one commonly used technique for compressing video data involves performing a mathematical transform (e.g., discrete cosine transform) on the picture data, which transforms the picture data into the 2-dimensional spatial frequency domain. Then, the transformed picture data is quantized (i.e., the resolution of the data is reduced so that fewer bits are required to represent the data), taking advantage of the fact that human sight is generally less sensitive to higher spatial frequencies (i.e., transformed picture data corresponding to higher spatial frequencies is more severely quantized than transformed picture data corresponding to lower spatial frequencies). At the receiver, the inverse transform is applied to the received video data to regenerate the video.
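The frequency-dependent quantization described above can be illustrated with integer arithmetic alone. The sketch below is not taken from any of the named standards: the step-size rule `step_for` (with its `base` and `slope` parameters) and the rounding convention are assumptions made purely for illustration. It shows the same coefficient value being represented more coarsely at a high spatial frequency than at a low one.

```python
def quantize(coeff, step):
    """Divide by the step size, rounding to the nearest integer level.

    The rounding is symmetric about zero so that +c and -c map to
    levels of equal magnitude.
    """
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantize(level, step):
    """Recover an approximation of the original coefficient."""
    return level * step


def step_for(u, v, base=8, slope=4):
    """Hypothetical step size that grows with the frequency indices (u, v)."""
    return base + slope * (u + v)


def quantize_block(coeffs):
    """Quantize a 2-D block of transform coefficients position by position."""
    return [
        [quantize(c, step_for(u, v)) for v, c in enumerate(row)]
        for u, row in enumerate(coeffs)
    ]


# The same coefficient value at the lowest and highest positions of an 8x8 block.
low = quantize(100, step_for(0, 0))   # step 8
high = quantize(100, step_for(7, 7))  # step 64
print(low, dequantize(low, step_for(0, 0)))
print(high, dequantize(high, step_for(7, 7)))
```

With these assumed step sizes, the low-frequency coefficient is reconstructed within a few units of its original value, while the high-frequency coefficient is reconstructed much more coarsely and needs a smaller range of levels.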
In another common technique, rather than transmitting a new picture in the video stream, the difference between the new picture and a previous picture is transmitted. Because successive pictures in a video stream are often similar, the difference information can be transmitted using far fewer bits than would be required to transmit the picture itself.
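As a minimal sketch of this differencing technique, assuming pictures are held as lists of rows of integer pixel values (a representation chosen here only for illustration), the sender computes a per-pixel difference and the receiver adds it back to the previous picture:

```python
def frame_difference(current, previous):
    """Per-pixel difference between the new picture and the previous one."""
    return [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def reconstruct(previous, diff):
    """Receiver side: add the difference back onto the previous picture."""
    return [
        [p + d for p, d in zip(prev_row, diff_row)]
        for prev_row, diff_row in zip(previous, diff)
    ]


previous = [[10, 10, 10], [20, 20, 20]]
current = [[10, 11, 10], [20, 20, 19]]

diff = frame_difference(current, previous)
print(diff)  # [[0, 1, 0], [0, 0, -1]]
assert reconstruct(previous, diff) == current
```

Most entries of the difference are zero or small in magnitude, which is what allows them to be represented with fewer bits than the pixel values themselves.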
The number of bits required to transmit video can be further reduced using prediction techniques at the encoder and decoder. For instance, the encoder can “predict” a current picture in the video stream based on a previous picture, and then calculate the error between its prediction and the actual picture. The error between a predicted picture and the actual picture will tend to be smaller than the error between the actual picture and a previous picture. Because the error is smaller, fewer bits are needed to represent the error, thus reducing the number of bits that need to be transmitted. At the receiver, a decoder generates a predicted picture and combines it with the received error information to generate the actual picture.
One technique for generating a prediction of a picture in a video stream involves motion estimation. In one motion estimation technique, a current picture is partitioned into 8-by-8 blocks of pixels. For each block, a best fit to the block is searched for within a reference picture, such as, for example, another actual or predicted picture in the video stream that is adjacent to the current picture. Once a best fit is found, a motion vector is determined that indicates where in the reference picture the best-fit block is located. Then, the motion vector and errors for each block of the picture are transmitted to the receiver. At the receiver, the current picture is reconstructed using the reference picture, the motion vectors, and the error information.
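The block search described above can be sketched as an exhaustive search over a small window. The text does not specify a matching criterion or a search range, so the sum of absolute differences (SAD) and the ±4 pixel window used below are assumptions, as are the function names and the synthetic test pictures.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between an n-by-n block of the current
    picture at (bx, by) and an n-by-n block of the reference at (rx, ry)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=8, search=4):
    """Try every offset within +/-search pixels and return (dx, dy, cost)
    for the candidate block in the reference picture with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Synthetic 16x16 reference picture, and a current picture whose content is
# the reference content displaced by 2 pixels horizontally and 1 vertically.
SIZE = 16
ref = [[5 * x + 11 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [
    [ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
    for y in range(SIZE)
]

dx, dy, cost = best_motion_vector(cur, ref, bx=4, by=4)
print(dx, dy, cost)  # 2 1 0
```

Here the motion vector (2, 1) locates an exact match, so the error for this block is zero. In real video the best fit is rarely exact, and the remaining per-pixel error is what is transmitted alongside the motion vector.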
Techniques similar to those described above, as well as other techniques, can be combined to achieve greater degrees of compression without reducing video quality beyond a desired level. For example, in the MPEG-1, MPEG-2, and MPEG-4 standards, pictures in the video stream are predicted, and the difference between the actual picture and the predicted picture is calculated. Then, the discrete cosine transform (DCT) of the difference is calculated, and the DCT coefficients are quantized.
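The sequence of steps in this combined approach (prediction difference, then DCT, then quantization) can be sketched for a single 8-by-8 block as follows. This is an illustrative sketch only, not the procedure of any particular standard: it uses a direct floating-point orthonormal DCT for clarity, and a single uniform quantizer step (`step=8`) is assumed rather than a frequency-dependent one.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct (unoptimized) orthonormal 2-D DCT-II of an N-by-N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def encode_block(actual, predicted, step=8):
    """Difference -> DCT -> quantization for one block (assumed uniform step)."""
    residual = [
        [a - p for a, p in zip(a_row, p_row)]
        for a_row, p_row in zip(actual, predicted)
    ]
    coeffs = dct_2d(residual)
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A prediction that is uniformly 4 levels too dark.
predicted = [[100] * N for _ in range(N)]
actual = [[104] * N for _ in range(N)]

levels = encode_block(actual, predicted)
print(levels[0][0])                                 # 4
print(sum(abs(l) for row in levels for l in row))   # 4
```

Because the difference in this example is constant across the block, all of its energy falls into the single lowest-frequency coefficient, and the other 63 quantized values are zero.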
In typical video systems, video data are represented and processed as integers. What is needed are more efficient techniques for processing fixed-point data.