With the advent of computer networks, the storage and transmission of multimedia content has become commonplace. In this environment, a number of compression techniques and standards have emerged to reconcile data-intensive media such as audio and video with the typically limited storage capacity of computers, and with the typically limited data rates for networks.
One such standard for digital audio/video compression has been developed by the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO). This standard was first promulgated as MPEG-1, and has been followed by several revisions named MPEG-2 (broadcast quality video in a four-megabit-per-second channel), MPEG-3 (conceived as a standard for high-definition television, now canceled), and MPEG-4 (medium-resolution videoconferencing with low frame rates in a sixty-four-kilobit-per-second channel). These standards are collectively referred to herein as MPEG.
MPEG employs single-frame compression based upon a two-dimensional discrete cosine transform (“DCT”), and quantization of the resulting transform coefficients. In this respect, it resembles the Joint Photographic Experts Group (“JPEG”) still image compression standard. The MPEG standard provides further compression based upon temporal redundancy.
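The single-frame step described above can be illustrated with a minimal sketch. It assumes an orthonormal 8×8 DCT-II and a single uniform quantizer step with rounding; the function names `dct_2d` and `quantize` and the step value are hypothetical, chosen for illustration only, and are not taken from the MPEG specification.

```python
import math

N = 8  # DCT block size (8x8)

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Divide every coefficient by one step size and round (assumed uniform quantizer)."""
    return [[round(c / step) for c in row] for row in coeffs]

# A perfectly flat block has all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
quantized = quantize(dct_2d(flat), 16)
```

For the flat block, only the top-left (DC) entry of `quantized` is non-zero, which is the kind of redundancy the transform is meant to expose.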
The MPEG standard is complex, particularly in view of the Constrained Parameter Bitstream (CPB) profile, which further defines the MPEG standard to ensure compatibility among particular implementations. However, since MPEG achieves high compression ratios, it is widely used. Even with the CPB profile, MPEG provides a significant amount of design flexibility. While this flexibility has focused attention on methods for achieving greater compression ratios in the video stream, and on ensuring that the video stream can be decoded at an adequate frame rate, there remains significant room for improvement at the encoding end of MPEG systems.
The known basic scheme is to predict motion from frame to frame in the temporal direction, and then to use DCTs (Discrete Cosine Transforms) to exploit any redundancy in the spatial directions. The DCTs may be done on 8×8 blocks, and the motion prediction is done in the luminance (Y) channel on 16×16 blocks. In other words, given the 16×16 block in the current frame that is intended to be coded, the object is to look for a close match to that block in a previous or future frame (there are backward prediction modes where later frames are sent first to allow interpolating between frames). The DCT coefficients (of either the actual data or the difference between this block and the close match) are quantized, which means that they are divided by some value to drop bits off the lower end. Ideally, many of the coefficients will then end up being zero. The quantization can change for every macroblock (a macroblock is a 16×16 block of Y and the corresponding 8×8 blocks in both U and V). The result of all of this, which includes the DCT coefficients, the motion vectors, and the quantization parameters, is Huffman coded, preferably using fixed tables. The DCT coefficients have a special Huffman table that is two-dimensional, in that one dimension of each code specifies a run-length of zeros and the other specifies the non-zero value that ended the run.
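The two-dimensional run-length/value representation described above can be sketched as follows. The sketch assumes that the quantized coefficients are scanned in the conventional zigzag order before pairing and that trailing zeros are covered by an end-of-block code that is omitted here; the names `zigzag_order` and `run_level_pairs` are hypothetical. The actual Huffman table lookup is not shown.

```python
def zigzag_order(n=8):
    """Block positions (row, col) in zigzag order, low frequencies first."""
    def key(pos):
        i, j = pos
        s = i + j
        # Odd diagonals run downward (row increasing), even ones upward.
        return (s, i if s % 2 else -i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)

def run_level_pairs(block):
    """Convert a quantized block into (run_of_zeros, nonzero_value) pairs."""
    pairs = []
    run = 0
    for i, j in zigzag_order(len(block)):
        value = block[i][j]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    # Trailing zeros are dropped; an end-of-block symbol would follow.
    return pairs

# A sparse quantized block: a DC term and one low-frequency AC term.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 50
block[1][0] = -3
pairs = run_level_pairs(block)
```

Here `pairs` holds two entries, `(0, 50)` and `(1, -3)`, so a 64-coefficient block reduces to two symbols to be entropy coded.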
As known in the art, there are three types of coded frames. There are I or intra frames. An I frame is simply coded as a still image, without using any past history. Then there are P or predicted frames. They are predicted from the most recently reconstructed I or P frame. Each macroblock in a P frame can either come with a vector and difference DCT coefficients for a close match in the last I or P frame, or it can just be intra coded (as in the I frames) if there was no good match.
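The choice made for each P-frame macroblock can be sketched as below. This is only an illustration under stated assumptions: an exhaustive search over a small window, the sum of absolute differences (SAD) as the match criterion, and a fixed threshold for falling back to intra coding. None of these choices, nor the names `sad`, `best_match`, and `code_p_macroblock`, comes from the standard, which leaves the search strategy to the encoder.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def best_match(cur, ref, cx, cy, size=16, search=4):
    """Exhaustive search in a +/-search window; returns (cost, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best

def code_p_macroblock(cur, ref, cx, cy, size=16, search=4, intra_threshold=2000):
    """Return ("inter", vector) for a close match, else ("intra", None)."""
    cost, vector = best_match(cur, ref, cx, cy, size, search)
    if cost <= intra_threshold:  # hypothetical threshold for a "good match"
        return ("inter", vector)
    return ("intra", None)
```

In the inter case the encoder would go on to transform and quantize the difference between the current block and the matched block; in the intra case it would transform the block itself.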
Lastly, there are B (bi-directional) frames. They are predicted from the closest two I or P frames, one in the past and one in the future. It is desirable to search for matching blocks in those frames and to try different comparisons, e.g., using the forward vector, using the backward vector, and averaging the two blocks from the future and past frames and subtracting that average from the block being coded. If none of those works, the block may be intra coded.
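The comparison among candidate predictions for a B-frame block can be sketched as follows. The sketch assumes the best-matching past and future blocks have already been found, uses the sum of absolute differences as the cost, and applies a fixed intra fallback threshold. These choices and the names `block_sad`, `average_blocks`, and `choose_b_mode` are hypothetical.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def average_blocks(a, b):
    """Pixel-wise average of two blocks, rounding halves up."""
    return [[(x + y + 1) // 2 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def choose_b_mode(cur, past, future, intra_threshold):
    """Pick the cheapest prediction; fall back to intra if none is close."""
    candidates = {
        "forward": past,                              # predicted from the past frame
        "backward": future,                           # predicted from the future frame
        "interpolated": average_blocks(past, future)  # average of both
    }
    costs = {mode: block_sad(cur, pred) for mode, pred in candidates.items()}
    mode = min(costs, key=costs.get)
    if costs[mode] > intra_threshold:
        return "intra", None
    pred = candidates[mode]
    # The residual is what would then be transformed and quantized.
    residual = [[c - p for c, p in zip(row_c, row_p)]
                for row_c, row_p in zip(cur, pred)]
    return mode, residual

# A block halfway between its past and future matches favors interpolation.
cur = [[10, 10], [10, 10]]
past = [[0, 0], [0, 0]]
future = [[20, 20], [20, 20]]
mode, residual = choose_b_mode(cur, past, future, intra_threshold=16)
```

In this example the averaged prediction matches the current block exactly, so the interpolated mode is selected and the residual is all zeros.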
In particular, the quantized discrete cosine transform (DCT) coefficients of an eight-by-eight MPEG block are typically sparse; that is, a large percentage of blocks contain fewer than five significant coefficients. This is particularly true of inter-coding, where a current block is derived from previous or future blocks. Inter-coded blocks frequently contain no significant coefficients whatsoever, yet a conventional MPEG encoder performs all of the DCT, quantization, dequantization, and inverse DCT steps on these blocks in the same manner as on other blocks.