The most formidable challenge to the acceptance of digital video technology lies in the large data size of video content. For example, a two-hour motion picture may use high-resolution frames of 4000 pixels by 3000 pixels, for a total of 12,000,000 pixels per frame. For high picture quality, each pixel may comprise 10 bits of data for each of three color components, e.g. Red-Green-Blue (RGB), and, for improved color quality, there are plans to allocate even more bits per pixel. At 24 frames per second, the entire two-hour movie would require roughly 8 terabytes (8 trillion bytes) of uncompressed storage. Accordingly, there has been much interest in compression techniques for video content.
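By way of illustration only, the storage figure can be checked with a few lines of arithmetic. The function name `raw_video_bytes` and the two storage layouts compared here are assumptions made for this sketch: one packs the 30 bits of each pixel back to back, the other pads each pixel to 32 bits, which is a common but not universal layout.

```python
def raw_video_bytes(width, height, bits_per_pixel, fps, duration_s):
    """Uncompressed size in bytes, assuming pixels are stored back to back."""
    bits_per_frame = width * height * bits_per_pixel
    frames = fps * duration_s
    return bits_per_frame * frames // 8


two_hours = 2 * 60 * 60  # duration in seconds

# 3 color components x 10 bits, tightly packed (assumed layout)
packed = raw_video_bytes(4000, 3000, 3 * 10, 24, two_hours)
# same pixels padded to 32 bits each (assumed layout)
padded = raw_video_bytes(4000, 3000, 32, 24, two_hours)

print(f"packed 30-bit pixels: {packed / 1e12:.2f} TB")  # 7.78 TB
print(f"padded 32-bit pixels: {padded / 1e12:.2f} TB")  # 8.29 TB
```

Under either layout the uncompressed movie occupies on the order of 8 trillion bytes.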
Compression of video data typically exploits two types of redundancy: spatial and temporal. Reduction of spatial redundancy is achieved using transform coding, such as the discrete cosine transform (DCT), which works by decorrelating the input samples in every 8×8 block of each frame of the video sequence. The resulting coefficients are then quantized, zigzag scanned, and entropy encoded. Reduction of temporal redundancy, on the other hand, is achieved using motion-compensated predictive coding, in which the encoder estimates the motion between two frames by matching each block of the current frame against the previous frame. The residual frame that remains after this matching step is then coded using the DCT, and the motion vectors are coded as additional information. Major video coding standards such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and H.263+ employ such a motion-compensated transform-based coding approach.
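The block-matching step described above can be sketched as follows. This is a minimal illustration, not the search procedure of any particular standard: it assumes an exhaustive search over a small window, the sum of absolute differences (SAD) as the matching cost, and frames held as lists of rows. The names `sad`, `best_motion_vector` and `residual_block` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def best_motion_vector(cur, ref, bx, by, n=8, search=4):
    """Exhaustive search for the displacement (dx, dy) with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def residual_block(cur, ref, bx, by, mv, n=8):
    """Difference between the current block and its motion-compensated match."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]


# Synthetic, non-repeating test pattern (not realistic pixel data).
size = 24
ref = [[x * x + 3 * y * y + x * y for x in range(size)] for y in range(size)]
# Current frame: the reference content displaced by 2 columns and 1 row.
sx, sy = 2, 1
cur = [[ref[y + sy][x + sx] if y + sy < size and x + sx < size else 0
        for x in range(size)] for y in range(size)]

mv, cost = best_motion_vector(cur, ref, bx=8, by=8)
residual = residual_block(cur, ref, 8, 8, mv)
print(mv, cost)  # (2, 1) 0
```

Because the current frame here is an exact displaced copy of the reference, the residual block is all zeros; with real video the residual is small but nonzero, and it is this residual that is passed to the transform stage.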
Frames that are coded without any reference to previously coded frames are referred to as “Intra frames” (or I-frames). I-frames exploit only spatial redundancy, using a transform coding such as the DCT. Frames that are coded using a previously coded frame are called “Inter” or “non-Intra” frames. Inter frames themselves can be of two types: (i) Predictive frames (P-frames), coded with respect to the immediately previous I-frame or P-frame; and (ii) Bidirectionally predictive frames (B-frames), coded with respect to the immediately previous I-frame or P-frame as well as the immediately next P-frame or I-frame. In a typical video coding scenario, I-frames are spaced a certain number of frames apart, with several P-frames and B-frames between two consecutive I-frames. The spacing between consecutive I-frames is referred to as the “I-frame distance.” The main purposes of introducing periodic I-frames are to allow easy editing of the compressed video bit-stream and to allow resynchronization of a transmitted compressed video bit-stream in case one of the non-intra frames is accidentally dropped.
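As an illustrative sketch of the frame-type pattern just described, the code below lays out I-, P- and B-frames in display order and lists the frames each one is predicted from. The parameter `b_between` (the number of B-frames between consecutive I- or P-frames) and the function names are assumptions introduced for this example; the text above defines only the I-frame distance.

```python
def frame_types(num_frames, i_distance, b_between):
    """Assign 'I', 'P' or 'B' to each frame, in display order."""
    types = []
    for n in range(num_frames):
        pos = n % i_distance
        if pos == 0:
            types.append("I")
        elif pos % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def references(types):
    """Map each frame index to the indices of the frames it is coded against."""
    anchors = [i for i, t in enumerate(types) if t in ("I", "P")]
    refs = {}
    for i, t in enumerate(types):
        if t == "I":
            refs[i] = []  # intra-coded: no reference frame
            continue
        prev = max(a for a in anchors if a < i)
        if t == "P":
            refs[i] = [prev]
        else:
            # B-frame: previous anchor plus the next one, when it exists
            nxt = [a for a in anchors if a > i]
            refs[i] = [prev] + nxt[:1]
    return refs


types = frame_types(13, i_distance=12, b_between=2)
refs = references(types)
print("".join(types))  # IBBPBBPBBPBBI
print(refs[1], refs[3], refs[11])  # [0, 3] [0] [9, 12]
```

In this layout, dropping a P-frame affects every frame up to the next I-frame, which is why periodic I-frames serve as resynchronization points.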
In a motion-compensated transform-based coder, motion vectors are first estimated (except in the case of an I-frame), and the estimated motion vectors and motion compensation modes are entropy-encoded using variable length coding. The motion-compensated residual frame (or the original frame in the case of an I-frame) then undergoes an 8×8 block-DCT transformation. Each 8×8 block of DCT coefficients then undergoes quantization, zigzag scanning, and run-length coding followed by entropy encoding using variable length coding. Together, the motion vectors, including motion compensation modes, and the quantized DCT coefficients are used to reconstruct a lossy version of the original video sequence.
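The transform, quantization, zigzag and run-length steps for a single 8×8 block can be sketched as below. This is a simplified illustration under stated assumptions: a direct (unoptimized) orthonormal DCT-II, one uniform quantizer step for all coefficients rather than a per-coefficient quantization matrix, and an `"EOB"` string as a stand-in end-of-block marker. All function names are hypothetical, and the final variable length coding of the (run, level) pairs is omitted.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct form)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization with a single step size (an assumption)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# Zigzag order: walk the anti-diagonals, alternating direction on each one.
ZIGZAG = sorted(((r, col) for r in range(N) for col in range(N)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else -p[0]))


def zigzag_scan(block):
    return [block[r][col] for r, col in ZIGZAG]


def run_length(scanned):
    """Encode as (zero_run, level) pairs, ending with an end-of-block marker."""
    pairs, run = [], 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


def encode_block(block, step=16):
    return run_length(zigzag_scan(quantize(dct_2d(block), step)))


# A smooth gradient block: most energy lands in a few low-frequency terms.
gradient = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]
print(encode_block(gradient))

# A flat block reduces to a single DC coefficient.
flat = [[100] * N for _ in range(N)]
print(encode_block(flat))  # [(0, 50), 'EOB']
```

The zigzag scan orders coefficients from low to high frequency, so that the zeros produced by quantization cluster at the end of the scan and collapse into the end-of-block marker.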
The variable-length coding of both the motion vectors (and motion compensation modes) and the quantized DCT coefficients is done on a frame-by-frame basis using look-up tables akin to Huffman coding. Separate variable length code (VLC) look-up tables are provided under a few different conditions; for example, in the case of quantized DCT coefficients, separate VLC tables are provided for intra-coded blocks and for inter-coded blocks. However, the number of different variable length coding tables is small. Moreover, since the tables are optimized over a large class of test video sequences, they are not necessarily close to optimal for any specific video sequence.
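The mismatch between a fixed table and a particular sequence can be illustrated with a small sketch. It builds Huffman code lengths from two sets of symbol counts: one standing in for statistics gathered over a broad class of test material, the other taken from one specific sequence. The symbol values and counts are invented for illustration, the function names are hypothetical, and the tables of the actual standards are only akin to Huffman codes, so this is not a model of any real VLC table.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code on `freqs`."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def coded_bits(symbols, lengths):
    """Total bits needed to code `symbols` with the given code lengths."""
    return sum(lengths[s] for s in symbols)


# Hypothetical statistics averaged over a broad class of sequences.
generic_freqs = {0: 50, 1: 25, -1: 15, 2: 10}
generic_table = huffman_code_lengths(generic_freqs)

# A specific sequence whose statistics differ from the average.
sequence = [2] * 60 + [0] * 20 + [1] * 12 + [-1] * 8
tailored_table = huffman_code_lengths(Counter(sequence))

generic_bits = coded_bits(sequence, generic_table)
tailored_bits = coded_bits(sequence, tailored_table)
print(generic_bits, tailored_bits)  # 248 160
```

The comparison ignores the cost of conveying a tailored table to the decoder, which a practical scheme would have to account for.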