The present invention relates to decompression of digital information and, more particularly, to decompression of digital video information.
Video information requires a large amount of storage space; therefore, video information is generally compressed before it is stored. Accordingly, to display compressed video information which is stored, for example, as a compressed video bitstream on a compact disk read-only memory (CD-ROM), the compressed video information is decompressed to furnish decompressed video information. The decompressed video information is then communicated to a display. The video information is generally stored in a plurality of memory storage locations corresponding to pixel locations on a display. The stored video information is generally referred to as a bit map. The video information representing a single screen of information on a display is called a frame. A goal of many video systems is to quickly and efficiently decode compressed video information to enable a motion video capability.
Standardization of recording media, devices and various aspects of data handling, such as video compression, is highly desirable for continued growth of this technology and its applications. One compression standard which has attained widespread use for compressing and decompressing video information is the Moving Picture Experts Group (MPEG) standard for video encoding and decoding. The MPEG standard is defined in International Standard ISO/IEC 11172-1, "Information Technology--Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s", Parts 1, 2 and 3, First edition Jan. 8, 1993 which is hereby incorporated by reference in its entirety.
Frames within the MPEG standard are divided into 16×16 pixel macroblocks. Each macroblock includes six 8×8 blocks: four luminance (Y) blocks, one chrominance red (Cr) block and one chrominance blue (Cb) block. The luminance blocks correspond to sets of 8×8 pixels on a display and control the brightness of respective pixels. The chrominance blocks control the color of the frame. For each set of four pixels on the display, there is a single Cr characteristic and a single Cb characteristic.
For example, referring to FIG. 1, labeled prior art, a frame presented by a typical display includes 240 lines of video information in which each line has 352 pixels. Accordingly, a frame includes 240×352=84,480 pixel locations. Under the MPEG standard, this frame of video includes 44 by 30 luminance blocks or 1320 blocks of luminance video information. Additionally, because each macroblock of information also includes two corresponding chrominance blocks, each frame of video information also includes 330 Cr blocks and 330 Cb blocks. Accordingly, each frame of video information requires 126,720 pixel values (84,480 luminance values plus 21,120 Cr values and 21,120 Cb values) or, at 8 bits per value, a total of 1,013,760 bits of bit-mapped storage space for presentation on a display.
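By way of illustration only, the block counts and storage figures above can be reproduced with a short calculation. The function name and constants below are hypothetical, and the value of 8 bits per sample is inferred from the stated totals (1,013,760 bits divided by 126,720 values) rather than stated directly.

```python
# Frame geometry taken from the example above.
WIDTH, HEIGHT = 352, 240
BLOCK = 8            # edge of one block, in pixels
MACROBLOCK = 16      # edge of one macroblock, in pixels
BITS_PER_SAMPLE = 8  # assumption: inferred from 1,013,760 / 126,720


def frame_storage(width=WIDTH, height=HEIGHT):
    """Count blocks and bit-mapped storage for one frame."""
    luma_blocks = (width // BLOCK) * (height // BLOCK)
    macroblocks = (width // MACROBLOCK) * (height // MACROBLOCK)
    # Each macroblock carries one Cr block and one Cb block.
    chroma_blocks = 2 * macroblocks
    samples = (luma_blocks + chroma_blocks) * BLOCK * BLOCK
    return {
        "pixel_locations": width * height,
        "luma_blocks": luma_blocks,
        "cr_blocks": macroblocks,
        "cb_blocks": macroblocks,
        "samples": samples,
        "bits": samples * BITS_PER_SAMPLE,
    }


if __name__ == "__main__":
    for name, value in frame_storage().items():
        print(f"{name}: {value}")
```

The chrominance planes add half again as many values as the luminance plane, which is why the total is 1.5 times the 84,480 pixel locations.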
There are three types of frames of video information which are defined by the MPEG standard: intra frames (I frames), forward predicted frames (P frames) and bi-directionally predicted frames (B frames). An example frame sequence is shown in FIG. 2, labeled prior art, which represents but one of many possible frame sequences supported by MPEG.
An I frame, such as I-frame 20, is encoded as a single image having no reference to any past or future frame (with one minor exception not important for this discussion). Each block of an I frame is encoded independently. Accordingly, when decoding an I frame, no motion processing is necessary. However, for the reasons discussed below, it is necessary to store and access I frames for use in decoding other types of frames.
A P frame, such as P-frame 24, is encoded relative to a past reference frame. A reference frame is a P or I frame. The past reference frame is the closest preceding reference frame. For example, P-frame 24 is shown as referring back to I-frame 20 by reference arrow 29, and thus, I-frame 20 is the past reference frame for P-frame 24. P-frame 28 is shown as referring back to P-frame 24 by reference arrow 30, and thus, P-frame 24 is the past reference frame for P-frame 28. Each macroblock in a P frame can be encoded either as an I macroblock or as a P macroblock. A P macroblock is stored as a translated 16×16 area of a past reference frame plus an error term. To specify the location of the P macroblock, a motion vector (i.e., an indication of the relative position of the macroblock in the current frame to the position of the translated area in the past reference frame) is also encoded. When decoding a P macroblock, the 16×16 area from the reference frame is offset according to the motion vector. The decoding function accordingly includes motion compensation, which is performed on a macroblock, in combination with error (IDCT) terms, which are defined on a block by block basis.
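As a minimal sketch of the P macroblock reconstruction described above, the following fetches a translated 16×16 area from a reference frame and adds the error terms. The function names are hypothetical. The sketch assumes a single plane of 8-bit values held as a list of rows, whole-pixel motion vectors expressed as (dx, dy), error terms already gathered into one 16×16 array, and a translated area lying wholly inside the reference frame; none of these details is taken from the standard.

```python
def fetch_area(reference, x, y, size=16):
    """Copy a size-by-size area whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in reference[y:y + size]]


def decode_p_macroblock(reference, mb_x, mb_y, motion_vector, error, size=16):
    """Rebuild one macroblock: translated reference area plus error terms."""
    dx, dy = motion_vector
    # Offset the macroblock position by the motion vector to find the
    # translated area in the past reference frame.
    predicted = fetch_area(reference, mb_x + dx, mb_y + dy, size)
    # Add the error terms and clamp to the 8-bit range (an assumption).
    return [
        [max(0, min(255, p + e)) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(predicted, error)
    ]


if __name__ == "__main__":
    # Synthetic 32x32 reference frame with a simple gradient.
    reference = [[x + 2 * y for x in range(32)] for y in range(32)]
    error = [[0] * 16 for _ in range(16)]
    error[0][0] = 5
    block = decode_p_macroblock(reference, 8, 8, (-3, 2), error)
    print(block[0][:4])
```

Note that the motion compensation operates on the whole macroblock, whereas in an actual decoder the error terms would arrive as separate 8×8 blocks.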
A B frame (e.g., B-frames 21, 22, 23, 25, 26, and 27) is encoded relative to the past reference frame and/or a future reference frame. The future reference frame is the closest succeeding reference frame (whereas the past reference frame is the closest preceding reference frame). Accordingly, the decoding of a B frame is similar to that of a P frame with the exception that a B frame motion vector may refer to areas in the future reference frame. For example, B-frame 22 is shown as referring back to I-frame 20 by reference arrow 31, and is also shown as referring forward to P-frame 24 by reference arrow 32. For macroblocks that use both past and future reference frames, the two 16×16 areas are averaged and then added to blocks of pixel error terms. The macroblocks from the reference frames are offset according to motion vectors.
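The averaging step for a macroblock that uses both reference frames might be sketched as follows. The function names are hypothetical, the two areas are assumed to have already been fetched according to their respective motion vectors, and the rounding rule (halves rounded up) and the clamp to the 8-bit range are assumptions of this sketch.

```python
def average_areas(past_area, future_area):
    """Average two equally sized areas, rounding halves up (assumed rule)."""
    return [
        [(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
        for past_row, future_row in zip(past_area, future_area)
    ]


def decode_b_macroblock(past_area, future_area, error):
    """Average the two translated areas, then add the pixel error terms."""
    averaged = average_areas(past_area, future_area)
    return [
        [max(0, min(255, a + e)) for a, e in zip(avg_row, err_row)]
        for avg_row, err_row in zip(averaged, error)
    ]


if __name__ == "__main__":
    past = [[100] * 16 for _ in range(16)]
    future = [[103] * 16 for _ in range(16)]
    error = [[-2] * 16 for _ in range(16)]
    print(decode_b_macroblock(past, future, error)[0][0])
```

Each output value requires two reads, an add, a shift or divide, and a further add for the error term, which is roughly twice the memory traffic of the P macroblock case.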
Frames are coded using a discrete cosine transform (DCT) coding scheme which transforms pixels (or error terms) into a set of coefficients corresponding to amplitudes of specific cosine basis functions. The discrete cosine transform is used in image compression to decorrelate picture data prior to quantization. The DCT coefficients are further coded using variable length coding. Variable length coding (VLC) is a statistical coding technique that assigns codewords to values to be encoded. Values having a high frequency of occurrence are assigned short codewords, and those having infrequent occurrence are assigned long codewords. On average, the more frequent shorter codewords dominate so that the code string is shorter than the original data.
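The transform step described above can be illustrated with a direct, unoptimized 8×8 two-dimensional DCT. This is a sketch using the common orthonormal DCT-II scaling, which is assumed here rather than quoted from the standard; the function name is hypothetical, and practical decoders use fast factorizations instead of this four-deep loop.

```python
import math

N = 8  # block edge


def dct_2d(block):
    """Direct 8x8 DCT-II: one coefficient per cosine basis function."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            # 2 / N equals 1/4 for an 8x8 block.
            coeffs[u][v] = (2 / N) * scale(u) * scale(v) * total
    return coeffs


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    # A uniform block has energy only in the first (DC) coefficient.
    print(round(coeffs[0][0], 6), round(coeffs[0][1], 6))
```

A uniform block yields a single nonzero coefficient, which shows the decorrelating effect: 64 identical pixel values collapse to one amplitude, leaving the remaining coefficients at zero for the quantizer and variable length coder to represent cheaply.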
The above-described scheme using I, P, and B frames and motion vectors is often referred to as motion compensation. The pixel error terms are coded via the discrete cosine transform (DCT), quantization, and variable-length coding (VLC). Motion compensation is one of the most computationally intensive operations in many common video decompression methods. When pixels change between video frames, this change is often due to predictable camera or subject motion. Thus, a macroblock of pixels in one frame can be obtained by translating a macroblock of pixels in a previous or subsequent reference frame. The amount of translation is referred to as the motion vector. Moreover, as mentioned earlier, compression methods such as MPEG employ bi-directional motion compensation (B blocks) wherein a macroblock of pixels in the current frame is computed as the average or interpolation of a macroblock from a past reference frame and a macroblock from a future reference frame. Both averaging and interpolation are computationally intensive operations which require extensive processor resources.
Moreover, VLC decoding is also a particularly processor-intensive operation. Such decoding demands severely burden the video system processor when implemented in a general purpose processor system. Systems unable to keep up with the computational demands of decoding the compressed video information frequently drop entire frames to resynchronize with a real time clock signal also encoded in the video stream. Otherwise, video signals would become out of synchronization with audio signals, and/or the video playback would "slow down" compared to the "real" speed otherwise intended. This is sometimes observable as a momentary freeze of the picture in the video playback, followed by sudden discontinuities or jerkiness in the picture.
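To suggest why VLC decoding weighs on a general purpose processor, the following toy coder uses a small prefix code and decodes it one bit at a time. The code table is hypothetical and is not a table from the MPEG standard; the function names are likewise illustrative only.

```python
# Hypothetical prefix code: the most frequent value gets the shortest codeword.
ENCODE_TABLE = {0: "0", 1: "10", 2: "110", 3: "111"}
DECODE_TABLE = {bits: value for value, bits in ENCODE_TABLE.items()}


def vlc_encode(values):
    """Concatenate the codeword for each value into one bit string."""
    return "".join(ENCODE_TABLE[v] for v in values)


def vlc_decode(bitstring):
    """Decode bit by bit; codeword boundaries are unknown until matched."""
    values, current = [], ""
    for bit in bitstring:
        current += bit
        if current in DECODE_TABLE:
            values.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("truncated codeword at end of bit string")
    return values


if __name__ == "__main__":
    data = [0, 0, 0, 1, 0, 2, 0, 0, 1, 3]
    bits = vlc_encode(data)
    print(bits, len(bits), "bits versus", 2 * len(data), "fixed-length bits")
    print(vlc_decode(bits) == data)
```

Because the length of each codeword is known only after it has been matched, the decoder cannot jump ahead in the bitstream, and the test-and-branch work per bit is the kind of serial burden noted above.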
Many video compression algorithms are designed to optimize compression ratio (or the compression efficiency) and little attention is paid to enabling playback on a machine having a lower performance level. For example, MPEG-1 was developed to enable storage and playback of 352×240 resolution video frames from then-available (today's so-called "single speed") CD-ROM drives which support transfer rates of 150 Kbytes/second. While MPEG-1 enables playback from a CD-ROM, a processor with a relatively high performance level is required to decode and display the video information. Thus, the playback rate and/or the quality of MPEG video varies greatly depending upon the processor being used. Consequently, a significant need exists for enabling high quality playback of compressed video information using processors having less than stellar performance levels.
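For a rough sense of the compression ratio implied by these figures, the bit-mapped frame size given earlier can be compared with the CD-ROM transfer rate. The frame rate of 30 frames per second, the reading of one Kbyte as 1024 bytes, and the use of the entire transfer rate for video (ignoring audio and system overhead) are all assumptions of this sketch and are not stated in the text above.

```python
BITS_PER_FRAME = 1_013_760           # bit-mapped frame size given earlier
FRAMES_PER_SECOND = 30               # assumption: not stated in the text
CDROM_BYTES_PER_SECOND = 150 * 1024  # assumption: 1 Kbyte = 1024 bytes


def required_compression_ratio(
    bits_per_frame=BITS_PER_FRAME,
    frames_per_second=FRAMES_PER_SECOND,
    channel_bytes_per_second=CDROM_BYTES_PER_SECOND,
):
    """Ratio of the uncompressed video bit rate to the channel bit rate."""
    raw_bits_per_second = bits_per_frame * frames_per_second
    channel_bits_per_second = channel_bytes_per_second * 8
    return raw_bits_per_second / channel_bits_per_second


if __name__ == "__main__":
    print(f"approximate ratio: {required_compression_ratio():.2f} to 1")
```

Under these assumptions the video must be compressed by roughly 25 to 1, and the decoder must undo that compression in real time, which is the processing burden at issue.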