1. Field of the Invention
The present invention relates to video processing and, more particularly, to motion compensation of compressed video information.
2. Description of Related Art
Because video information requires a large amount of storage space, it is generally compressed. Accordingly, to display compressed video information that is stored, for example, on a compact disc read-only memory (CD-ROM), the compressed video information must first be decompressed. The decompressed video information is then provided in a bit stream to a display. The bit stream of video information is typically stored in a plurality of memory storage locations corresponding to pixel locations on a display. The stored video information is generally referred to as a bit map. The video information required to present a single screen of information on a display is called a frame. A goal of many video systems is to decode compressed video information quickly and efficiently so as to provide motion video.
Standardization of recording media, devices and various aspects of data handling, such as video compression, is highly desirable for continued growth of this technology and its applications. One compression standard which has attained widespread use for compressing and decompressing video information is the Moving Picture Experts Group (MPEG) standard for video encoding and decoding. The MPEG standard is defined in International Standard ISO/IEC 11172, "Information Technology--Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s", Parts 1, 2 and 3, First edition 1993-08-01, which is hereby incorporated by reference in its entirety.
Frames within the MPEG standard are divided into 16×16 pixel macroblocks. Each macroblock includes six 8×8 blocks: four luminance (Y) blocks, one chrominance red (Cr) block and one chrominance blue (Cb) block. The luminance blocks correspond to sets of 8×8 pixels on a display and control the brightness of respective pixels. The chrominance blocks to a large extent control the colors for sets of four pixels. For each set of four pixels on the display, there is a single Cr characteristic and a single Cb characteristic.
For example, referring to FIG. 1, labeled prior art, a frame presented by a typical display includes 240 lines of video information in which each line has 352 pixels. Accordingly, a frame includes 240×352=84,480 pixel locations. Under the MPEG standard, this frame of video includes 44 by 30 luminance blocks, or 1320 blocks of luminance video information. Additionally, because each macroblock of information also includes two corresponding chrominance blocks, each frame of video information also includes 330 Cr blocks and 330 Cb blocks. Accordingly, each frame of video information requires 126,720 pixel values (84,480 luminance plus 42,240 chrominance), or 1,013,760 bits at 8 bits per value, of bit mapped storage space for presentation on a display.
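The storage figures above can be checked with a short calculation. The following is a minimal sketch only; the function name `frame_storage` and the default of 8 bits per value are assumptions made for illustration and do not come from the MPEG standard text.

```python
def frame_storage(width, height, bits_per_value=8):
    """Count blocks and bit-mapped storage for one frame.

    Assumes 8x8 blocks, 16x16 macroblocks, and one Cr and one Cb
    block per macroblock, as described in the text above.
    bits_per_value=8 is an illustrative assumption.
    """
    block, macroblock = 8, 16
    luma_blocks = (width // block) * (height // block)
    macroblocks = (width // macroblock) * (height // macroblock)
    cr_blocks = macroblocks          # one Cr block per macroblock
    cb_blocks = macroblocks          # one Cb block per macroblock
    luma_values = width * height
    chroma_values = (cr_blocks + cb_blocks) * block * block
    total_values = luma_values + chroma_values
    return {
        "luma_blocks": luma_blocks,
        "cr_blocks": cr_blocks,
        "cb_blocks": cb_blocks,
        "total_values": total_values,
        "total_bits": total_values * bits_per_value,
    }


storage = frame_storage(352, 240)
```

For the 352 by 240 frame of FIG. 1, `storage` holds 1320 luminance blocks, 330 blocks of each chrominance type, 126,720 values and 1,013,760 bits, matching the figures stated above.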
Three types of frames of video information are defined by the MPEG standard: intra-frames (I frames), forward predicted frames (P frames) and bidirectional-predicted frames (B frames). A sample frame sequence is depicted in FIG. 2, labeled prior art, which represents but one of many possible frame sequences supported by the MPEG standard.
An I frame, such as I frame 20, is encoded as a single image having no reference to any past or future frame (with one minor exception not important for this discussion). Each block of an I frame is encoded independently. Accordingly, when decoding an I frame, no motion processing is necessary. However, for the reasons discussed below, it is necessary to store and access I frames for use in decoding other types of frames.
A P frame, such as P-frame 24, is encoded relative to a past reference frame. A reference frame is a P or I frame. The past reference frame is the closest preceding reference frame. For example, P-frame 24 is shown as referring back to I-frame 20 by reference arrow 29, and thus, I-frame 20 is the past reference frame for P-frame 24. P-frame 28 is shown as referring back to P-frame 24 by reference arrow 30, and thus, P-frame 24 is the past reference frame for P-frame 28. Each macroblock in a P frame can be encoded either as an I macroblock or as a P macroblock. A P macroblock references a 16×16 area of a past reference frame, which may be offset by a motion vector, and an error term (which, of course, may be zero for a given macroblock) is added to this 16×16 motion-compensated area. The motion vector, which is also encoded, specifies the position of the referenced area within the reference frame relative to the position of the macroblock within the current frame. When decoding a P frame, the current P macroblock is formed by adding the 16×16 area from the reference frame to the blocks of error terms.
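The P macroblock reconstruction described above can be sketched as follows. This is an illustrative sketch rather than a decoder implementation: the function name `reconstruct_p_macroblock`, the (dy, dx) ordering of the motion vector, the restriction to full-pixel vectors and the clamping of results to the range 0 to 255 are all assumptions made for the example.

```python
def reconstruct_p_macroblock(reference, row, col, motion_vector, error, size=16):
    """Form a P macroblock from a reference frame and error terms.

    reference     -- 2D list of pixel values for the past reference frame
    row, col      -- top-left corner of the macroblock in the current frame
    motion_vector -- (dy, dx) full-pixel offset into the reference frame
                     (hypothetical ordering chosen for this sketch)
    error         -- size x size 2D list of decoded error terms
    """
    dy, dx = motion_vector
    result = []
    for r in range(size):
        out_row = []
        for c in range(size):
            # Fetch the motion-compensated pixel from the reference frame.
            predicted = reference[row + dy + r][col + dx + c]
            # Add the error term; clamping to 8 bits is an assumption.
            out_row.append(max(0, min(255, predicted + error[r][c])))
        result.append(out_row)
    return result
```

When every error term is zero, the macroblock is simply a translated copy of the 16×16 reference area, which is the case the parenthetical remark above refers to.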
B frames are frames which occur between two reference frames. There may be multiple B frames between a pair of reference frames. B frame macroblocks may be predicted from the past reference frame, the future reference frame, or by interpolating (averaging) a macroblock in the past reference frame with a macroblock in the future reference frame.
In more detail, a B frame (e.g., B-frames 21, 22, 23, 25, 26, and 27) is encoded relative to the past reference frame and a future reference frame. The future reference frame is the closest following reference frame (whereas the past reference frame is the closest preceding reference frame). Accordingly, the decoding of a B frame is similar to that of a P frame with the exception that a B frame motion vector may refer to areas in the future reference frame. For example, B-frame 22 is shown as referring back to I-frame 20 by reference arrow 31, and is also shown as referring forward to P-frame 24 by reference arrow 32. For macroblocks that use both past and future reference frames, the two 16×16 areas are averaged and then added to blocks of error terms. The macroblocks from each of the reference frames are offset according to respective motion vectors.
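For a macroblock that uses both reference frames, the averaging step described above can be sketched as follows. The function name `reconstruct_b_macroblock`, the (dy, dx) ordering of the motion vectors, the round-half-up integer average and the clamping to the range 0 to 255 are assumptions of this sketch, which also handles only full-pixel motion vectors.

```python
def reconstruct_b_macroblock(past, future, row, col, mv_past, mv_future,
                             error, size=16):
    """Form an interpolated B macroblock from two reference frames.

    past, future       -- 2D lists of pixel values (reference frames)
    row, col           -- top-left corner of the macroblock in the current frame
    mv_past, mv_future -- (dy, dx) full-pixel offsets, one per reference frame
    error              -- size x size 2D list of decoded error terms
    """
    pdy, pdx = mv_past
    fdy, fdx = mv_future
    result = []
    for r in range(size):
        out_row = []
        for c in range(size):
            p = past[row + pdy + r][col + pdx + c]
            f = future[row + fdy + r][col + fdx + c]
            # Average the two areas; rounding halves upward is an assumption.
            average = (p + f + 1) // 2
            # Add the error term; clamping to 8 bits is an assumption.
            out_row.append(max(0, min(255, average + error[r][c])))
        result.append(out_row)
    return result
```

Each output pixel here costs two reference fetches, two additions and a division by two, in addition to the error term addition.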
Frames are coded using a discrete cosine transform (DCT) coding scheme in which each coefficient represents the amplitude of a specific cosine basis function. The DCT coefficients are quantized and further coded using variable length coding. Variable length coding (VLC) is a statistical coding technique that assigns codewords to the values to be encoded. Values having a high frequency of occurrence are assigned short codewords, and those occurring infrequently are assigned long codewords. On average, the more frequent, shorter codewords dominate, so that the code string is shorter than the original data.
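The principle of variable length coding can be illustrated with a small example. Note that the MPEG standard uses predefined code tables; the sketch below instead builds a Huffman code from the observed frequencies of some sample values, which is a different technique chosen purely to show how frequent values receive shorter codewords. The function names `build_code` and `encode` and the sample data are hypothetical.

```python
import heapq
from collections import Counter


def build_code(values):
    """Build a Huffman code (value -> bit string) from sample values."""
    counts = Counter(values)
    if len(counts) == 1:
        # A single distinct value still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, partial code table).
    heap = [(n, i, {v: ""}) for i, (v, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, table1 = heapq.heappop(heap)
        n2, _, table2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing one bit to each.
        merged = {v: "0" + bits for v, bits in table1.items()}
        merged.update({v: "1" + bits for v, bits in table2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(values, code):
    """Concatenate the codewords for a sequence of values."""
    return "".join(code[v] for v in values)


# Hypothetical quantized coefficients: zero is by far the most common value.
samples = [0] * 12 + [1] * 4 + [-1] * 3 + [5]
code = build_code(samples)
bits = encode(samples, code)
```

With four distinct values, a fixed-length code would need 2 bits per value, or 40 bits for these 20 samples; the variable length code string is shorter because the frequent value 0 receives a one-bit codeword.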
A variety of MPEG frame sequences are possible in addition to that shown (I-B-B-B-P-B-B-B-P-B-B-B-P-B-B-B-I- . . . ) in FIG. 2. Possible alternate sequences include: I-P-P-P-I-P-P-P-I-P-P-P- . . . ; I-B-B-P-B-B-P-B-B-I- . . . ; I-I-I-I-I-I-I- . . . (known as full motion JPEG); and others. Each choice trades off picture fidelity, compression density and computational complexity against one another.
The above described scheme using I, P and B frames and motion vectors is often referred to as motion compensation. The error terms are coded via the discrete cosine transform (DCT), quantization, and variable-length coding (VLC). Motion compensation is one of the most computationally intensive operations in many common video decompression methods. When pixels change between video frames, this change is often due to predictable camera or subject motion. Thus, a macroblock of pixels in one frame can be obtained by translating a macroblock of pixels in a previous or subsequent frame. The amount of translation is referred to as the motion vector. A motion vector typically has full-pixel or half-pixel resolution. When a motion vector has half-pixel resolution, an averaging procedure is applied to neighboring pixels of the previous (or subsequent) frame to compute each motion compensated pixel of the current frame which is to be displayed.
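The averaging procedure for half-pixel motion vectors can be sketched as follows. The function name `half_pixel_value`, the use of coordinates expressed in half-pixel units and the round-half-up integer averages are assumptions of this sketch; it also assumes non-negative coordinates that stay inside the reference frame.

```python
def half_pixel_value(reference, y2, x2):
    """Return the reference value at position (y2 / 2, x2 / 2).

    reference -- 2D list of pixel values
    y2, x2    -- non-negative coordinates in half-pixel units, so an
                 odd value falls halfway between two full pixels
    """
    y, x = y2 >> 1, x2 >> 1            # full-pixel part
    half_y, half_x = y2 & 1, x2 & 1    # 1 when on a half-pixel position
    a = reference[y][x]
    if not half_y and not half_x:
        return a                       # full-pixel position: no averaging
    if half_x and not half_y:
        # Halfway between two horizontal neighbors.
        return (a + reference[y][x + 1] + 1) // 2
    if half_y and not half_x:
        # Halfway between two vertical neighbors.
        return (a + reference[y + 1][x] + 1) // 2
    # Halfway in both directions: average the four surrounding pixels.
    return (a + reference[y][x + 1] + reference[y + 1][x]
            + reference[y + 1][x + 1] + 2) // 4
```

A full-pixel vector requires only a fetch per pixel, whereas a vector that is half-pixel in both directions requires four fetches, four additions and a division for every pixel of the macroblock.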
Moreover, as mentioned earlier, compression methods such as MPEG employ bi-directional motion compensation (B blocks) wherein a macroblock of pixels in the current frame is computed as the average or interpolation of a macroblock from a past reference frame and a macroblock from a future reference frame. Both averaging and interpolation are computationally intensive operations which require extensive processor resources. Averaging and interpolation severely burden the system processor when implemented in a general purpose computer system since these operations require many additions and divisions for each pixel to be displayed.
Systems unable to keep up with the computational demands of such decompression frequently drop entire frames to resynchronize with a real time clock signal also encoded in the video stream. Otherwise, video signals would fall out of synchronization with audio signals, and/or the video playback would "slow down" compared to the "real" speed otherwise intended. Dropped frames are sometimes observable as a momentary freeze of the picture in the video playback, followed by sudden discontinuities or jerkiness in the picture. Consequently, a significant need exists for reducing the processing requirements associated with decompression methods. While such increased efficiencies are needed, it is important that the quality of the resulting video image not be overly degraded.