Advances in audio and video compression and decompression techniques, together with very large scale integration technology, have enabled the creation of new capabilities and markets. These include the storage of digital audio and video in computers and on small optical discs, as well as the transmission of digital audio and video signals from direct broadcast satellites.
Such advances were made possible, in part, by international standards which provide compatibility between different approaches to compression and decompression. One such standard is known as "JPEG," for Joint Photographic Experts Group. A later-developed standard is known as "MPEG 1." This was the first set of standards agreed to by the Moving Picture Experts Group. Yet another standard is known as "ITU-T H.261," which is a video compression standard particularly useful for video teleconferencing. Although each standard is designed for a specific application, all of the standards have much in common.
MPEG 1 was designed for storing and distributing audio and motion video, with emphasis on video quality. Its features include random access, fast forward and reverse playback. MPEG 1 serves as the basis for video CDs and for many video games. The original channel bandwidth and image resolution for MPEG 1 were established based upon the recording media available at the time. The goal of MPEG 1 was the reproduction of recorded digital audio and video using a 12 centimeter diameter optical disc with a bit rate of 1.416 Mbps, 1.15 Mbps of which is allocated to video.
The compressed bit streams generated under the MPEG 1 standard implicitly define the decompression algorithms to be used for such bit streams. The compression algorithms, however, can vary within the specifications of the MPEG 1 standard, thereby allowing the possibility of a proprietary advantage in regard to the generation of compressed bit streams.
A later developed standard known as "MPEG 2" extends the basic concepts of MPEG 1 to cover a wider range of applications. Although the primary application of the MPEG 2 standards is the all digital transmission of broadcast-quality video at bit rates of 4 Mbps to 9 Mbps, it appears that the MPEG 2 standard may also be useful for other applications, such as the storage of full length motion pictures on 12 centimeter diameter optical discs, with resolution at least as good as that presently provided by 12 inch diameter optical discs.
The MPEG 2 standard relies upon three types of coded pictures. I ("intra") pictures are fields or frames coded as a stand-alone still image. Such I pictures allow random access points within a video stream. As such, I pictures should occur about two times per second. I pictures should also be used where scene cuts (such as in a motion picture) occur.
P ("predicted") pictures are fields or frames coded relative to the nearest previous I or P picture, resulting in forward prediction processing. P pictures allow more compression than I pictures, through the use of motion compensation, and also serve as a reference for B pictures and future P pictures.
B ("bidirectional") pictures are fields or frames that use the closest past and future I or P picture as a reference, resulting in bidirectional prediction. B pictures provide the most compression and an increased signal-to-noise ratio by averaging two pictures. The theory behind I, P and B pictures is more thoroughly described in U.S. Pat. Nos. 5,386,234 and 5,481,553 assigned to Sony Corporation, which are incorporated herein by reference in their entirety.
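The averaging of two pictures described above can be sketched as follows. This is only an illustrative sketch: the function name `bidirectional_prediction`, the nested-list block representation and the round-half-up integer average are assumptions made for the example, not details taken from this description.

```python
def bidirectional_prediction(past_block, future_block):
    """Average a past and a future reference block, sample by sample.

    Both blocks are lists of rows of integer pixel values with the same
    shape. The "+ 1" rounds halves upward (an assumption for this sketch).
    """
    return [
        [(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
        for past_row, future_row in zip(past_block, future_block)
    ]


# Hypothetical 2x2 blocks taken from the past and future reference pictures.
past = [[100, 101], [50, 0]]
future = [[104, 102], [51, 255]]
print(bidirectional_prediction(past, future))  # [[102, 102], [51, 128]]
```

Because uncorrelated noise in the two reference blocks tends to partly cancel in the average, the prediction is typically cleaner than either reference alone.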
A group of pictures ("GOP") is a series of one or more coded pictures which assist in random accessing and editing. A GOP value is configurable during the encoding process. The smaller the GOP value, the closer together the I pictures are and the better the response to movement. The level of compression is, however, lower.
In a coded bitstream, a GOP must start with an I picture and may be followed by any number of I, P or B pictures in any order. In display order, a GOP must start with an I or B picture and end with an I or P picture. Thus, the smallest GOP size is a single I picture, with the largest size unlimited.
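The relationship between display order and coded order can be illustrated with a short sketch. It assumes that each B picture is transmitted immediately after the future I or P picture it depends on; the function name `display_to_coded_order` and the list-of-letters representation are hypothetical and chosen only for the example.

```python
def display_to_coded_order(display_types):
    """Reorder one GOP from display order into a plausible coded order.

    display_types is a list such as ["B", "B", "I", "B", "B", "P"].
    Returns (display_index, picture_type) tuples in coded order. B pictures
    are held back until the next I or P picture has been emitted, since
    that picture is needed to decode them.
    """
    coded = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append((index, picture_type))
        elif picture_type in ("I", "P"):
            coded.append((index, picture_type))
            coded.extend(pending_b)
            pending_b = []
        else:
            raise ValueError(f"unknown picture type: {picture_type!r}")
    if pending_b:
        # In display order a GOP must end with an I or P picture.
        raise ValueError("GOP ends with a B picture in display order")
    return coded


print(display_to_coded_order(["B", "B", "I", "B", "B", "P"]))
# [(2, 'I'), (0, 'B'), (1, 'B'), (5, 'P'), (3, 'B'), (4, 'B')]
```

In this example the GOP starts with a B picture in display order but with the I picture in coded order, which matches the two constraints stated above.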
FIG. 1 is a block diagram illustrating a video decoder system 100, including a decoder 101. A coded bitstream 102 is input to a variable-length decoder (VLD) 104 of the decoder. The VLD 104 expands run/amplitude pairs of quantized frequency coefficients that are encoded into the bitstream 102. The frequency coefficients are then converted into the spatial domain using an inverse discrete cosine transform circuit 110. The resulting "error terms" indicate a content difference from a reference macroblock to another macroblock to be decoded (referred to herein as a "current macroblock").
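As a rough illustration of what expanding run/amplitude pairs involves, the sketch below turns a list of (run, amplitude) pairs into a 64-entry coefficient list. It is a simplification under stated assumptions: the variable-length code table lookup, the scan-order mapping, the inverse quantization and the inverse discrete cosine transform are all omitted, and the name `expand_run_amplitude` is hypothetical.

```python
def expand_run_amplitude(pairs, block_size=64):
    """Expand (run, amplitude) pairs into a flat list of coefficients.

    Each pair means: `run` zero coefficients followed by one coefficient
    with value `amplitude`. Coefficients after the last pair are zero.
    """
    coefficients = []
    for run, amplitude in pairs:
        coefficients.extend([0] * run)
        coefficients.append(amplitude)
    if len(coefficients) > block_size:
        raise ValueError("pairs describe more coefficients than the block holds")
    # Pad the remainder of the block with zeros (implicit end of block).
    coefficients.extend([0] * (block_size - len(coefficients)))
    return coefficients


block = expand_run_amplitude([(0, 12), (2, -3), (5, 1)])
print(block[:10])  # [12, 0, 0, -3, 0, 0, 0, 0, 0, 1]
```

Because most quantized frequency coefficients in a typical block are zero, a handful of pairs is usually enough to describe all 64 values.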
Spatial differences from reference macroblocks to current macroblocks are encoded as two-dimensional motion vectors in the coded bitstream 102. Specifically, the two-dimensional motion vectors indicate movement from a reference macroblock to a current macroblock. In particular, a motion vector specifies where to retrieve a macroblock from a previously decoded frame (i.e., designates the "reference macroblock") to predict the pixel values of a current macroblock.
The error terms and the motion vectors are then provided to a motion compensation circuit 112. The motion compensation circuit 112 employs a reference macroblock and the error terms and motion vector for a current macroblock to predict the pixel values for the current macroblock. Once the pixel values for the current macroblock are determined, the current macroblock is stored into a display buffer memory 114. From the display buffer memory 114, the macroblocks are provided to a display circuit 116. The display circuit 116 may perform other display-related operations prior to actually displaying the decoded video. For example, the display circuit 116 may include circuitry for performing 4:2:0 to 4:2:2 conversion, letterbox conversion or other display-related operations.
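The way a reference macroblock, a motion vector and error terms combine can be sketched in a few lines. The sketch is a simplification, not the circuit described here: it assumes whole-pixel motion vectors, a single reference picture, 8-bit samples clamped to 0 through 255, and a tiny 2x2 block instead of a full macroblock. The name `reconstruct_block` and its parameters are hypothetical.

```python
def reconstruct_block(reference, top, left, motion_vector, error_terms):
    """Rebuild a current block from a reference picture.

    reference     : previously decoded picture, as a list of pixel rows
    top, left     : position of the current block within the picture
    motion_vector : (dx, dy) offset from the current block position to
                    the reference block in the reference picture
    error_terms   : square list of rows holding the decoded differences
    """
    dx, dy = motion_vector
    size = len(error_terms)
    block = []
    for row in range(size):
        out_row = []
        for col in range(size):
            predicted = reference[top + dy + row][left + dx + col]
            value = predicted + error_terms[row][col]
            out_row.append(max(0, min(255, value)))  # clamp to 8 bits
        block.append(out_row)
    return block


reference_picture = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
current = reconstruct_block(reference_picture, 2, 2, (-1, -1), [[1, -2], [0, 200]])
print(current)  # [[61, 68], [100, 255]]
```

Here the motion vector (-1, -1) selects the reference block one pixel up and to the left of the current position, and the last sample is clamped because 110 + 200 exceeds the 8-bit range.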
Turning now specifically to the display function, the rate of consuming the macroblocks for display is regular. That is, the display operates synchronously to a display clock. However, as alluded to above, the order in which the pictures are encoded in the video bitstream 102 is not necessarily the order in which the pictures are to be displayed. Furthermore, MPEG 2 provides that the order in which the fields of a B picture are encoded is not necessarily the order in which the fields are to be displayed. In particular, a "top field" of a B picture frame may be provided before a "bottom field" of the same B picture frame, or vice versa. As a result, video decoder systems typically include a display buffer memory 114 that is large enough to hold three complete reconstructed and predicted pictures--an I picture, a P picture and a B picture.
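To give a sense of scale for a three-picture display buffer, the following sketch computes the storage needed under assumed parameters. The 720 by 480 picture size, 4:2:0 chroma sampling and 8 bits per sample are assumptions chosen for illustration and are not stated in this description; the helper name `picture_bytes` is likewise hypothetical.

```python
def picture_bytes(width, height):
    """Bytes for one picture with 4:2:0 sampling at 8 bits per sample.

    Luma is stored at full resolution; each of the two chroma planes is
    stored at half resolution horizontally and vertically.
    """
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


one_picture = picture_bytes(720, 480)
three_pictures = 3 * one_picture
print(one_picture)     # 518400
print(three_pictures)  # 1555200
```

Under these assumptions, holding an I picture, a P picture and a B picture at once takes roughly 1.5 megabytes, so freeing even part of one picture's storage is a meaningful saving.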
However, memory for three complete pictures does not come without cost. Because modern video decoder systems are typically employed in portable apparatuses such as DVD players, providing such memory is expensive in terms of power and space. Furthermore, it is desirable to free up display buffer memory so that other memory-intensive operations may utilize the freed-up memory.