The Moving Picture Experts Group (MPEG) MPEG-2 Standard is a compression/decompression standard for video applications. The standard describes an encoded and compressed datastream that provides substantial bandwidth reduction. The compression is a subjectively lossy compression followed by a lossless compression. The encoded, compressed digital video data is subsequently decompressed and decoded in an MPEG-2 Standard compliant decoder.
The MPEG-2 Standard is described in, e.g., C. A. Gonzales and E. Viscito, "Motion Video Adaptive Quantization In The Transform Domain," IEEE Trans. Circuits Syst. Video Technol., Vol. 1, No. 4, Dec. 1991, pp. 374-378; E. Viscito and C. A. Gonzales, "Encoding of Motion Video Sequences for the MPEG Environment Using Arithmetic Coding," SPIE, Vol. 1360, 1990, pp. 1572-1576; D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, Vol. 34, No. 4, April 1991, pp. 46-58; S. Purcell and D. Galbi, "C-Cube MPEG Video Processor," SPIE, Vol. 1659, 1992, pp. 24-29; and D. J. LeGall, "MPEG Video Compression Algorithm," Signal Process. Image Commun., Vol. 4, No. 2, 1992, pp. 129-140, among others.
The MPEG-2 Standard specifies a datastream format and a decoder for a very high compression technique that achieves overall image datastream compression not achievable with either intraframe coding alone or interframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of block based frequency domain intraframe encoding and interpolative/predictive interframe encoding in the MPEG-2 Standard results in a balance between intraframe encoding alone and interframe encoding alone.
The MPEG-2 Standard exploits temporal redundancy for motion compensated interpolative and predictive encoding. That is, the assumption is made that "locally" the current picture can be modeled as a translation of the picture at a previous and/or future time. "Locally" means that the amplitude and direction of the displacement are not the same everywhere in the picture.
The MPEG-2 Standard specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. It has block based motion compensation for the reduction of temporal redundancy, and block based Discrete Cosine Transform based compression for the reduction of spatial redundancy. Under the MPEG-2 Standard, motion compensation is achieved by predictive coding, interpolative coding, and Variable Length Coded motion vectors. The information relative to motion is based on a 16×16 array of pixels and is transmitted with the spatial information. Motion information is compressed with Variable Length Codes, such as Huffman codes.
The MPEG-2 Standard provides temporal redundancy reduction through the use of various predictive and interpolative tools. This is illustrated in FIG. 1. FIG. 1 shows three types of frames or pictures: "I" Intrapictures, "P" Predicted Pictures, and "B" Bidirectional Interpolated Pictures. Note that for interframe encoding, such as IP and IPB encoding, picture transmission order is not the same as picture display order.
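Transmission order differs from display order because a "B" picture cannot be reconstructed until the future "I" or "P" picture it depends on has been received. The following is a minimal sketch of that reordering, assuming each "B" picture depends only on the nearest "I" or "P" picture before and after it; the function name display_to_coded_order and the "IBBPBBP" pattern are hypothetical illustrations, not taken from the standard.

```python
def display_to_coded_order(types):
    """Reorder pictures from display order to transmission (coded) order.

    types: string such as "IBBPBBP", one letter per picture in display order.
    Returns a list of (display_index, type) tuples in transmission order.
    Assumption: every B picture needs the next I or P picture (its future
    reference) to be transmitted before it.
    """
    coded = []
    pending_b = []  # B pictures waiting for their future reference
    for idx, t in enumerate(types):
        if t == "B":
            pending_b.append((idx, t))
        elif t in "IP":
            # Send the anchor first, then the B pictures that precede it in display order.
            coded.append((idx, t))
            coded.extend(pending_b)
            pending_b = []
        else:
            raise ValueError(f"unknown picture type {t!r}")
    if pending_b:
        raise ValueError("sequence must end with an I or P picture")
    return coded


print(" ".join(f"{t}{i}" for i, t in display_to_coded_order("IBBPBBP")))
# prints: I0 P3 B1 B2 P6 B4 B5
```

The decoder performs the inverse step, holding each decoded anchor picture until the "B" pictures that precede it in display order have been shown.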
Motion compensation addresses the redundancy between pictures. The formation of "P" Predicted Pictures from "I" Intrapictures and of "B" Bidirectional Coded Pictures from a pair of past and future pictures is a key feature of the MPEG-2 Standard technique.
The "I" Intrapictures provide moderate compression, and are access points for random access, e.g., in the case of video tapes or CD-ROMs. As a matter of convenience, one "I" Intrapicture is provided approximately every half second, that is, every ten to twenty pictures. The "I" Intrapicture only gets information from itself. It does not receive information from a "P" Predicted Picture or a "B" Bidirectional Interpolated Picture. Scene cuts preferably occur at "I" Intrapictures.
"P" Predicted Pictures are coded with respect to a previous picture. "P" Predicted Pictures are used as the reference for future pictures, both "P" and "B" pictures.
"B" Bidirectional Coded pictures have the highest degree of compression. They require both a past picture and a future picture for reconstruction. "B" bidirectional pictures are never used as a reference.
The motion compensation unit under the MPEG-2 Standard is the Macroblock unit. MPEG-2 Standard Macroblocks are 16×16 pixels. Motion information consists of one vector for forward predicted macroblocks, one vector for backward predicted macroblocks, and two vectors for bidirectionally predicted macroblocks. In this way a macroblock of pixels is predicted by a translation of a macroblock of pixels from a past or future picture. The motion information associated with each macroblock is coded differentially with respect to the motion information of the previously coded macroblock.
The difference between the source pixels and the predicted pixels is included in the corresponding bit stream. The decoder adds a correction term to the block of predicted pixels to produce the reconstructed block.
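The translation-plus-correction step described above can be sketched as follows. This is a minimal illustration, assuming full-pel motion vectors and 8-bit samples; real MPEG-2 prediction also supports half-pel interpolation and field-based modes, which are omitted here. The function names are hypothetical.

```python
import numpy as np


def predict_macroblock(ref, x, y, mv, size=16):
    """Copy a size x size block from the reference picture, displaced by mv.

    (x, y): top-left corner of the macroblock in the current picture.
    mv: (dx, dy) full-pel motion vector (simplification: no half-pel).
    """
    dx, dy = mv
    top, left = y + dy, x + dx
    if top < 0 or left < 0 or top + size > ref.shape[0] or left + size > ref.shape[1]:
        raise ValueError("motion vector points outside the reference picture")
    return ref[top:top + size, left:left + size]


def reconstruct_macroblock(ref, x, y, mv, residual):
    """Decoder side: prediction plus the transmitted correction term."""
    pred = predict_macroblock(ref, x, y, mv, residual.shape[0]).astype(np.int16)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)


# Encoder side: the residual is the source block minus its prediction.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
source = ref[18:34, 21:37]  # macroblock at (x=16, y=16) moved by (+5, +2)
residual = source.astype(np.int16) - predict_macroblock(ref, 16, 16, (5, 2)).astype(np.int16)
assert np.array_equal(reconstruct_macroblock(ref, 16, 16, (5, 2), residual), source)
```

In this contrived case the match is exact, so the residual is all zeros; in real content the residual is small but non-zero and is itself DCT coded.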
As described above and illustrated in FIG. 1, each macroblock of a "P" Predicted Picture can be coded with respect to the closest previous "I" Intrapicture, or with respect to the closest previous "P" Predicted Picture.
Further, as described above and illustrated in FIG. 1, each macroblock of a "B" Bidirectional Picture can be coded by forward prediction from the closest past "I" or "P" Picture, by backward prediction from the closest future "I" or "P" Picture, or bidirectionally, using both the closest past "I" or "P" Picture and the closest future "I" or "P" Picture. Full bidirectional prediction is the least noisy prediction, because averaging the two predictions tends to cancel noise that is independent between them.
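Bidirectional prediction combines a forward and a backward prediction of the same macroblock. The sketch below assumes the combination is a rounded integer average of the two 8-bit predictions, (a + b + 1) >> 1, and uses synthetic noise to show why the averaged prediction is less noisy; the noise level and block contents are hypothetical.

```python
import numpy as np


def bidirectional_prediction(fwd, bwd):
    """Average forward and backward predictions with rounding.

    Samples are non-negative, so (a + b + 1) >> 1 rounds halves upward.
    uint16 intermediates avoid 8-bit overflow.
    """
    a = fwd.astype(np.uint16)
    b = bwd.astype(np.uint16)
    return ((a + b + 1) >> 1).astype(np.uint8)


# Synthetic demonstration: two predictions of a flat block, each with independent noise.
rng = np.random.default_rng(1)
clean = np.full((16, 16), 128.0)
fwd = np.clip(np.round(clean + rng.normal(0, 8, clean.shape)), 0, 255).astype(np.uint8)
bwd = np.clip(np.round(clean + rng.normal(0, 8, clean.shape)), 0, 255).astype(np.uint8)
bi = bidirectional_prediction(fwd, bwd)


def mse(pred):
    """Mean squared error of a prediction against the clean block."""
    return float(np.mean((pred.astype(float) - clean) ** 2))


# Averaging two independent errors roughly halves their variance.
print(mse(fwd), mse(bwd), mse(bi))
```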
Motion information is sent with each macroblock to show what part of the reference picture is to be used as a predictor.
As noted above, motion vectors are coded differentially with respect to the motion vectors of the previous adjacent macroblock. Variable Length Coding is used to code the differential motion vector so that only a small number of bits is needed to code the motion vector in the common case, where the motion vector for a macroblock is nearly equal to the motion vector for the preceding macroblock.
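The differential step can be sketched as below. This is a minimal illustration, assuming the predictor starts at (0, 0) (as at the start of a slice) and ignoring the actual MPEG-2 Variable Length Code tables and range wrapping; it only shows how smoothly varying motion turns into small differences that a VLC can code cheaply. The vector values are hypothetical.

```python
def encode_motion_vectors(mvs):
    """Replace each (dx, dy) vector by its difference from the previous vector."""
    prev = (0, 0)  # assumption: predictor reset to zero at the start of a slice
    diffs = []
    for mv in mvs:
        diffs.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return diffs


def decode_motion_vectors(diffs):
    """Accumulate differences to recover the original vectors."""
    prev = (0, 0)
    mvs = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        mvs.append(prev)
    return mvs


# Neighbouring macroblocks usually move alike, so the differences are small.
mvs = [(6, -2), (7, -2), (7, -1), (7, -1)]
print(encode_motion_vectors(mvs))  # [(6, -2), (1, 0), (0, 1), (0, 0)]
```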
Spatial redundancy is the redundancy within a picture. Because of the macroblock based nature of the motion compensation process described above, it was desirable for the MPEG-2 Standard to use a block based method of reducing spatial redundancy. The method of choice is Discrete Cosine Transform coding of the picture. Discrete Cosine Transform coding is combined with weighted scalar quantization and run length coding to achieve still further levels of compression.
The Discrete Cosine Transformation is an orthogonal transformation. Orthogonal transformations, because they have a frequency domain interpretation, are filter bank oriented. The Discrete Cosine Transformation is also localized. That is, the encoding process samples an 8×8 spatial window, which is sufficient to compute 64 transform coefficients or sub-bands.
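The orthogonal, localized transform described above can be written as a matrix product on one 8×8 window. The sketch below builds the orthonormal DCT-II basis with NumPy (for N = 8 this matches the usual 8×8 DCT scaling) and checks orthogonality and invertibility; it is a direct O(N^3) illustration, not one of the fast algorithms used in practice.

```python
import numpy as np

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: C[k, i] = c(k) * cos((2i + 1) * k * pi / (2n))."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return c * np.cos((2 * i + 1) * k * np.pi / (2 * n))


C = dct_matrix()


def dct2(block):
    """Forward 2-D DCT of an 8x8 block: 64 coefficients (sub-bands)."""
    return C @ block @ C.T


def idct2(coeffs):
    """Inverse 2-D DCT; the transpose is the inverse because C is orthogonal."""
    return C.T @ coeffs @ C


block = np.arange(64, dtype=float).reshape(8, 8)
coeffs = dct2(block)  # coeffs[0, 0] is DC; higher indices are higher frequencies
assert np.allclose(idct2(coeffs), block)
```

For a flat block every coefficient except the DC term is zero, which is the frequency-domain behavior that later stages exploit.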
Another advantage of the Discrete Cosine Transformation is that fast encoding and decoding algorithms are available. Additionally, the sub-band decomposition of the Discrete Cosine Transformation is sufficiently well behaved to allow effective use of psychovisual criteria.
After the Discrete Cosine Transformation and quantization, many of the higher frequency coefficients are zero. The coefficients are reordered in a zig-zag scan, as shown in FIG. 2, and converted into run-amplitude (run-level) pairs. Each pair indicates the number of zero coefficients preceding a non-zero coefficient and the amplitude of that non-zero coefficient. The pairs are then coded with a Variable Length Code.
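The zig-zag scan and run-level conversion can be sketched as follows. This assumes the conventional zig-zag order (MPEG-2 also defines an alternate scan, not shown), treats all 64 coefficients uniformly (intra DC coefficients are coded separately in the real standard), and uses a hypothetical "EOB" marker for the trailing zeros; the Variable Length Code itself is not modeled.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Convert a block of quantized coefficients into (run, level) pairs.

    run = number of zeros preceding a non-zero level in zig-zag order.
    Trailing zeros are replaced by a single end-of-block marker.
    """
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, -3, 2
print(run_level_pairs(block))  # [(0, 50), (0, -3), (1, 2), 'EOB']
```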
Discrete Cosine Transformation encoding is carried out in three stages, as shown in FIG. 2. The first stage is the computation of the Discrete Cosine Transformation coefficients. The second stage is the quantization of the coefficients. The third stage is the conversion of the quantized transform coefficients into run-amplitude pairs after reorganization of the data into zig-zag scanning order.
Quantization can be viewed as a shift right by several bits. Quantization enables very high degrees of compression, and therefore a low output bit rate, while retaining high picture quality.
Quantization can be adaptive, with "I" Intrapictures having fine quantization to avoid "blockiness" in the reconstructed image. This is important because "I" Intrapictures contain energy at all frequencies. By way of contrast, "P" and "B" pictures, whose prediction errors contain predominantly high frequency energy, can be coded with coarser quantization.
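The "shift right" view and the fine-versus-coarse trade-off can be illustrated with a simplified uniform quantizer. This is a sketch only: it assumes a single step size and truncation toward zero, whereas the actual MPEG-2 quantizer uses per-coefficient weighting matrices and a quantizer scale. The coefficient values and step sizes are hypothetical.

```python
def quantize(coeff, step):
    """Uniform quantizer: divide magnitude by step, truncate toward zero.

    For a power-of-two step and a non-negative coefficient this equals
    a right shift by log2(step) bits.
    """
    q = abs(coeff) // step
    return q if coeff >= 0 else -q


def dequantize(level, step):
    """Reconstruct an approximate coefficient from its quantized level."""
    return level * step


coeffs = [312, -45, 20, -9, 6, 3, -2, 1]  # hypothetical DCT coefficients
for step in (4, 16):  # fine step (intra-like) vs coarse step (P/B-like)
    levels = [quantize(c, step) for c in coeffs]
    err = max(abs(c - dequantize(l, step)) for c, l in zip(coeffs, levels))
    nonzero = sum(1 for l in levels if l)
    print(step, levels, "nonzero:", nonzero, "max error:", err)

assert quantize(312, 16) == 312 >> 4  # the shift-right view
```

With these values the fine step keeps 5 non-zero levels with a maximum reconstruction error of 3, while the coarse step keeps only 3 non-zero levels (fewer bits to code) at a maximum error of 13.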
One challenge facing decoder designers is accommodating a variety of display output formats with a single decoder system, while complying fully with luminance/chrominance relationships and the MPEG-2 Standard.
The displayed output of the decoder chip must conform to CCIR Recommendation 601. This recommendation specifies the number of luminance and chrominance samples in a single active line, and also how the chrominance samples are subsampled relative to the luminance samples. The 4:2:2 format it defines is supported in most cases in the industry. This format defines 720 active luminance samples (Y) and 360 color difference sample pairs (Cb, Cr) per line, where each line of luminance samples has a corresponding line of chrominance samples. CCIR Recommendation 656 goes on to define the number of active lines for NTSC and PAL environments as 480 and 576, respectively.
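The 4:2:2 sample counts above can be checked with a short calculation. The sketch below assumes one 8-bit word per sample and also shows the Cb, Y, Cr, Y word order used when a 4:2:2 line is multiplexed onto a CCIR 656 style interface, with each Cb/Cr pair co-sited with the even-numbered luminance sample; the function names are hypothetical.

```python
def samples_422(active_pixels=720, active_lines=480):
    """Return (samples per active line, samples per active frame) for 4:2:2."""
    cb = cr = active_pixels // 2  # chroma halved horizontally, full vertical resolution
    per_line = active_pixels + cb + cr
    return per_line, per_line * active_lines


def multiplex_line(y, cb, cr):
    """Interleave one 4:2:2 line as Cb, Y, Cr, Y, ...

    Cb[i] and Cr[i] are co-sited with Y[2 * i].
    """
    if not len(y) == 2 * len(cb) == 2 * len(cr):
        raise ValueError("4:2:2 needs twice as many Y samples as Cb or Cr samples")
    out = []
    for i in range(len(cb)):
        out += [cb[i], y[2 * i], cr[i], y[2 * i + 1]]
    return out


print(samples_422())          # (1440, 691200)  NTSC, 480 active lines
print(samples_422(720, 576))  # (1440, 829440)  PAL, 576 active lines
```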
The output of the decoder chip is decoded digital video information which is stored in the external memory area in frame buffers. In order to properly decode and display the digital video information, four frame buffers have heretofore been required:
The Decompression Frame (currently being decoded),
The Past Reference Frame,
The Future Reference Frame, and
The Display Frame (currently being displayed).
Each buffer must be large enough to hold a complete picture's worth of digital video data (720×480 pixels for MPEG-2 Main Profile/Main Level). In order to keep the cost of video decoder products down, an important goal has been to reduce the amount of external memory required to support the decode function. The MPEG-2 decoder function can operate with 1-Megabyte, 2-Megabyte, and 4-Megabyte DRAM configurations. However, it is desirable to reduce the required amount of DRAM.
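A short calculation shows why the number of frame buffers matters for these DRAM sizes. The sketch assumes 8-bit samples, decoded pictures stored in the 4:2:0 format used by MPEG-2 Main Profile (full-size Y plus quarter-size Cb and Cr), and "Megabyte" read as 2^20 bytes; it ignores the bitstream buffer and any other memory the decoder needs.

```python
MIB = 1024 * 1024  # assumption: "Megabyte" taken as 2**20 bytes


def frame_bytes_420(width, height):
    """Bytes for one 8-bit 4:2:0 frame: full-size Y plus quarter-size Cb and Cr."""
    return width * height + 2 * (width // 2) * (height // 2)


for system, lines in (("NTSC", 480), ("PAL", 576)):
    frame = frame_bytes_420(720, lines)
    for buffers in (4, 3):
        total = buffers * frame
        print(f"{system}: {buffers} buffers = {total:,} bytes ({total / MIB:.2f} MiB)")
```

Under these assumptions four NTSC buffers take 2,073,600 bytes, leaving only about 23 KB of a 2 MiB part for everything else, and four PAL buffers (2,488,320 bytes) do not fit at all; with three buffers PAL drops to 1,866,240 bytes.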
It is desirable to switch seamlessly between sequences of different resolutions, without introducing unwanted noise or delay. Noise can be introduced by reallocating memory before the last picture of the prior sequence is played. Delay would occur if reallocation occurs after the last picture of the prior sequence is played.