The Moving Picture Experts Group (MPEG) MPEG-2 Draft Standard is a compression/decompression standard for interactive video applications. The standard describes an encoding method that results in substantial bandwidth reduction by a subjective lossy compression followed by a lossless compression. The encoded, compressed digital video data is subsequently decompressed and decoded in an MPEG-2 Draft Standard compliant decoder.
The MPEG-2 Draft Standard is described in, e.g., C.A. Gonzales and E. Viscito, "Motion Video Adaptive Quantization In The Transform Domain," IEEE Trans Circuits Syst Video Technol, Volume 1, No. 4, Dec. 1991, pp. 374-378, E. Viscito and C.A. Gonzales, "Encoding of Motion Video Sequences for the MPEG Environment Using Arithmetic Coding," SPIE, Vol 1360, pp. 1572-1576, (1990), D. LeGall, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, Vol. 34, No. 4, (April 1991), pp. 46-58, S. Purcell and D. Galbi, "C Cube MPEG Video Processor," SPIE, v. 1659, (1992) pp 24-29, and D. J. LeGall, "MPEG Video Compression Algorithm," Signal Process Image Commun, v. 4, n. 2, (1992), pp. 129-140, among others.
The MPEG-2 Draft Standard specifies a very high compression technique that achieves compression not achievable with intraframe coding alone, while preserving the random access advantages of pure intraframe coding. The combination of frequency domain intraframe encoding and interpolative/predictive interframe encoding of the MPEG-2 Draft Standard results in a balance between intraframe encoding alone and interframe encoding alone.
The MPEG-2 Draft Standard exploits temporal redundancy for motion compensated interpolative and predictive encoding. That is, the assumption is made that "locally" the current picture can be modelled as a translation of the picture at a previous and/or future time. "Locally" means that the amplitude and direction of the displacement are not the same everywhere in the picture.
The MPEG-2 Draft Standard specifies predictive and interpolative interframe encoding and frequency domain intraframe encoding. It has block based motion compensation for the reduction of temporal redundancy, and Discrete Cosine Transform based compression for the reduction of spatial redundancy. Under the MPEG-2 Draft Standard, motion compensation is achieved by predictive coding, interpolative coding, and Variable Length Coded motion vectors. The information relative to motion is based on a 16x16 array of pixels and is transmitted with the spatial information. It is compressed with Variable Length Codes, such as Huffman codes.
The MPEG-2 Draft Standard provides temporal redundancy reduction through the use of various predictive and interpolative tools. This is illustrated in FIG. 1. FIG. 1 shows three types of frames or pictures, "I" Intrapictures, "P" Predicted Pictures, and "B" Bidirectional Interpolated Pictures.
The "I" Intrapictures provide moderate compression, and are access points for random access, e.g., in the case of video tapes or CD-ROMs. As a matter of convenience, one "I" Intrapicture is provided approximately every half second. The "I" Intrapicture only gets information from itself. It does not receive information from a "P" Predicted Picture or a "B" Bidirectional Interpolated Picture. Scene cuts preferably occur at "I" Intrapictures.
"P" Predicted Pictures are coded with respect to a previous picture. "P" Predicted Pictures are used as the reference for future pictures, both "P" and "B" pictures.
"B" Bidirectional Coded pictures have the highest degree of compression. They require both a past picture and a future picture for reconstruction. "B" bidirectional pictures are never used as a reference.
Motion compensation addresses the redundancy between pictures. The formation of "P" Predicted Pictures from "I" Intrapictures, and of "B" Bidirectional Coded Pictures from a pair of past and future pictures, is a key feature of the MPEG-2 Draft Standard technique.
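The dependencies among the three picture types can be sketched as follows. This is a minimal illustration, assuming pictures are listed in display order and that references are always the nearest "I" or "P" pictures; the function name reference_pictures and the string encoding of picture types are hypothetical and are not part of the Draft Standard.

```python
def reference_pictures(types):
    """Map each picture to its (past_ref, future_ref) indices.

    `types` is a string such as "IBBPBBP" in display order (an assumed
    encoding). Only "I" and "P" pictures can serve as references.
    """
    refs = []
    for i, kind in enumerate(types):
        # Nearest earlier and later I or P picture, if any.
        past = next((j for j in range(i - 1, -1, -1) if types[j] in "IP"), None)
        future = next((j for j in range(i + 1, len(types)) if types[j] in "IP"), None)
        if kind == "I":
            refs.append((None, None))    # coded from itself only
        elif kind == "P":
            refs.append((past, None))    # forward prediction only
        else:
            refs.append((past, future))  # "B": past and future references
    return refs


refs = reference_pictures("IBBPBBP")
```

For the sequence "IBBPBBP", both "B" pictures at positions 1 and 2 refer to the "I" picture at position 0 and the "P" picture at position 3, and no entry ever points at a "B" picture.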
The motion compensation unit under the MPEG-2 Draft Standard is the Macroblock unit. The MPEG-2 Draft Standard Macroblocks are 16x16 pixels. Motion information consists of one vector for forward predicted macroblocks, one vector for backward predicted macroblocks, and two vectors for bidirectionally predicted macroblocks. The motion information associated with each macroblock is coded differentially with respect to the motion information present in the reference macroblock. In this way a macroblock of pixels is predicted by a translation of a macroblock of pixels from a past or future picture.
The difference between the source pixels and the predicted pixels is included in the corresponding bit stream. The decoder adds a correction term to the block of predicted pixels to produce the reconstructed block.
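The prediction and correction steps can be sketched as follows. This is a simplified illustration, assuming whole-pixel motion vectors, pictures stored as lists of rows, and displaced blocks that stay inside the reference picture; the helper names are hypothetical. In an actual codec the difference signal is transformed and quantized before transmission, so the reconstruction is only approximate, whereas this sketch passes the difference through unchanged.

```python
def predict_block(reference, top, left, motion_vector, size=16):
    """Copy a size x size block from `reference`, displaced by (dy, dx)."""
    dy, dx = motion_vector
    return [
        [reference[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]


def residual(source_block, predicted_block):
    """Encoder side: difference between source and predicted pixels."""
    return [
        [s - p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(source_block, predicted_block)
    ]


def reconstruct(predicted_block, correction):
    """Decoder side: add the correction term to the predicted pixels."""
    return [
        [p + e for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(predicted_block, correction)
    ]


# Small demonstration with a 2x2 block instead of a 16x16 macroblock.
reference = [[r * 10 + c for c in range(8)] for r in range(8)]
source = [[30, 33], [40, 45]]
predicted = predict_block(reference, top=2, left=2, motion_vector=(1, -1), size=2)
correction = residual(source, predicted)
assert reconstruct(predicted, correction) == source
```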
As described above and illustrated in FIG. 1, each macroblock of a "P" Predicted Picture can be coded with respect to the closest previous "I" Intrapicture, or with respect to the closest previous "P" Predicted Picture.
Further, as described above and illustrated in FIG. 1, each macroblock of a "B" Bidirectional Picture can be coded by forward prediction from the closest past "I" or "P" Picture, by backward prediction from the closest future "I" or "P" Picture, or bidirectionally, using both the closest past "I" or "P" Picture and the closest future "I" or "P" Picture. Full bidirectional prediction is the least noisy prediction.
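One way to see why the bidirectional mode is the least noisy is to form the predictor as the average of the forward and backward predictions, so that uncorrelated noise in the two references partially cancels. The sketch below assumes that averaging rule with round-half-up integer arithmetic on non-negative pixel values; the function name and the exact rounding convention are assumptions made for illustration.

```python
def bidirectional_predict(forward_block, backward_block):
    """Average the forward and backward predictions, rounding halves up."""
    return [
        [(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(forward_block, backward_block)
    ]


forward = [[10, 20], [30, 41]]
backward = [[13, 20], [28, 40]]
averaged = bidirectional_predict(forward, backward)
```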
Motion information is sent with each macroblock to show what part of the reference picture is to be used as a predictor.
As noted above, motion vectors are coded differentially with respect to motion vectors of the previous adjacent block. Variable Length Coding is used to code the differential motion vector so that only a small number of bits are needed to code the motion vector in the common case, where the motion vector for a macroblock is nearly equal to the motion vector for a preceding macroblock.
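The differential step can be sketched as follows. This is a minimal illustration, assuming the predictor starts at (0, 0) and is simply the vector of the preceding macroblock; the rules for when the predictor is reset, and the Variable Length Code tables themselves, are not modelled. The function names are hypothetical.

```python
def encode_motion_vectors(vectors):
    """Replace each (dy, dx) vector by its difference from the preceding one."""
    previous = (0, 0)  # assumed initial predictor
    differences = []
    for dy, dx in vectors:
        differences.append((dy - previous[0], dx - previous[1]))
        previous = (dy, dx)
    return differences


def decode_motion_vectors(differences):
    """Invert encode_motion_vectors by accumulating the differences."""
    previous = (0, 0)
    vectors = []
    for ddy, ddx in differences:
        previous = (previous[0] + ddy, previous[1] + ddx)
        vectors.append(previous)
    return vectors


vectors = [(3, 4), (3, 5), (4, 5), (4, 5)]
differences = encode_motion_vectors(vectors)
assert decode_motion_vectors(differences) == vectors
```

When neighbouring macroblocks move together, most differences are zero or close to zero, which is the case a Variable Length Code assigns its shortest code words to.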
Spatial redundancy is the redundancy within a picture. Because of the macroblock based nature of the motion compensation process, described above, it was desirable for the MPEG-2 Draft Standard to use a block based method of reducing spatial redundancy. The method of choice is the Discrete Cosine Transformation, and Discrete Cosine Transform coding of the picture. Discrete Cosine Transform coding is combined with weighted scalar quantization and run length coding to achieve still further levels of compression.
The Discrete Cosine Transformation is an orthogonal transformation. Orthogonal transformations, because they have a frequency domain interpretation, are filter bank oriented. The Discrete Cosine Transformation is also localized. That is, the encoding process samples on an 8x8 spatial window which is sufficient to compute 64 transform coefficients or sub-bands.
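The computation of the 64 coefficients from one 8x8 window can be sketched as follows. This is the direct textbook form of the orthonormal two-dimensional DCT-II, written for clarity rather than speed; it is an illustration and not the arithmetic mandated by the Draft Standard, and the function name dct_2d is hypothetical.

```python
import math

N = 8  # spatial window size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) evaluation)."""
    def alpha(k):
        # Scale factors that make the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# A flat block has all of its energy in the single DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

Because the transform is orthogonal, the sum of the squared coefficients equals the sum of the squared pixel values, so the transform itself loses no information; the loss is introduced later by quantization.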
Another advantage of the Discrete Cosine Transformation is that fast encoding and decoding algorithms are available. Additionally, the sub-band decomposition of the Discrete Cosine Transformation is sufficiently well behaved to allow effective use of psychovisual criteria.
After transformation, many of the frequency coefficients are zero, especially the coefficients for high spatial frequencies. These coefficients are organized into a zig-zag, as shown in FIG. 2, and converted into run-amplitude (run-level) pairs. Each pair indicates the number of zero coefficients preceding a non-zero coefficient and the amplitude of that non-zero coefficient. Each pair is coded with a Variable Length Code.
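The zig-zag reorganization and the run-level conversion can be sketched as follows. The scan generated here is the conventional zig-zag that walks the anti-diagonals of the block in alternating directions; it is assumed, not confirmed, to match the order drawn in FIG. 2. The function names are hypothetical, and the Variable Length Code stage is omitted.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones reverse.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(scanned):
    """Convert a scanned coefficient list into (zero_run, level) pairs."""
    pairs = []
    run = 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    # Trailing zeros produce no pair; a real bit stream marks the end of block.
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[0][2] = 12, -3, 5
scanned = [block[r][c] for r, c in zigzag_order()]
pairs = run_level_pairs(scanned)  # [(0, 12), (1, -3), (2, 5)]
```

In this example 64 coefficients reduce to three pairs, which is where the benefit of the many zero-valued high-frequency coefficients is realized.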
Discrete Cosine Transformation encoding is carried out in three stages, as shown in FIG. 2. The first stage is the computation of the Discrete Cosine Transformation coefficients. The second stage is the quantization of the coefficients. The third stage is the conversion of the quantized transform coefficients into {run-amplitude} pairs after reorganization of the data into zig-zag scanning order.
Quantization enables very high degrees of compression, and a high output bit rate, and retains high picture quality.
Quantization can be adaptive, with "I" Intrapictures having fine quantization to avoid "blockiness" in the reconstructed image. This is important because "I" Intrapictures contain energy at all frequencies. By way of contrast, "P" and "B" pictures contain predominantly high frequency energy and can be coded at a coarser quantization.
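The effect of fine versus coarse quantization can be sketched with a weighted scalar quantizer. This is a simplified illustration: each coefficient is divided by a per-frequency weight multiplied by a picture-dependent scale and then rounded. The weight values, the scale values, and the function names are assumptions chosen for the example and do not reproduce the exact arithmetic of the Draft Standard.

```python
def quantize(coeffs, weights, scale):
    """Divide each coefficient by weight * scale and round to an integer level."""
    return [
        [round(c / (w * scale)) for c, w in zip(c_row, w_row)]
        for c_row, w_row in zip(coeffs, weights)
    ]


def dequantize(levels, weights, scale):
    """Decoder side: rebuild approximate coefficients from the levels."""
    return [
        [level * w * scale for level, w in zip(l_row, w_row)]
        for l_row, w_row in zip(levels, weights)
    ]


# 2x2 corner of a coefficient block; larger weights for higher frequencies.
coeffs = [[800.0, 30.0], [-22.0, 6.0]]
weights = [[8, 16], [16, 32]]

fine = quantize(coeffs, weights, scale=1)    # e.g. for an "I" picture
coarse = quantize(coeffs, weights, scale=4)  # e.g. for a "P" or "B" picture
```

With the fine scale three of the four levels remain non-zero; with the coarse scale only the first survives, so fewer run-level pairs need to be coded at the cost of a larger reconstruction error.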
The MPEG-2 Draft Standard specifies a layered structure of syntax and bit stream. The bit stream is separated into logically distinct entities to prevent ambiguities and facilitate decoding. The six layers are shown in Table 1, below.
TABLE 1
______________________________________________________________
MPEG-2 Draft Standard Layers
Layer                      Purpose
______________________________________________________________
Sequence Layer             Random Access Unit and Context
Group of Pictures Layer    Random Access Unit and Video Coding
Picture Layer              Primary Coding Unit
Slice Layer                Resynchronization Unit
Macroblock Layer           Motion Compensation Unit
Block Layer                DCT Unit
______________________________________________________________
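The relation between the two lowest layers, the 16x16 Macroblock used for motion compensation and the 8x8 Block used for the Discrete Cosine Transformation, can be sketched as follows. This illustration covers a single 16x16 array of samples only and ignores how chrominance is carried; the function name split_macroblock is hypothetical.

```python
def split_macroblock(macroblock, block_size=8):
    """Split a square macroblock into block_size x block_size DCT blocks.

    Blocks are returned in raster order: top-left, top-right,
    bottom-left, bottom-right for a 16x16 input.
    """
    n = len(macroblock)
    return [
        [row[left:left + block_size] for row in macroblock[top:top + block_size]]
        for top in range(0, n, block_size)
        for left in range(0, n, block_size)
    ]


macroblock = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks = split_macroblock(macroblock)
assert len(blocks) == 4
```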