MPEG Background
MPEG-2 and MPEG-4 are international video compression standards defining a video syntax that provides an efficient way to represent image sequences in the form of more compact coded data. The language of the coded bits is the “syntax.” For example, a few tokens can represent an entire block of samples (e.g., 64 samples for MPEG-2). Both MPEG standards also describe a decoding (reconstruction) process where the coded bits are mapped from the compact representation into an approximation of the original format of the image sequence. For example, a flag in the coded bitstream signals whether the following bits are to be preceded with a prediction algorithm prior to being decoded with a discrete cosine transform (DCT) algorithm. The algorithms comprising the decoding process are regulated by the semantics defined by these MPEG standards. This syntax can be applied to exploit common video characteristics such as spatial redundancy, temporal redundancy, uniform motion, spatial masking, etc. In effect, these MPEG standards define a programming language as well as a data format. An MPEG decoder must be able to parse and decode an incoming data stream, but so long as the data stream complies with the corresponding MPEG syntax, a wide variety of possible data structures and compression techniques can be used. It is also possible to carry the needed semantics within an alternative syntax.
These MPEG standards use a variety of compression methods, including intraframe and interframe methods. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant. These MPEG standards start compression by creating a reference frame called an “Intra” frame or “I frame”. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into a data bitstream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bitstream every 12 to 15 frames. Thereafter, since only a small portion of each frame that falls between the reference I frames differs from the bracketing I frames, only the image differences are captured, compressed, and stored. Two types of frames are used for such differences: Predicted or P frames, and Bi-directional Interpolated or B frames.
P frames generally are encoded with reference to a past frame (either an I frame or a previous P frame), and, in general, are used as a reference for subsequent P frames. P frames receive a fairly high amount of compression. B frames provide the highest amount of compression but require both a past and a future reference frame in order to be encoded. B frames are never used as reference frames.
Macroblocks are regions of image pixels. For MPEG-2, a macroblock is a 16×16 pixel grouping of four 8×8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames. Macroblocks within P frames may be individually encoded using either intra-frame or inter-frame (predicted) coding. Macroblocks within B frames may be individually encoded using intra-frame coding, forward predicted coding, backward predicted coding, or both forward and backward (i.e., bi-directionally interpolated) predicted coding.
After coding, an MPEG data bitstream comprises a sequence of I, P, and B frames. A sequence may consist of almost any pattern of I, P, and B frames (there are a few minor semantic restrictions on their placement). However, it is common in industrial practice to have a fixed pattern (e.g., IBBPBBPBBPBBPBB).
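Because a B frame requires both a past and a future reference frame, the order in which frames are coded generally differs from the order in which they are displayed. The following is a minimal illustrative sketch of that reordering, not taken from either standard; the function name coded_order and the single-letter frame type strings are assumptions made for the example.

```python
def coded_order(display_types):
    """Reorder display-order frame types into a plausible coded order.

    A B frame needs its future reference (the next I or P frame) to be
    available first, so each reference frame is emitted ahead of the
    B frames that precede it in display order.
    """
    out = []        # (display_index, frame_type) pairs in coded order
    pending_b = []  # B frames still waiting for their future reference
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # I or P frame: emit it, then the B frames that depend on it.
            out.append((index, frame_type))
            out.extend(pending_b)
            pending_b = []
    # Trailing B frames have no future reference inside this list; in a
    # real stream they would wait for the next I or P frame.
    out.extend(pending_b)
    return out


order = coded_order("IBBPBBP")
print([f"{t}{i}" for i, t in order])
# prints ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In this sketch the digit after each frame type is the display position, so P3 is coded before B1 and B2 even though it is displayed after them.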
It has long been known that a hierarchical motion search reduces the computation needed to determine motion vectors. For example, the MPEG algorithms attempt to find a match between “macroblock” regions. MPEG-type and other motion compensated DCT (discrete cosine transform) coders attempt to match each macroblock region in a current frame with a position in a previous frame (P frame) or in previous and subsequent frames (B frame). However, it is not always necessary to find a good match, since when no good match exists MPEG can code a new macroblock as a fresh stand-alone (“intra”) macroblock without using previous or subsequent frames. In such motion compensated DCT systems, one macroblock motion vector is needed for each macroblock region for MPEG-2. In MPEG-4, a set of 4 motion vectors, corresponding to one vector for each 8×8 region (i.e., 4 vectors per macroblock), is also an optional coding mode.
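The matching step can be illustrated with a small block-matching sketch. This is an exhaustive search over a small window using the sum of absolute differences (SAD) as the match cost, not the hierarchical search mentioned above and not the search used by any reference encoder. The 4×4 block size, the search radius, and the function names are assumptions chosen to keep the example short.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at
    (cx, cy) and the n x n block of ref at (rx, ry)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def best_vector(cur, ref, cx, cy, n=4, radius=2):
    """Try every offset within +/- radius pixels and return
    ((dx, dy), cost) for the lowest-cost candidate inside the frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[1]:
                    best = ((dx, dy), cost)
    return best


# Toy 8x8 reference frame with a smooth gradient (illustrative data only).
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
# Current frame: the same content moved one pixel right and one pixel down.
cur = [[ref[max(y - 1, 0)][max(x - 1, 0)] for x in range(8)] for y in range(8)]

vector, cost = best_vector(cur, ref, cx=2, cy=2)
print(vector, cost)  # prints (-1, -1) 0
```

An encoder built along these lines could compare the returned cost against a threshold and fall back to intra coding of the macroblock when no candidate matches well; the choice of threshold is outside the scope of this sketch.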
MPEG Precision
The reference MPEG-2 and MPEG-4 video codec implementations utilize the following encoding methodology:
a) When converting from RGB to YUV color space, only the number of bits that will be coded are kept (for example, MPEG-2 is limited to 8 bits in coding, and thus the YUV values are also limited to 8 bits).
b) When encoding and decoding, only the number of bits that have been coded are preserved, with careful rounding being applied to reduce artifacts.
c) When converting back to RGB, the precision is limited due to the limitations of the number of bits which were preserved (such as 8 bits maximum for MPEG-2).
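The precision loss described in steps (a) through (c) can be illustrated by converting 8-bit RGB values to 8-bit luma and color-difference values and back. The sketch below assumes full-range BT.601-style coefficients, which are a common choice but are not specified by the text above; the helper names are likewise assumptions for illustration.

```python
def clamp8(value):
    """Round to the nearest integer and limit to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv8(r, g, b):
    """Convert 8-bit RGB to 8-bit Y, Cb, Cr (assumed BT.601-style weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (b - y) / 1.772
    cr = 128 + (r - y) / 1.402
    # Only 8 bits per component are kept, as in step (a).
    return clamp8(y), clamp8(cb), clamp8(cr)


def yuv8_to_rgb(y, cb, cr):
    """Convert 8-bit Y, Cb, Cr back to 8-bit RGB, as in step (c)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def round_trip(rgb):
    return yuv8_to_rgb(*rgb_to_yuv8(*rgb))


# Count how many colors on a coarse grid fail to survive the round trip.
grid = range(0, 256, 17)
samples = [(r, g, b) for r in grid for g in grid for b in grid]
changed = [rgb for rgb in samples if round_trip(rgb) != rgb]
print(len(changed), "of", len(samples), "sampled colors changed")
print(round_trip((255, 0, 0)))  # pure red does not come back exactly
```

Each channel comes back within one code value of the original in this sketch, but the original value is often not recovered exactly, which is the kind of loss that retaining additional bits in the intermediate representation would avoid.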
FIG. 1 is a block diagram of a prior art MPEG-2 reference video encoding method. RGB input frames 102 coded in 8 bits/pixel per color are applied to an RGB-to-YUV converter 104, which is purposely limited to 8 bits of precision per color on its output. The result is applied to a DCT function 106, then to a quantizer function 108, then to an inverse DCT function 110, with the final output 112 being stored at the same precision as the input data.
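The DCT, quantizer, and inverse DCT stages can be sketched on a single row of eight samples. This is a simplified one-dimensional illustration with a uniform quantizer, whereas MPEG-2 operates on 8×8 blocks with its own quantization rules; the step size of 16, the sample values, and the function names are assumptions.

```python
import math

N = 8  # samples per row, matching the 8x8 DCT block width


def scale(k):
    """Orthonormal DCT scale factor for coefficient index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct8(x):
    """Orthonormal 8-point forward DCT (DCT-II)."""
    return [
        scale(k) * sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)
        )
        for k in range(N)
    ]


def idct8(c):
    """Inverse of dct8 (DCT-III with matching scaling)."""
    return [
        sum(
            scale(k) * c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N)
        )
        for n in range(N)
    ]


def code_row(samples, step):
    """DCT, uniform quantization, dequantization, inverse DCT, 8-bit output."""
    levels = [round(c / step) for c in dct8(samples)]  # quantizer
    dequantized = [level * step for level in levels]   # dequantizer
    recon = [max(0, min(255, round(v))) for v in idct8(dequantized)]
    return levels, recon


row = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative 8-bit samples
levels, recon = code_row(row, step=16)
errors = [r - s for r, s in zip(recon, row)]
print("quantized levels:", levels)
print("reconstruction error per sample:", errors)
```

The reconstruction is rounded back to 8-bit integers at the end, so any fractional precision produced by the inverse DCT is discarded at that point, on top of the error introduced by the quantizer itself.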
MPEG-4's reference video coder is implemented with the same method, although the intermediate precision can be extended up to 12 bits (the VLC, or variable length coding, tables do not, however, support use of the full range).
Techniques for randomly dithering the limited precision values (8 bits per color component maximum in MPEG-2) are utilized to reduce the apparent visibility of step changes. However, noise and artifacts in coding are created due to this dither, and are also created due to the use of limited intermediate processing precision.
In addition to limited intermediate processing precision, MPEG-2 and MPEG-4 allow the inverse DCT (IDCT) algorithm used during encoding (often implemented in high precision floating point representation) to differ slightly from the IDCT algorithm used during decoding. This is known as “IDCT mismatch”. IDCT mismatch causes an unpredictable gradual drift in the signal away from the intended decoding values. This is conventionally reduced by use of random dither of the low order bit in the IDCT highest frequency (7th harmonic for the typical 8×8 DCT block size used in MPEG-2 and MPEG-4). Such dithering adds additional noise and artifacts to the signal.
FIG. 2 is a block diagram of a prior art MPEG-2 reference video decoding method. An encoded input bitstream 202 is applied to a dequantizer function 204 having a limited precision that matches the precision of the input bitstream (typically 8 bits for MPEG-2). The result is applied to an IDCT function 206 (which may not match the IDCT function 110 of the encoder), which outputs signed 8-bit values 208. This output either comprises an I frame 210, or is combined with data from a previous frame 212 or a subsequent frame 214 (both at the same precision) to generate a new frame 216. Thus, the MPEG-2 decoding process limits intermediate processing precision to a maximum of 8 bits. Similarly, the intermediate processing precision for MPEG-4 video decoding is also limited to the number of bits used in encoding (a maximum of 12 bits, but often set to be 8 bits).
Limited precision in MPEG-2 and MPEG-4 also limits dynamic range (i.e., the number of levels of lighting that can be represented for an image) and contrast range (i.e., the number of distinct levels assigned to image regions of similar contrast). Accordingly, the encoding and decoding methods used in MPEG-2 and MPEG-4 reduce the potential quality of output, decompressed images compared to the original input images. The present invention addresses these limitations.