MPEG Background
MPEG-2 and MPEG-4 are international video compression standards defining a video syntax that provides an efficient way to represent image sequences in the form of more compact coded data. The language of the coded bits is the “syntax.” For example, a few tokens can represent an entire block of samples (e.g., 64 samples for MPEG-2). Both MPEG standards also describe a decoding (reconstruction) process where the coded bits are mapped from the compact representation into an approximation of the original format of the image sequence. For example, a flag in the coded bitstream signals whether the following bits are to be preceded with a prediction algorithm prior to being decoded with a discrete cosine transform (DCT) algorithm. The algorithms comprising the decoding process are regulated by the semantics defined by these MPEG standards. This syntax can be applied to exploit common video characteristics such as spatial redundancy, temporal redundancy, uniform motion, spatial masking, etc. In effect, these MPEG standards define a programming language as well as a data format. An MPEG decoder must be able to parse and decode an incoming data stream, but so long as the data stream complies with the corresponding MPEG syntax, a wide variety of possible data structures and compression techniques can be used (although technically this deviates from the standard since the semantics are not conformant). It is also possible to carry the needed semantics within an alternative syntax.
These MPEG standards use a variety of compression methods, including intraframe and interframe methods. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant. These MPEG standards start compression by creating a reference frame called an “intra” frame or “I frame”. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into a data bitstream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bitstream every 12 to 15 frames (although it is also useful in some circumstances to use much wider spacing between I frames). Thereafter, since only a small portion of each frame that falls between the reference I frames differs from the bracketing I frames, only the image differences are captured, compressed, and stored. Two types of frames are used for such differences: Predicted or P frames, and Bi-directional Interpolated or B frames.
P frames generally are encoded with reference to a past frame (either an I frame or a previous P frame), and, in general, are used as a reference for subsequent P frames. P frames receive a fairly high amount of compression. B frames provide the highest amount of compression but require both a past and a future reference frame in order to be encoded. B frames are never used as reference frames in standard compression technologies.
Macroblocks are regions of image pixels. For MPEG-2, a macroblock is a 16×16 pixel grouping of four 8×8 DCT blocks, together with one motion vector for P frames, and one or two motion vectors for B frames. Motion vectors describe the relative movement of a block of pixels between frames.
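As a minimal sketch of how a motion vector relates a block in the current frame to a displaced block in a reference frame, the following computes a sum of absolute differences over one block. The function name `block_sad`, the frame layout (lists of rows of luminance samples), and the parameter names are illustrative assumptions, not taken from any MPEG reference encoder.

```python
def block_sad(cur, ref, x, y, dx, dy, size=16):
    """Sum of absolute differences between the size x size block at (x, y)
    in the current frame and the block displaced by the motion vector
    (dx, dy) in the reference frame. Frames are lists of rows of samples.
    The caller must keep both blocks inside the frame bounds."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
    return total
```

A motion vector that exactly describes the movement of the block yields a SAD of zero; poorer matches yield larger values.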
Macroblocks within P frames may be individually encoded using either intra-frame or inter-frame (predicted) coding modes. Macroblocks within B frames may be individually encoded using any of several coding modes: stand-alone intra-frame coding, forward predicted coding, backward predicted coding, or both forward and backward (i.e., bi-directionally interpolated) predicted coding. In addition to these coding modes, MPEG-4 also supports a second interpolative motion vector prediction mode: direct mode prediction using the motion vector from the subsequent P frame, plus a delta value.
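The direct mode described above can be sketched as follows. The text states only that the vector from the subsequent P frame is used, plus a delta. The scaling by temporal distance shown here (`trb` is the distance from the past reference to the B frame, `trd` the distance between the two references) is an assumption added for illustration, as are the function and parameter names.

```python
def direct_mode_vectors(p_mv, delta, trb, trd):
    """Derive forward and backward vectors for a B-frame macroblock from the
    co-located motion vector p_mv of the subsequent P frame.
    Assumed scaling: forward = p_mv * trb / trd + delta."""
    if trd <= 0 or not 0 < trb < trd:
        raise ValueError("expected 0 < trb < trd")
    fwd = tuple(trb * v / trd + d for v, d in zip(p_mv, delta))
    if delta == (0, 0):
        # With no delta, the backward vector is the scaled remainder.
        bwd = tuple((trb - trd) * v / trd for v in p_mv)
    else:
        # With a delta, the backward vector follows from the forward one.
        bwd = tuple(f - v for f, v in zip(fwd, p_mv))
    return fwd, bwd
```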
After a coding mode decision is made, and the input video is coded accordingly, an MPEG data bitstream comprises a sequence of I, P, and B frames. A sequence may consist of almost any pattern of I, P, and B frames (there are a few minor semantic restrictions on their placement). However, it is common in practice to have a fixed pattern (e.g., IBBPBBPBBPBBPBB).
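A fixed pattern of this kind can be generated from two spacings, as in the sketch below. The names `gop_pattern`, `n` (frames between I frames), and `m` (spacing between reference frames) are illustrative assumptions. The second function looks up which frames a given frame refers to, following the I, P, and B rules described earlier.

```python
def gop_pattern(n=15, m=3):
    """Display-order frame types for one group: an I frame, then a P frame
    every m frames, with B frames in between."""
    return "".join(
        "I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)
    )

def reference_frames(pattern, i):
    """Return (past, future) indices of the reference frames for frame i.
    Assumes the pattern starts with an I frame. A trailing B frame's future
    reference lies in the next group, reported here as None."""
    kind = pattern[i]
    if kind == "I":
        return (None, None)
    past = max(j for j in range(i) if pattern[j] in "IP")
    if kind == "P":
        return (past, None)
    later = [j for j in range(i + 1, len(pattern)) if pattern[j] in "IP"]
    return (past, later[0] if later else None)
```

With the default arguments, `gop_pattern()` returns `IBBPBBPBBPBBPBB`, the pattern given above.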
It is known to apply various biases to favor selection of one coding mode over another. These biases are implemented statically (hardwired) in the reference MPEG-2 and MPEG-4 software encoders, generally as a positive or negative value added to a match measure, such as the sum of absolute differences (“SAD”). For example, there are biases to favor direct mode coding of B frames. There are also biases to favor or disfavor intra macroblock coding mode decisions (in P frames of MPEG-4, and in P and/or B frames of MPEG-2).
For P frames, the mode decisions for MPEG-2 are between intra (stand alone) coding and forward-predicted coding with a motion vector. In MPEG-4, an additional choice is provided to allow the 16×16 macroblock to be split into four 8×8 blocks, corresponding to the four 8×8 DCT blocks, with each having a motion vector. Again, hardwired biases typically are applied to these mode decisions.
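The MPEG-4 choice between one 16×16 vector and four 8×8 vectors can be sketched as a biased comparison of match measures. The function name and the default `split_bias` of 128 are hypothetical values chosen for illustration; they are not the constants of any reference encoder.

```python
def choose_p_partition(sad16, sad8_list, split_bias=128):
    """Pick '8x8' only if the four 8x8 matches together beat the single
    16x16 match by more than a static bias (four vectors cost more bits)."""
    if len(sad8_list) != 4:
        raise ValueError("expected four 8x8 SAD values")
    return "8x8" if sum(sad8_list) + split_bias < sad16 else "16x16"
```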
There are other biases that are also of relevance. For example, typically there is a bias toward favoring a zero motion vector. Since a zero motion vector will usually code more compactly than a non-zero vector, if the zero vector's match is only slightly inferior to the best non-zero vector match (as measured, for example, by the sum of absolute differences, or SAD, in MPEG), then the bias causes the zero vector to be selected.
In the MPEG-2 and MPEG-4 software reference encoders, all of these mode decision biases are statically set relative to 8-bit significance in coding. In general, a bias is set at approximately one least significant bit (usually multiplied by the macroblock area) within the 8 bits available for coding (i.e., 1/256th of the maximum white value). Note also that all mode decisions in MPEG-2 and MPEG-4 are based upon luminance (Y channel) values only.
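The zero-vector bias can be sketched as a fixed amount subtracted from the zero vector's match measure before the minimum is taken. The function name, the dictionary input, and the default bias of 256 are assumptions made for this example.

```python
def choose_motion_vector(candidates, zero_bias=256):
    """candidates maps (dx, dy) to its SAD. The zero vector's SAD is reduced
    by a static bias, so it wins when only slightly worse than the best
    non-zero match."""
    def biased_cost(item):
        vector, sad = item
        return sad - zero_bias if vector == (0, 0) else sad
    return min(candidates.items(), key=biased_cost)[0]
```

For instance, a zero vector with a SAD of 1200 is preferred over a non-zero vector with a SAD of 1000 under this bias, but not once its SAD reaches 1500.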
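The arithmetic behind such a bias is small enough to show directly. This is a sketch only; the function names and defaults are illustrative assumptions.

```python
def static_bias(mb_width=16, mb_height=16, lsb_count=1):
    """Bias of lsb_count least significant bits per pixel, multiplied by
    the macroblock area, in the units of a summed match measure."""
    return lsb_count * mb_width * mb_height

def lsb_fraction(bit_depth=8):
    """One least significant bit as a fraction of the coded range."""
    return 1 / (1 << bit_depth)
```

For a 16×16 macroblock, one least significant bit per pixel gives a bias of 256, and at 8 bits one least significant bit is 1/256 of the range.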
In MPEG encoding, the intra versus predicted macroblock decision is made based upon the self-relative energy of intra coding versus the minimum SAD from predicted (inter) coding. This decision attempts to minimize coded bits by estimating the coefficient energy of the intra macroblock coding versus the difference macroblock coding. In the MPEG-2 and MPEG-4 reference encoding software, this estimate is made without applying the DCT and without applying the actual quantization parameter (QP) value. Rather, a simpler self-energy measure is computed for both the actual macroblock pixels and the difference pixels, and whichever is smaller after adding the static bias value is selected. Again, the bias toward intra coding is set statically (hard-wired) in the MPEG-2 and MPEG-4 reference encoders, and is based upon static assumptions of precision (i.e., 8-bit) and coding overheads.
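A decision of this kind can be sketched as follows. The intra self-energy is taken here as the sum of absolute deviations from the block mean, which is an assumption for illustration. The sign and the default size of `intra_bias` are likewise hypothetical, and a negative value would favor intra coding instead.

```python
def intra_energy(block):
    """Self-energy of a block: sum of absolute deviations from its mean."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)

def choose_macroblock_mode(cur_block, pred_block, intra_bias=256):
    """Compare the biased intra self-energy with the SAD of the difference
    pixels and return the mode with the smaller measure. No DCT and no
    quantization parameter are involved."""
    inter = sum(
        abs(c - p)
        for cur_row, pred_row in zip(cur_block, pred_block)
        for c, p in zip(cur_row, pred_row)
    )
    intra = intra_energy(cur_block) + intra_bias
    return "intra" if intra < inter else "inter"
```

A flat block with a poor prediction selects intra coding, while a textured block with an exact prediction selects inter coding.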