Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
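As a rough illustration of the arithmetic above, the raw bit rate is the product of the pixels per frame, the bits per pixel, and the frame rate. The following sketch assumes a hypothetical 176×144 frame size, which is not taken from the text; the function name raw_bit_rate is likewise illustrative.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the bit rate, in bits per second, of uncompressed video."""
    pixels_per_frame = width * height
    return pixels_per_frame * bits_per_pixel * frames_per_second


# Assumed example: 176x144 pixels (25,344 pixels per frame),
# 24 bits per pixel, 15 frames per second.
rate = raw_bit_rate(176, 144, 24, 15)
print(rate)  # 9123840
```

Under these assumed numbers the raw bit rate is about 9.1 million bits/second, consistent with the "5 million bits/second or more" figure.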
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, in which case the quality of the video does not suffer but the reduction in bit rate is limited by the complexity of the video. Alternatively, compression can be lossy, in which case the quality of the video suffers but the reduction in bit rate is more dramatic. Decompression reverses compression.
In general, video compression techniques include intraframe compression and interframe compression. Intraframe compression techniques compress individual frames, which are typically called I-frames or key frames. Interframe compression techniques compress frames with reference to preceding and/or following frames; the frames so compressed are typically called predicted frames, P-frames, or B-frames.
Microsoft Corporation's Windows Media Video, Version 8 [“WMV8”] includes a video encoder and a video decoder. The WMV8 encoder uses intraframe and interframe compression, and the WMV8 decoder uses intraframe and interframe decompression.
A. Intraframe Compression in WMV8
FIG. 1 shows an example of block-based intraframe compression (100) of a block (105) of pixels in a key frame in the WMV8 encoder. For example, the WMV8 encoder splits a key video frame into 8×8 blocks of pixels and applies an 8×8 Discrete Cosine Transform [“DCT”] (110) to individual blocks, converting the 8×8 block of pixels (105) into an 8×8 block of DCT coefficients (115). The encoder quantizes (120) the DCT coefficients, resulting in an 8×8 block of quantized DCT coefficients (125) which the encoder then prepares for entropy encoding.
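For illustration only, the transform and quantization steps described above might be sketched as follows. The orthonormal DCT-II and the uniform quantizer with a single step size are assumptions made for this sketch, and the names dct_2d and quantize are hypothetical; none of this is presented as the actual WMV8 implementation.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Forward N x N DCT-II (orthonormal scaling assumed) of a list-of-lists block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization (an assumption): divide by the step size and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat block has energy only in the DC coefficient.
flat = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=8)
print(levels[0][0])  # 128
```

In this flat-block example every AC coefficient quantizes to zero, which suggests why smooth image regions are inexpensive to code after the transform.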
The encoder encodes the DC coefficient (126) as a differential from the DC coefficient (136) of a previously encoded neighbor (e.g., neighbor block (135)) of the block being encoded. The encoder entropy encodes the differential (140). FIG. 1 shows the left column (127) of AC coefficients encoded as a differential (147) from the left column (137) of the neighboring (to the left) block (135). The remaining AC coefficients are from the block (125) of quantized DCT coefficients.
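The differential coding described above might be sketched as follows, assuming the left neighbor is the predictor and that the coefficient at position [0][0] is the DC coefficient while the rest of column 0 holds the left column of AC coefficients. The function names and the sample values are hypothetical.

```python
def predict_from_left(current, left):
    """Encoder side: replace column 0 of `current` (DC and left-column AC
    coefficients) with its difference from column 0 of the left neighbor."""
    predicted = [row[:] for row in current]
    for r in range(len(current)):
        predicted[r][0] = current[r][0] - left[r][0]
    return predicted


def undo_prediction(predicted, left):
    """Decoder side: add the neighbor's column 0 back to recover the block."""
    restored = [row[:] for row in predicted]
    for r in range(len(predicted)):
        restored[r][0] = predicted[r][0] + left[r][0]
    return restored


# Assumed sample data: two 8x8 blocks of quantized coefficients.
left = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
left[0][0], left[1][0], left[2][0] = 50, 4, -2
current[0][0], current[1][0], current[2][0], current[0][1] = 53, 4, -1, 7

predicted = predict_from_left(current, left)
print(predicted[0][0], predicted[1][0], predicted[2][0], predicted[0][1])  # 3 0 1 7
```

When neighboring blocks are similar, the differentials are small or zero, so they tend to need fewer bits than the original values. Coefficients outside column 0 pass through unchanged.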
The encoder scans (150) the 8×8 block (145) of predicted, quantized AC DCT coefficients into a one-dimensional array (155) and then entropy encodes the scanned AC coefficients using a variation of run length coding (160). The encoder selects an entropy code from one or more run/level/last tables (165) and outputs the entropy code (170).
B. Interframe Compression in WMV8
Interframe compression in the WMV8 encoder uses block-based motion compensated prediction coding followed by transform coding of the residual error. FIGS. 2 and 3 illustrate the block-based interframe compression for a predicted frame in the WMV8 encoder. In particular, FIG. 2 illustrates motion estimation for a predicted frame (210) and FIG. 3 illustrates compression of a prediction residual for a motion-estimated block of a predicted frame.
For example, the WMV8 encoder splits a predicted frame into 8×8 blocks of pixels. Groups of four 8×8 blocks form macroblocks. For each macroblock, a motion estimation process is performed. The motion estimation approximates the motion of the macroblock of pixels relative to a reference frame, for example, a previously coded, preceding frame. In FIG. 2, the WMV8 encoder computes a motion vector for a macroblock (215) in the predicted frame (210). To compute the motion vector, the encoder searches in a search area (235) of a reference frame (230). Within the search area (235), the encoder compares the macroblock (215) from the predicted frame (210) to various candidate macroblocks in order to find a candidate macroblock that is a good match. After the encoder finds a good matching macroblock, the encoder outputs information specifying the motion vector (entropy coded) for the matching macroblock so the decoder can find the matching macroblock during decoding. When decoding the predicted frame (210) with motion compensation, a decoder uses the motion vector to compute a prediction macroblock for the macroblock (215) using information from the reference frame (230). The prediction for the macroblock (215) is rarely perfect, so the encoder usually encodes 8×8 blocks of pixel differences (also called the error or residual blocks) between the prediction macroblock and the macroblock (215) itself.
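A minimal sketch of block-matching motion estimation of the kind described above follows. It assumes an exhaustive (full) search and the sum of absolute differences (SAD) as the matching criterion; the text does not say which search strategy or criterion the WMV8 encoder uses, and the function names and the small test frames are hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(current, reference, x, y, size=16, search_range=4):
    """Full search over the search area; returns ((dx, dy), cost) for the
    candidate block in `reference` that best matches the block at (x, y)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, x, y, reference, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Assumed test data: the current frame is the reference frame displaced by
# 2 pixels horizontally and 1 pixel vertically (edges clamped).
reference = [[10 * r + c for c in range(12)] for r in range(12)]
current = [[reference[min(r + 1, 11)][min(c + 2, 11)] for c in range(12)]
           for r in range(12)]
vector, cost = estimate_motion(current, reference, x=4, y=4, size=4, search_range=2)
print(vector, cost)  # (2, 1) 0
```

Here the motion vector is expressed as the displacement from the block's position in the current frame to the matching block in the reference frame, which is a convention chosen for this sketch. A cost of zero indicates a perfect match; in practice the match is rarely perfect, which is why the residual is also coded.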
FIG. 3 illustrates an example of computation and encoding of an error block (335) in the WMV8 encoder. The error block (335) is the difference between the predicted block (315) and the original current block (325). The encoder applies a DCT (340) to the error block (335), resulting in an 8×8 block (345) of coefficients. The encoder then quantizes (350) the DCT coefficients, resulting in an 8×8 block of quantized DCT coefficients (355). The quantization step size is adjustable. Quantization results in loss of precision, but not complete loss of the information for the coefficients.
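The error block and the effect of the quantization step size might be illustrated as follows. The sign convention (current block minus predicted block), the uniform quantizer, and the sample values are assumptions for this sketch; a 2×2 block is used only to keep the example short.

```python
def error_block(current, predicted):
    """Per-pixel difference; the sign convention (current minus predicted)
    is an assumption of this sketch."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]


def quantize(value, step):
    """Uniform quantization (assumed): divide by the step size and round."""
    return int(round(value / step))


def dequantize(level, step):
    """Approximate reconstruction of a coefficient from its quantized level."""
    return level * step


current = [[52, 55], [61, 59]]
predicted = [[50, 56], [60, 60]]
print(error_block(current, predicted))  # [[2, -1], [1, -1]]

# Hypothetical transform coefficients and step size.
coefficients = [-37.4, 13.0, 3.9, -0.6]
step = 8
levels = [quantize(c, step) for c in coefficients]
reconstructed = [dequantize(q, step) for q in levels]
print(levels)         # [-5, 2, 0, 0]
print(reconstructed)  # [-40, 16, 0, 0]
```

With this quantizer each reconstructed value lies within half a step of the original, so precision is lost but the larger coefficients survive approximately. Small coefficients collapse to zero.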
The encoder then prepares the 8×8 block (355) of quantized DCT coefficients for entropy encoding. The encoder scans (360) the 8×8 block (355) into a one-dimensional array (365) with 64 elements, such that coefficients are generally ordered from lowest frequency to highest frequency, which typically creates long runs of zero values.
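One scan that orders coefficients roughly from lowest to highest frequency is the classic zigzag pattern, sketched below. The text does not specify the scan pattern used by the WMV8 encoder, so the zigzag order, the function names, and the sample block are assumptions.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in classic zigzag order.

    Positions are grouped by anti-diagonal (row + col); the direction of
    travel alternates from one diagonal to the next.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def scan(block):
    """Flatten a square two-dimensional block into a one-dimensional list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Assumed sample block: only three low-frequency coefficients are nonzero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 12, 3, -2

scanned = scan(block)
print(scanned[:4])    # [12, 3, -2, 0]
print(len(scanned))   # 64
```

Because quantization tends to zero out high-frequency coefficients, a scan of this kind places the nonzero values at the front of the array and leaves a long run of zeros at the end.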
The encoder entropy encodes the scanned coefficients using a variation of run length coding (370). The encoder selects an entropy code from one or more run/level/last tables (375) and outputs the entropy code.
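The run/level/last representation can be sketched as follows. Each nonzero coefficient becomes a symbol consisting of the number of zeros preceding it (run), its value (level), and a flag marking the final nonzero coefficient (last). This sketch only forms the symbols; the actual entropy code tables are not reproduced, and the function name and sample array are hypothetical.

```python
def run_level_last(scanned):
    """Convert a scanned coefficient list into (run, level, last) symbols."""
    nonzero_positions = [i for i, v in enumerate(scanned) if v != 0]
    symbols = []
    previous = -1
    for n, position in enumerate(nonzero_positions):
        run = position - previous - 1          # zeros since the last nonzero value
        last = 1 if n == len(nonzero_positions) - 1 else 0
        symbols.append((run, scanned[position], last))
        previous = position
    return symbols


# Assumed sample data (shortened to 8 elements for readability).
print(run_level_last([5, 0, 0, -2, 1, 0, 0, 0]))
# [(0, 5, 0), (2, -2, 0), (0, 1, 1)]
```

The trailing zeros need no symbols at all, because the last flag tells the decoder that nothing nonzero follows. An encoder would then look up an entropy code for each symbol in a run/level/last table.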
FIG. 4 shows an example of a corresponding decoding process (400) for an inter-coded block. Due to the quantization of the DCT coefficients, the reconstructed block (475) is not identical to the corresponding original block. The compression is lossy.
In summary of FIG. 4, a decoder decodes (410, 420) entropy-coded information representing a prediction residual using variable length decoding (410) with one or more run/level/last tables (415) and run length decoding (420). The decoder inverse scans (430) a one-dimensional array (425) storing the entropy-decoded information into a two-dimensional block (435). The decoder inverse quantizes and inverse discrete cosine transforms (together, 440) the data, resulting in a reconstructed error block (445). In a separate motion compensation path, the decoder computes a predicted block (465) using motion vector information (455) for displacement from a reference frame. The decoder combines (470) the predicted block (465) with the reconstructed error block (445) to form the reconstructed block (475).
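The decoding path summarized above might be sketched as follows, starting from symbols that have already been variable length decoded. The zigzag inverse scan, the uniform inverse quantizer, the orthonormal inverse DCT, and the clamping to the range 0 to 255 are assumptions for this sketch, and all function names are hypothetical.

```python
import math

N = 8  # block size


def zigzag_order(n=N):
    """(row, col) positions in classic zigzag order (assumed scan pattern)."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_length_decode(symbols, length=N * N):
    """Expand (run, level, last) symbols into a one-dimensional array."""
    array = [0] * length
    position = 0
    for run, level, last in symbols:
        position += run
        array[position] = level
        position += 1
        if last:
            break
    return array


def inverse_scan(array):
    """Place the one-dimensional array back into a two-dimensional block."""
    block = [[0] * N for _ in range(N)]
    for value, (r, c) in zip(array, zigzag_order()):
        block[r][c] = value
    return block


def inverse_quantize(block, step):
    return [[level * step for level in row] for row in block]


def idct_2d(coeffs):
    """Inverse of an orthonormal N x N DCT-II."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            out[x][y] = sum(
                scale(u) * scale(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N)
            )
    return out


def reconstruct(symbols, step, predicted):
    """Combine the reconstructed error block with the predicted block."""
    error = idct_2d(inverse_quantize(inverse_scan(run_length_decode(symbols)), step))
    return [[min(255, max(0, int(round(predicted[r][c] + error[r][c]))))
             for c in range(N)] for r in range(N)]


# Assumed input: a single DC level of 2 with step size 8, and a flat prediction.
predicted = [[100] * N for _ in range(N)]
block = reconstruct([(0, 2, 1)], step=8, predicted=predicted)
print(block[0][0])  # 102
```

In this example the dequantized DC coefficient of 16 contributes an error of 2 to every pixel, so each pixel of the flat prediction of 100 is reconstructed as 102. The motion compensation path is represented here only by the already computed predicted block.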
The amount of change between the original and reconstructed frames is termed the distortion, and the number of bits required to code the frame is termed the rate for the frame. The amount of distortion is roughly inversely proportional to the rate. In other words, coding a frame with fewer bits (greater compression) will result in greater distortion, and vice versa.
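The tradeoff between rate and distortion can be illustrated with a small experiment. In this sketch the count of nonzero quantized coefficients stands in as a rough proxy for rate and the mean squared error stands in for distortion; both measures, the uniform quantizer, and the sample coefficients are assumptions.

```python
def rate_and_distortion(coefficients, step):
    """Return (nonzero count, mean squared error) for a uniform quantizer.

    The nonzero count is only a crude stand-in for the number of bits needed.
    """
    levels = [int(round(c / step)) for c in coefficients]
    reconstructed = [q * step for q in levels]
    nonzero = sum(1 for q in levels if q != 0)
    mse = sum((c - r) ** 2 for c, r in zip(coefficients, reconstructed)) / len(coefficients)
    return nonzero, mse


# Hypothetical transform coefficients for one block.
coefficients = [310.0, -42.5, 18.2, 7.7, -3.1, 1.4]
results = [rate_and_distortion(coefficients, step) for step in (2, 8, 32)]
for step, (nonzero, mse) in zip((2, 8, 32), results):
    print(step, nonzero, round(mse, 2))
```

As the step size grows, fewer coefficients remain nonzero (lower rate) while the mean squared error rises (higher distortion).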
C. Limitations of Conventional Motion-based Video Compression
Video sequences with effects such as fading, morphing, and blending require a relatively large number of bits to encode because conventional motion-based video compression methods are generally not effective on such frames. For example, consider a video sequence in which an object in a frame has moved slightly in one direction from one frame to the next. In a typical block-matching motion estimation technique, it may be a simple matter in a video sequence without fading to find a good match in the previous frame for a block in the current frame and encode the resulting motion vector. However, if, for example, a “fade-to-black” is occurring in the video sequence, every luminance value in the current frame may have changed relative to the previous frame, preventing the video encoder from finding a good match for the block. Fading also may occur in a sequence due to natural illumination changes. Blending and morphing, which are other transitioning effects, may also reduce the effectiveness of straightforward motion estimation/compensation.
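The effect of fading on block matching can be illustrated with a small example. The sketch assumes that a fade scales every luminance value by a constant factor and that the matching criterion is the sum of absolute differences (SAD); both are simplifying assumptions, and the sample values are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


# Assumed 4x4 luminance block from the previous frame.
previous = [[100, 120, 140, 160] for _ in range(4)]

# Same content with no motion and no fade: a perfect match.
unchanged = [row[:] for row in previous]

# Same content with no motion, halfway through a fade-to-black
# (modeled here as scaling every luminance value by 0.5).
faded = [[int(p * 0.5) for p in row] for row in previous]

print(sad(previous, unchanged))  # 0
print(sad(previous, faded))      # 1040
```

Even though the block has not moved, the faded block differs from its co-located block in the previous frame at every pixel, so the residual that remains after motion compensation is large.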
D. Standards for Video Compression and Decompression
Aside from WMV8, several international standards relate to video compression and decompression. These standards include the Moving Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.261, H.262, and H.263 standards from the International Telecommunication Union [“ITU”]. Like WMV8, these standards use a combination of intraframe and interframe compression, although the standards typically differ from WMV8 in the details of the compression techniques used. For example, Annex P of the H.263 standard describes a Reference Picture Resampling mode for use in prediction that can be used to adaptively alter the resolution of pictures during encoding.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.