Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels), where each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel as a set of three samples totaling 24 bits. For instance, a pixel may include an 8-bit luminance sample (also called a luma sample, as the terms “luminance” and “luma” are used interchangeably herein) that defines the grayscale component of the pixel and two 8-bit chrominance samples (also called chroma samples, as the terms “chrominance” and “chroma” are used interchangeably herein) that define the color component of the pixel. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence may be 5 million bits per second or more.
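By way of illustration only, the bit rate arithmetic above can be sketched as follows. The frame dimensions (176×144) and the function name are assumptions chosen for the example and are not taken from the description.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel=24):
    """Return the raw (uncompressed) bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed example: a 176x144 frame (25,344 pixels) at 15 frames per second,
# with one 8-bit luma sample and two 8-bit chroma samples per pixel.
rate = raw_bit_rate(176, 144, 15)
print(rate)  # 9123840 bits per second
```

Even at this modest assumed frame size, the raw bit rate exceeds 5 million bits per second.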
Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system. Compression can be lossless, in which the quality of the video does not suffer, but decreases in bit rate are limited by the inherent amount of variability (sometimes called entropy) of the video data. Or, compression can be lossy, in which the quality of the video suffers, but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression—the lossy compression establishes an approximation of information, and the lossless compression is applied to represent the approximation.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression, where a picture is, for example, a progressively scanned video frame, an interlaced video frame (having alternating lines for two video fields), or a single interlaced video field from an interlaced video frame. For progressive frames, intra-picture compression techniques compress individual frames (typically called I-frames or key frames), and inter-picture compression techniques compress frames (typically called predicted frames, P-frames, or B-frames) with reference to a preceding and/or following frame (typically called a reference or anchor frame) or frames (for B-frames).
I. Block Coding/Decoding in Windows Media Video, Version 9
Microsoft Corporation's Windows Media Video, Version 9 [“WMV9”] includes a video encoder and a video decoder. The encoder uses intra and inter compression, and the decoder uses intra and inter decompression. The intra and inter compression are block based. The intra compression uses a block-based frequency transform on blocks of samples. The inter compression uses block-based motion compensated prediction coding followed by transform coding of the residual error.
A. Block-based Intra Compression
FIG. 1 illustrates block-based intra compression in the encoder. In particular, FIG. 1 illustrates compression of an 8×8 block (105) of samples of an intra frame by the encoder. The encoder splits the frame into 8×8 blocks of samples and applies an 8×8 frequency transform (110) to individual blocks such as the block (105). The encoder quantizes (120) the transform coefficients (115), resulting in an 8×8 block of quantized transform coefficients (125).
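The quantization step is where information is lost. The following is a minimal sketch, assuming a uniform quantizer with a single step size; the description does not specify the quantizer, so the rounding rule, the step size, and the function names are assumptions made only for illustration.

```python
def quantize(coeff, step):
    """Map a transform coefficient to a quantized level (round to nearest)."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantize(level, step):
    """Reconstruct an approximate coefficient from a quantized level."""
    return level * step


coeffs = [100, -37, 12, 3, -2]   # assumed transform coefficients
step = 8                         # assumed quantization step size
levels = [quantize(c, step) for c in coeffs]
approx = [dequantize(q, step) for q in levels]
print(levels)  # [13, -5, 2, 0, 0]
print(approx)  # [104, -40, 16, 0, 0]
```

Small coefficients become zero, which is what later makes run/level coding effective, and the reconstructed values only approximate the originals.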
Further encoding varies depending on whether a coefficient is a DC coefficient (the top left coefficient), an AC coefficient in the top row or left column, or another AC coefficient. The encoder typically encodes the DC coefficient (126) as a differential from the DC coefficient (136) of a neighboring 8×8 block, which is a previously encoded and decoded/reconstructed top or left neighbor block. The encoder entropy encodes (140) the differential.
The entropy encoder can encode the left column or top row of AC coefficients as differentials from the AC coefficients of a corresponding left column or top row of a neighboring 8×8 block. FIG. 1 shows the left column (127) of AC coefficients encoded as differentials (147) from the left column (137) of the neighboring (actually situated to the left) block (135).
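The differential coding of the DC coefficient and of the left column of AC coefficients can be sketched as follows. This is a simplified illustration that assumes the left neighbor is used for both the DC coefficient and the AC column; the function names are hypothetical, and a 3×3 block is used only to keep the example short.

```python
def predict_from_left(block, left):
    """Replace column 0 of `block` (DC at row 0, AC below it) with
    differences from column 0 of the left neighbor block."""
    out = [row[:] for row in block]
    for r in range(len(block)):
        out[r][0] = block[r][0] - left[r][0]
    return out


def unpredict_from_left(diff, left):
    """Decoder side: add the neighbor's column 0 back to the differentials."""
    out = [row[:] for row in diff]
    for r in range(len(diff)):
        out[r][0] = diff[r][0] + left[r][0]
    return out


current = [[52, 3, 0], [6, 1, 0], [-2, 0, 0]]   # assumed quantized coefficients
left = [[50, 1, 0], [4, 2, 0], [-2, 1, 0]]      # assumed reconstructed neighbor
diff = predict_from_left(current, left)
print(diff)  # [[2, 3, 0], [2, 1, 0], [0, 0, 0]]
```

When neighboring blocks have similar content, the differentials are smaller in magnitude than the coefficients themselves and therefore cheaper to entropy encode.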
The encoder scans (150) the 8×8 block (145) of predicted, quantized AC coefficients into a one-dimensional array (155) and then entropy encodes the scanned coefficients using a variation of run/level coding (160). The encoder selects variable length codes [“VLCs”] from run/level/last tables (165) and outputs the VLCs.
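The scan and run/level/last conversion can be sketched as follows. The sketch assumes a classic zigzag scan pattern and represents each nonzero coefficient as a (run, level, last) triplet, where run counts the preceding zeros; the actual scan patterns and VLC tables are not reproduced here, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_level_last(coeffs):
    """Convert a one-dimensional coefficient list into (run, level, last) triplets."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    triplets = []
    prev = -1
    for k, i in enumerate(nonzero):
        last = int(k == len(nonzero) - 1)
        triplets.append((i - prev - 1, coeffs[i], last))
        prev = i
    return triplets


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 3, -1   # assumed coefficient values
scanned = [block[r][c] for r, c in zigzag_order(8)]
triplets = run_level_last(scanned[1:])  # AC coefficients only; DC is coded separately
print(triplets)  # [(0, 3, 0), (1, -1, 1)]
```

Each triplet would then be mapped to a VLC from a run/level/last table.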
B. Block-based Intra Decompression
FIG. 2 shows an example of corresponding decoding (200) for an intra-coded block by the decoder. In particular, FIG. 2 illustrates decompression of an 8×8 block of samples of an intra frame by the decoder to produce a reconstructed version (205) of the original 8×8 block (105).
The decoder receives and decodes (270) VLCs with run/level/last tables (265). The decoder run/level decodes (260) AC coefficients and puts the results into a one-dimensional array (255), from which the AC coefficients are inverse zigzag scanned (250) into a two-dimensional block (245).
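The run/level decoding and inverse scan can be sketched as follows, assuming that the VLCs have already been mapped to (run, level, last) triplets and that a classic zigzag pattern is used. The function names are hypothetical, and the DC position is filled with a placeholder of zero because the DC coefficient is decoded separately.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_level_decode(triplets, length):
    """Expand (run, level, last) triplets into a one-dimensional array."""
    coeffs = [0] * length
    pos = 0
    for run, level, last in triplets:
        pos += run            # skip the run of zeros
        coeffs[pos] = level   # place the nonzero level
        pos += 1
        if last:
            break
    return coeffs


def inverse_scan(coeffs, n=8):
    """Place a one-dimensional array back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


ac = run_level_decode([(0, 3, 0), (1, -1, 1)], 63)  # 63 AC coefficients
block = inverse_scan([0] + ac)                      # placeholder 0 for DC
print(block[0][1], block[2][0])  # 3 -1
```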
The AC coefficients of the left column or top row of the block (245) may be differentials, in which case the decoder combines them with corresponding AC coefficients from a neighboring 8×8 block. In FIG. 2, the left column (247) of AC coefficients are differentials, and they are combined with AC coefficients of the left column (237) of a neighboring (actually situated to the left) block (235) to produce a left column (227) of AC coefficients in a block (225) of quantized transform coefficients.
To decode the DC coefficient (226), the decoder decodes (240) a DC differential. The decoder combines the DC differential with a DC coefficient (236) of a neighboring 8×8 block to produce the DC coefficient (226) of the block (225) of quantized transform coefficients.
The decoder inverse quantizes (220) the quantized transform coefficients of the block (225), resulting in a block (215) of transform coefficients. The decoder applies an inverse frequency transform (210) to the block (215) of transform coefficients, producing the reconstructed version (205) of the original 8×8 block (105).
C. Escape Mode Coding and Decoding for Intra-coded Blocks
When the encoder selects and outputs a VLC for a given run/level/last triplet from a run/level/last table (165), the VLC may be an escape code. If so, one or more additional codes follow in the bitstream to provide information about the triplet. There are three alternative escape modes.
In the first escape mode, an additional VLC in the bitstream represents the run/level/last triplet. A level value derived from the additional VLC represents an initial level value. A run value derived from the additional VLC represents a run, but is also used as an index in a table to determine an extra amount to be added to the initial level value.
Similarly, in the second escape mode, an additional VLC in the bitstream represents the run/level/last triplet. A run value derived from the additional VLC represents an initial run value. A level value derived from the additional VLC represents a level, but is also used as an index in a table to determine an extra amount to be added to the initial run value.
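The first and second escape modes can be sketched from the decoder's point of view as follows. The table contents below are hypothetical placeholders, since the actual tables are not given in this description, and sign handling is omitted; only the lookup-and-add structure follows the text.

```python
# Hypothetical tables (assumptions): extra amount indexed by run or by level.
EXTRA_LEVEL_BY_RUN = {0: 12, 1: 6, 2: 4}
EXTRA_RUN_BY_LEVEL = {1: 15, 2: 8, 3: 3}


def decode_escape_mode1(run, initial_level, last):
    """First mode: the run is used as-is and also indexes the table that
    gives the extra amount to add to the initial level."""
    return run, initial_level + EXTRA_LEVEL_BY_RUN[run], last


def decode_escape_mode2(initial_run, level, last):
    """Second mode: the level is used as-is and also indexes the table that
    gives the extra amount to add to the initial run."""
    return initial_run + EXTRA_RUN_BY_LEVEL[level], level, last


print(decode_escape_mode1(1, 2, 0))  # (1, 8, 0)
print(decode_escape_mode2(3, 2, 1))  # (11, 2, 1)
```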
In the third escape mode, the last value is signaled as a single bit. For the first use of the third escape mode in the current frame, the encoder signals (with a fixed length code [“FLC”]) a size value for third mode-coded run values and signals (with a VLC) another size value for third mode-coded level values for the current frame. The size elements are followed by a run code (having the signaled run code size) and a level code (having the signaled level code size). A sign value for the level is also signaled with one bit. For subsequent uses of the third escape mode in the current frame, the previously signaled size values for the current frame apply, and new size values are not signaled. Instead, a run code (having the previously signaled run code size), sign bit, and level code (having the previously signaled level code size) are signaled.
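The third escape mode can be sketched on the encoder side as follows. Several details are assumptions made only for illustration: the 3-bit width of the fixed length code for the run size, the unary stand-in for the VLC that signals the level size, the placement of the size values immediately after the last bit, and the class and function names. Bits are written as a string of "0" and "1" characters for readability.

```python
def bits(value, width):
    """Fixed-length binary string for a non-negative value."""
    assert 0 <= value < (1 << width)
    return format(value, "0{}b".format(width))


def unary(value):
    """Stand-in VLC (assumption): `value` zeros followed by a one."""
    return "0" * value + "1"


class FrameEscapeCoder:
    """Third-mode escape coding state for one frame."""

    RUN_SIZE_FIELD_BITS = 3  # assumed width of the FLC carrying the run size

    def __init__(self, run_size, level_size):
        self.run_size = run_size
        self.level_size = level_size
        self.sizes_signaled = False  # reset for each new frame

    def encode(self, run, level, last):
        out = bits(last, 1)
        if not self.sizes_signaled:
            # First use in the frame: signal both size values once.
            out += bits(self.run_size, self.RUN_SIZE_FIELD_BITS)
            out += unary(self.level_size)
            self.sizes_signaled = True
        out += bits(run, self.run_size)
        out += "1" if level < 0 else "0"          # sign bit
        out += bits(abs(level), self.level_size)
        return out


coder = FrameEscapeCoder(run_size=4, level_size=5)
first = coder.encode(run=9, level=-17, last=0)
second = coder.encode(run=3, level=6, last=1)
print(len(first), len(second))  # 20 11
```

In this sketch the first use costs 9 extra bits for the size values; later uses in the same frame reuse the sizes and carry only the last bit, run code, sign bit, and level code.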
When the decoder receives and decodes (270) VLCs with run/level/last tables (265), some VLCs are directly represented with run/level/last triplets in the tables (265). Other VLCs are not, and the decoder as necessary performs the reverse of the escape mode coding to decode the AC coefficients.
The resizing of codes for runs and levels in the third escape mode provides adaptivity to patterns of runs and levels in a given frame. For example, when there are no long runs, shorter codes for escape-coded runs may be used. And when there are no high levels, shorter codes for escape-coded levels may be used. In some scenarios, however, adaptivity at frame level is inadequate. For example, suppose a scene transition occurs between two fields of a single interlaced video frame, and that one field of the frame has long runs and small levels of coefficients, while the other field of the frame has short runs and high levels of coefficients. Setting escape code sizes for the whole frame can lead to inefficiencies in the coding of the small levels and short runs. Or, suppose a single progressive frame includes multiple, very different types of content, such as a main area of dynamic video, a static border area, and a scrolling text display. Setting escape code sizes for the whole frame can again lead to inefficiencies in escape coding certain areas of the frame.
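The inefficiency in the two-field scenario can be quantified with a rough sketch. The (run, level) values below are invented for illustration, levels are treated as magnitudes, and the last bit, sign bit, and size-signaling overhead are ignored; the sketch only compares the run and level code bits when one pair of sizes must fit the whole frame against the bits that sizes fitted to each field alone would require.

```python
def bits_needed(max_value):
    """Smallest code size, in bits, that can represent values 0..max_value."""
    return max(1, max_value.bit_length())


def fitted_sizes(pairs):
    """Run and level code sizes just large enough for the given pairs."""
    return (bits_needed(max(r for r, _ in pairs)),
            bits_needed(max(l for _, l in pairs)))


def payload_bits(pairs, run_size, level_size):
    """Total run code plus level code bits for escape-coded pairs."""
    return len(pairs) * (run_size + level_size)


field_a = [(40, 2), (55, 1), (61, 3)]     # assumed: long runs, small levels
field_b = [(1, 200), (0, 150), (2, 255)]  # assumed: short runs, high levels

frame_sizes = fitted_sizes(field_a + field_b)
whole_frame = payload_bits(field_a + field_b, *frame_sizes)
per_field = (payload_bits(field_a, *fitted_sizes(field_a))
             + payload_bits(field_b, *fitted_sizes(field_b)))
print(frame_sizes, whole_frame, per_field)  # (6, 8) 84 54
```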
D. Block-based Inter Compression
FIG. 3 illustrates the block-based inter compression for a predicted frame in the encoder. In particular, FIG. 3 illustrates compression of a prediction residual block (335) for a motion-compensated predicted block of a predicted frame in the encoder. The error block (335) is the difference between the predicted block (315) and the original current block (325). The encoder applies a frequency transform (340) to the error block (335), resulting in an 8×8 block (345) of transform coefficients. The encoder then quantizes (350) the transform coefficients, resulting in an 8×8 block of quantized transform coefficients (355). The encoder scans (360) the 8×8 block (355) into a one-dimensional array (365). The encoder entropy encodes the scanned DC and AC coefficients using a variation of run length coding (370). The encoder selects VLCs from a run/level/last table (375) and outputs the VLCs.
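The computation of the error block can be sketched as follows. The sign convention (current minus predicted) is an assumption chosen so that adding the error block to the predicted block restores the current block; 2×2 blocks and the sample values are used only to keep the example short.

```python
def residual_block(current, predicted):
    """Sample-by-sample difference between the current and predicted blocks."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


current = [[120, 121], [119, 125]]     # assumed original samples
predicted = [[118, 121], [120, 122]]   # assumed motion-compensated prediction
error = residual_block(current, predicted)
print(error)  # [[2, 0], [-1, 3]]
```

When the prediction is good, the error block holds small values that transform and quantize to few nonzero coefficients.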
E. Block-based Inter Decompression
FIG. 4 shows an example of corresponding decoding (400) for an inter-coded block. In summary of FIG. 4, a decoder decodes (410, 420) entropy-coded information representing a prediction residual using variable length decoding (410) with a run/level/last table (415) and run length decoding (420). The decoder inverse scans (430) a one-dimensional array (425) storing the entropy-decoded information into a two-dimensional block (435). The decoder inverse quantizes and inverse frequency transforms (together, 440) the data, resulting in a reconstructed error block (445). In a separate motion compensation path, the decoder computes a predicted block (465) using motion vector information (455) for displacement from a reference frame. The decoder combines (470) the predicted block (465) with the reconstructed error block (445) to form the reconstructed block (475).
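The motion compensation path and the final combination can be sketched as follows. The sketch assumes integer-sample motion vectors, no handling of frame boundaries, and clamping of reconstructed samples to the 8-bit range; these simplifications and the function names are assumptions for illustration.

```python
def predicted_block(reference, top, left, mv, size=8):
    """Fetch a size x size block from the reference frame, displaced from
    (top, left) by the integer motion vector mv = (dx, dy)."""
    dx, dy = mv
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def reconstruct(predicted, error, lo=0, hi=255):
    """Add the reconstructed error block to the prediction and clamp."""
    return [[min(hi, max(lo, p + e)) for p, e in zip(pred_row, err_row)]
            for pred_row, err_row in zip(predicted, error)]


reference = [[10 * r + c for c in range(4)] for r in range(4)]  # assumed frame
pred = predicted_block(reference, top=0, left=0, mv=(1, 2), size=2)
recon = reconstruct(pred, [[-1, 0], [2, 300]])
print(pred)   # [[21, 22], [31, 32]]
print(recon)  # [[20, 22], [33, 255]]
```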
F. Escape Mode Coding and Decoding for Inter-coded Blocks
For an inter-coded block, when the encoder selects and outputs a VLC for a given run/level/last triplet from a run/level/last table (375), the VLC may be an escape code. If so, one or more additional codes follow in the bitstream to provide information about the triplet. There are three alternative escape modes, which generally correspond to the three escape modes described above for intra-coded blocks.
Similarly, when the decoder receives and decodes (410) VLCs with a run/level/last table (415), some VLCs are directly represented with run/level/last triplets in the table (415). Other VLCs are not, and the decoder as necessary performs the reverse of the escape mode coding to decode the DC and AC coefficients.
II. Standards for Video Compression and Decompression
Aside from previous WMV encoders and decoders, several international standards relate to video compression and decompression. These standards include the Moving Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.261, H.262 (another name for MPEG 2), H.263, and H.264 standards from the International Telecommunication Union [“ITU”].
An encoder and decoder complying with one of these standards typically use some variation of run/level coding/decoding. To the extent escape mode coding/decoding is used for runs and levels, the sizes of the codes following the escape VLC are static in most cases. In other words, one size is defined for all time for escape-coded runs, and another size is defined for all time for escape-coded levels. Where code size variation is possible, the different sizes are for different ranges of level values in a single VLC-like code table (MPEG-1, Table B.5f, section D.6.3.5), or the different sizes are for compatibility purposes with other standards (MPEG-4, sections 6.3.4 and 7.3.1.3). There is no resizing of escape mode codes for runs and levels so as to adapt to different patterns of runs and levels.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.