With the introduction of digital video disks and video delivery over the Internet, digital video has become commonplace. Engineers use a variety of techniques to process digital video efficiently while still maintaining the quality of the digital video. To understand these techniques, it helps to understand how video information is represented and processed in a computer.
I. Representation of Video Information in a Computer
A computer processes video information as a series of numbers representing that information. A single number typically represents an intensity value for one picture element [“pixel”] of a picture. Several factors affect the quality of the video information, including sample depth, resolution, and frame rate.
Sample depth (or precision) indicates the range of numbers used to represent a sample. When more values are possible for the sample, quality is higher because the number can capture more subtle variations in intensity. Video with higher resolution tends to look crisper than other video, and video with higher frame rate tends to look smoother than other video. For all of these factors, the tradeoff for high quality is the cost of storing and transmitting the information, as Table 1 shows.
Table 1: Bitrates for Different Quality Levels of Raw Video
| Description | Bits Per Pixel | Resolution | Frame Rate (frames per second) | Bitrate |
| --- | --- | --- | --- | --- |
| Low-resolution, gray scale video monitoring | 8 (value 0-255) | 160 × 120 pixels | 7.5 | 1.2 Mbit/s |
| Internet streaming | 24 (value 0-16,777,215) | 240 × 176 pixels | 15 | 15.2 Mbit/s |
| Videoconferencing | 24 (value 0-16,777,215) | 352 × 288 pixels | 30 | 73 Mbit/s |
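The bitrates in Table 1 follow from multiplying the pixel count, the sample depth, and the frame rate. The following is a minimal sketch of that arithmetic; the function name raw_bitrate and the row labels are illustrative and do not come from the original text.

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Return the bitrate of uncompressed video in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

# Rows of Table 1: (description, width, height, bits per pixel, frame rate)
rows = [
    ("video monitoring", 160, 120, 8, 7.5),
    ("Internet streaming", 240, 176, 24, 15),
    ("videoconferencing", 352, 288, 24, 30),
]

for name, width, height, bpp, fps in rows:
    mbits = raw_bitrate(width, height, bpp, fps) / 1e6
    print(f"{name}: {mbits:.1f} Mbit/s")
```

The table rounds the results, so 1,152,000 bit/s appears as 1.2 Mbit/s.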
Very high quality formats such as those used for HDTV use even more bitrate. Despite the high bitrate, companies and consumers increasingly depend on computers to create, distribute, and play back high quality content. For this reason, engineers use compression (also called coding or encoding) to reduce the bitrate of digital video. Compression decreases the cost of storing and transmitting the information by converting the information into a lower bitrate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. Two categories of compression are lossless compression and lossy compression.
Lossless compression reduces the bitrate of information by removing redundancy from the information. For example, a series of ten red pixels is represented as a code for “red” and the number ten in compression, and the series is perfectly reconstructed in decompression. Lossless compression techniques reduce bitrate at no cost to quality, but can only reduce bitrate up to a certain point. Decreases in bitrate are limited by the complexity of the video. Entropy coding is another term for lossless compression.
In contrast, with lossy compression, the quality of the video suffers but decreases in bitrate are more dramatic. For example, a series of ten pixels, each a slightly different shade of red, is approximated as ten pixels with the same “red” color. Lossy compression techniques can reduce bitrate more by removing more quality, but the lost quality cannot be restored. Lossy compression is typically used in conjunction with lossless compression—the lossy compression reduces the complexity and quality of the video, which enables greater bitrate reduction in subsequent lossless compression. For example, the series of ten pixels, each a slightly different shade of red, is represented as a code for “red” and the number 10 in compression. In decompression, the original series is reconstructed as ten pixels with the same “red” color.
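The interaction between the lossy and lossless stages can be sketched as follows. The sample values and the quantization step of 8 are assumptions chosen for illustration, and uniform quantization is only one possible lossy step.

```python
from itertools import groupby

def quantize(samples, step):
    """Lossy step: snap each sample to the nearest multiple of `step`."""
    return [step * round(s / step) for s in samples]

def count_runs(samples):
    """Number of (value, length) pairs a run length coder would emit."""
    return sum(1 for _ in groupby(samples))

# Ten pixels, each a slightly different shade of red (hypothetical values).
reds = [200, 201, 199, 202, 200, 198, 201, 200, 199, 202]
approximated = quantize(reds, 8)

print(count_runs(reds))          # runs before the lossy step
print(count_runs(approximated))  # runs after the lossy step
```

In this sketch the original samples form ten runs of length one, while the quantized samples form a single run of ten, which a lossless coder can represent far more compactly. The small differences between the original shades are not recoverable.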
II. Entropy Encoding and Decoding
Entropy encoding and decoding have been an active area of research for over 50 years. A variety of entropy encoding and decoding techniques have been developed, including run length coding/decoding, Huffman coding/decoding, and arithmetic coding/decoding. This section surveys various entropy encoding and decoding techniques.
A. Run Length and Run Level Encoding/Decoding
Run length encoding is a simple compression technique used for camera video, images, and other types of content. In general, run length encoding replaces a series (i.e., run) of consecutive symbols having the same value with the value and the length of the series. For example, the sequence 3 3 0 0 0 1 0 0 0 0 is represented as value 3, length 2, value 0, length 3, value 1, length 1, and value 0, length 4. Run length encoding is particularly effective for sequences having bursts of the same values. In run length decoding, the sequence is reconstructed from the run values and run lengths. Numerous variations of run length encoding/decoding have been developed. For additional information about run length encoding/decoding and some of its variations, see, e.g., Bell et al., Text Compression, Prentice Hall PTR, pages 105-107, 1990; Gibson et al., Digital Compression for Multimedia, Morgan Kaufmann, pages 17-62, 1998; U.S. Pat. No. 6,304,928 to Mairs et al.; U.S. Pat. No. 5,883,633 to Gill et al.; and U.S. Pat. No. 6,233,017 to Chaddha.
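A minimal sketch of run length encoding and decoding, applied to the example sequence above, might look like this. The function names and the (value, length) tuple layout are illustrative choices, not taken from any of the cited references.

```python
from itertools import groupby

def rle_encode(symbols):
    """Replace each run of equal symbols with a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(symbols)]

def rle_decode(pairs):
    """Rebuild the original sequence from (value, length) pairs."""
    out = []
    for value, length in pairs:
        out.extend([value] * length)
    return out

sequence = [3, 3, 0, 0, 0, 1, 0, 0, 0, 0]
pairs = rle_encode(sequence)
print(pairs)  # [(3, 2), (0, 3), (1, 1), (0, 4)]
```

Decoding the pairs with rle_decode returns the original ten symbols, which is what makes the technique lossless.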
Run level encoding is similar to run length encoding in that runs of consecutive symbols of one value (typically, the predominant value) are replaced with lengths. Unlike run length coding, however, other values are not represented with lengths. Instead, each run level pair represents a run of predominant values and a single non-predominant value. For example, the sequence 3 3 0 0 0 1 0 0 0 0 0 0 0 0 0 1 is represented as length 0, level 3, length 0, level 3, length 3, level 1, length 9, level 1. Run level encoding is particularly effective for sequences in which a single value predominates, with interspersed less common values.
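Run level encoding of the example sequence can be sketched as follows. The handling of a trailing run of predominant values (returned here as a separate count) is an assumption of this sketch; actual formats signal the end of a sequence in their own ways.

```python
def run_level_encode(symbols, predominant=0):
    """Return ((run, level) pairs, trailing run of predominant values)."""
    pairs = []
    run = 0
    for s in symbols:
        if s == predominant:
            run += 1               # extend the current run
        else:
            pairs.append((run, s)) # run of predominant values, then one level
            run = 0
    return pairs, run

def run_level_decode(pairs, trailing_run, predominant=0):
    """Rebuild the sequence from run/level pairs and the trailing run."""
    out = []
    for run, level in pairs:
        out.extend([predominant] * run)
        out.append(level)
    out.extend([predominant] * trailing_run)
    return out

sequence = [3, 3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
pairs, trailing = run_level_encode(sequence)
print(pairs)  # [(0, 3), (0, 3), (3, 1), (9, 1)]
```

Each pair is (length, level), matching the order used in the example above.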
B. Huffman Coding and Decoding
Huffman coding is another well-known compression technique used for camera video, images, and other types of content. In general, a Huffman code table associates variable-length Huffman codes with unique symbol values (or unique combinations of symbol values). Shorter codes are assigned to more probable values, and longer codes are assigned to less probable values. For example, suppose the data is a series of 8-bit samples, where 50% of the samples have a value of zero, 25% of the samples have a value of one, and the remaining samples have values in the range of 2 to 255. Rather than represent each sample with 8 bits, the encoder uses a 1-bit code “0” for the value 0, a 2-bit code “10” for the value 1, and longer codes starting with “11” for other values. The least likely values may take more than 8 bits to represent, but the average bitrate is reduced due to the efficient coding of the most common values.
To encode a symbol, the Huffman encoder replaces the symbol value with the variable-length Huffman code associated with the symbol value in the Huffman code table. To decode, the Huffman decoder replaces the Huffman code with the symbol value associated with the Huffman code.
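The table lookup described above can be sketched with the classic Huffman construction, which repeatedly merges the two least probable entries. This is a minimal sketch under stated assumptions: the four-symbol alphabet and its weights are hypothetical, at least two symbols are assumed, and the exact bit patterns depend on how ties are broken.

```python
import heapq
from itertools import count

def build_huffman_table(weights):
    """Return {symbol: bit string} for a dict of symbol weights."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)  # two least probable subtrees
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

def huffman_encode(symbols, table):
    """Replace each symbol with its variable-length code."""
    return "".join(table[s] for s in symbols)

def huffman_decode(bits, table):
    """Read bits until they match a code; codes are prefix-free."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical alphabet: 50% zeros, 25% ones, the rest split evenly.
table = build_huffman_table({0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125})
message = [0, 0, 1, 0, 2, 0, 1, 3]
bits = huffman_encode(message, table)
print(len(bits), "bits instead of", 2 * len(message))
```

With these weights the code lengths come out as 1, 2, 3, and 3 bits, so the eight-symbol message takes 14 bits rather than the 16 bits of a fixed 2-bit code.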
In scalar Huffman coding, a Huffman code table associates a single Huffman code with one value. Scalar Huffman coding is relatively simple to implement, but is inefficient when a single value predominates. For example, if 70% of samples have values of 0, the ideal Huffman code would be less than 1 bit long, but the shortest possible scalar Huffman code is 1 bit long.
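The gap between the ideal code length and the 1-bit minimum can be checked directly. The sketch below uses the standard information-theoretic result that a symbol of probability p ideally costs -log2(p) bits; the function name is illustrative.

```python
import math

def ideal_code_length(probability):
    """Ideal code length in bits for a symbol with the given probability."""
    return -math.log2(probability)

p_zero = 0.7  # the 70% figure from the example above
print(f"ideal: {ideal_code_length(p_zero):.3f} bits; scalar Huffman minimum: 1 bit")
```

For a probability of 0.7 the ideal length is roughly half a bit, so a scalar Huffman code spends about twice the ideal number of bits on the most common value.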
In vector Huffman coding, a Huffman code table associates a single Huffman code with a combination or set of values. Vector Huffman coding can lead to better bitrate reduction than scalar Huffman coding (e.g., by allowing the encoder to exploit symbol probabilities fractionally in binary Huffman codes). On the other hand, the codebook for vector Huffman coding can be extremely large when single codes represent large groups of symbols or when symbols have large ranges of potential values (due to the large number of potential combinations). For example, if the alphabet size is 256 (for values 0 to 255 per symbol) and the number of symbols per set to be encoded is 4, the number of potential combinations is 256^4 = 4,294,967,296. A codebook of this size consumes memory and processing resources in computing the codebook codes and performing lookup operations in coding and decoding.
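Both sides of this tradeoff can be illustrated with a small calculation. The sketch below assumes a hypothetical two-symbol source in which 0 has probability 0.7 and symbols are statistically independent, and it computes Huffman code lengths for single symbols and for pairs of symbols.

```python
import heapq
from itertools import count, product

def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} via the standard Huffman merge."""
    tiebreak = count()  # avoids comparing symbol lists when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probabilities, 0)
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)
        p1, _, syms1 = heapq.heappop(heap)
        for sym in syms0 + syms1:
            lengths[sym] += 1  # every merge adds one bit to these codes
        heapq.heappush(heap, (p0 + p1, next(tiebreak), syms0 + syms1))
    return lengths

def average_length(probabilities):
    lengths = huffman_code_lengths(probabilities)
    return sum(p * lengths[sym] for sym, p in probabilities.items())

scalar = {0: 0.7, 1: 0.3}
# Assumption: independent symbols, so pair probabilities are products.
pairs = {(a, b): scalar[a] * scalar[b] for a, b in product(scalar, repeat=2)}

print(average_length(scalar))      # bits per symbol, scalar coding
print(average_length(pairs) / 2)   # bits per symbol, coding pairs
print(256 ** 4)                    # codebook entries for 4 symbols of 8 bits
```

Under these assumptions, coding pairs brings the average below 1 bit per symbol (about 0.905), while the last line shows how quickly the codebook grows for larger alphabets and groups.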
In static Huffman coding, probabilities are set based upon probabilities expected for a certain kind of content. Alternatively, in adaptive Huffman coding, probabilities are computed for information just encoded or information to be encoded, in which case the Huffman codes adapt to changing probabilities for symbol values. Compared to static Huffman coding, adaptive Huffman coding usually reduces bitrate by incorporating more accurate probabilities, but the encoder and decoder must perform extra processing to track probabilities and maintain consistent code table states.
The results of run length encoding and run level encoding (e.g., levels and runs) can be Huffman encoded to further reduce bitrate. For example, the most common level is represented with a short Huffman code, and less common levels are represented with longer Huffman codes. Run lengths are represented with different Huffman codes. One problem with separate encoding of levels and lengths is that the most common level (e.g., 1 for run level encoding) and the most common length (e.g., 0 for run level encoding) typically each have a probability greater than 50%, which makes scalar Huffman coding inefficient. To address this concern, a single Huffman code may jointly represent the pair of a particular level value and a particular length value. With this system, the most likely level/length pair typically has a probability less than 50%. On the other hand, the Huffman code table needed to represent the various possible level/length pairs is very large.
Numerous variations of Huffman coding/decoding have been developed. For additional information about Huffman coding/decoding and some of its variations, see, e.g., Bell et al., Text Compression, Prentice Hall PTR, pages 105-107, 1990; Gibson et al., Digital Compression for Multimedia, Morgan Kaufmann, pages 17-62, 1998. U.S. Pat. No. 6,223,162 to Chen et al. describes multi-level run length coding of audio, U.S. Pat. No. 6,377,930 to Chen et al. describes variable-to-variable length encoding of audio, and U.S. Pat. No. 6,300,888 to Chen et al. describes entropy code mode switching for frequency domain audio coding.
C. Arithmetic Coding and Decoding
Arithmetic coding is another compression technique used for camera video, images, and other types of content. Like vector Huffman coding, arithmetic coding is often used in applications where the optimal number of bits to encode a given input symbol is a fractional number of bits, or in cases where a correlation between input symbols exists.
Arithmetic coding generally involves representing an input sequence as a single number within a given range. Typically, the number is a fractional number between 0 and 1. Symbols in the input sequence are associated with ranges occupying portions of the space between 0 and 1. The ranges are calculated based on the probability of the particular symbols occurring in the input sequence, and the fractional number used to represent the input sequence is constructed with reference to the ranges. Therefore, probability distributions for input symbols are important in arithmetic coding schemes. In fact, it is the preparation and updating of these probability distributions that makes arithmetic encoding and decoding undesirable in many contexts. The encoder and decoder must maintain consistent probability distribution states for correct performance, which can be burdensome depending on the number of different symbols and distributions tracked and the complexity of the tracking.
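The interval narrowing described above can be sketched with exact fractions. This is a toy illustration under stated assumptions: the probabilities are fixed and known to both sides, the decoder is told the sequence length, and exact rational arithmetic is used in place of the finite-precision integer arithmetic of practical coders.

```python
from fractions import Fraction

def build_ranges(probabilities):
    """Map each symbol to its [low, high) slice of the space [0, 1)."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges

def arithmetic_encode(symbols, ranges):
    """Narrow [0, 1) once per symbol; return the final [low, high)."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = ranges[s]
        low += width * s_low
        width *= s_high - s_low
    return low, low + width  # any number in this interval identifies the input

def arithmetic_decode(number, length, ranges):
    """Recover `length` symbols from a number inside the final interval."""
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= number < s_high:
                out.append(symbol)
                # rescale so the chosen slice becomes [0, 1) again
                number = (number - s_low) / (s_high - s_low)
                break
    return out

ranges = build_ranges({0: Fraction(7, 10), 1: Fraction(3, 10)})
message = [0, 0, 1, 0]
low, high = arithmetic_encode(message, ranges)
print(low, high)  # 343/1000 4459/10000
```

More probable symbols shrink the interval less, so sequences of likely symbols end in wider intervals that can be identified with fewer bits.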
In context-based arithmetic coding, different probability distributions for the input symbols are further associated with different contexts. A context is a state that is reproducible by the encoder and decoder, typically based on previously decoded information, which provides guidance as to the probability distribution of an element in subsequent encoding or decoding. The probability distribution used to encode the input sequence changes when the context changes. The context can be calculated based upon different factors that are expected to affect the probability of a particular input symbol appearing in an input sequence. While context-based arithmetic coding can further improve the efficiency of arithmetic coding, the cost is additional computational overhead for maintaining and updating more states. For additional information about arithmetic coding/decoding and some of its variations, see Nelson, The Data Compression Book, “Huffman One Better: Arithmetic Coding,” Chapter 5, pp. 123-65 (1992).
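The benefit of contexts can be estimated without building a full coder by summing ideal code lengths, which an arithmetic coder approaches. The sketch below is an illustration under assumptions: the context is simply the previous symbol, probabilities come from adaptive counts starting at one, and the test data is hypothetical.

```python
import math
from collections import defaultdict

def ideal_cost_bits(symbols, alphabet, context_of):
    """Total ideal code length with separate adaptive counts per context."""
    # Counts start at 1 so that no symbol ever has zero probability.
    counts = defaultdict(lambda: {a: 1 for a in alphabet})
    total = 0.0
    for i, s in enumerate(symbols):
        model = counts[context_of(symbols, i)]
        total += -math.log2(model[s] / sum(model.values()))
        model[s] += 1  # a decoder can make the same update after decoding s
    return total

# Hypothetical data with strong correlation between neighboring symbols.
data = [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8

no_context = ideal_cost_bits(data, (0, 1), lambda seq, i: None)
prev_symbol = ideal_cost_bits(data, (0, 1),
                              lambda seq, i: seq[i - 1] if i else None)

print(f"single distribution: {no_context:.1f} bits")
print(f"previous-symbol context: {prev_symbol:.1f} bits")
```

The context depends only on symbols already processed, so an encoder and decoder can reproduce it identically. The cost is one set of counts per context rather than a single set.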
III. Implementations of Entropy Encoding and Decoding for Video
Numerous international standards specify different aspects of video encoders, decoders, and/or formats for compressed information. These standards include the H.261, MPEG-1, H.262, H.263, MPEG-4, and H.264/AVC standards. While the details of these standards vary, each uses a combination of lossy and lossless compression as well as a block-based frequency transform. With the transform, a block of pixels or other spatial domain information is converted to a block of frequency transform coefficients, which are a more efficient representation of the information. The frequency transform coefficients are then lossy encoded, zigzag scanned into a one-dimensional sequence, and losslessly encoded. The lossless compression of frequency transform coefficients typically uses some combination of run level encoding, Huffman coding, and/or arithmetic coding, but some other entropy coding techniques are specified as well. For additional details about the standards and, in particular, the myriad forms of entropy coding and decoding used in the standards, see the standards documents themselves.
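The zigzag scan step can be sketched as follows. This shows the classic zigzag pattern for a square block as an illustration; each standard specifies its own scan tables, and the 4 × 4 block of quantized coefficients here is hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for d in range(2 * n - 1):  # d = row + col identifies one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(cells) if d % 2 == 0 else cells)
    return order

def zigzag_scan(block):
    """Flatten a square block into a one-dimensional sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Hypothetical quantized coefficients: energy sits in the top-left corner.
block = [
    [12, 5, 0, 0],
    [3, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(block)
print(scanned)
```

The scan places the non-zero coefficients first and gathers the zeros into one long run at the end, which suits the run level encoding described in section II.A.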
The H.264 standard (sometimes called the AVC, JVT, or MPEG-4 Part 10 standard) defines two context-adaptive entropy coding methods, based on Huffman-like variable length coding and arithmetic coding, respectively. The variable length coding method is simple and not particularly resource intensive, but its compression efficiency is comparatively low. On the other hand, the arithmetic coding method achieves better compression efficiency, but is resource intensive and slow due to the state tracking that the encoder and decoder must perform.
Aside from the international standards listed above, numerous companies have produced video encoders and decoders. Microsoft Corporation has produced several versions of Windows Media Video [“WMV”]. The various versions of WMV also use different combinations of lossy and lossless compression. In WMV7, the encoder and decoder use run/level/last code tables, in which an entropy code jointly represents a run of zero-value coefficients, a non-zero level, and whether the coefficient is the last in the block. WMV7 also includes escape coding for values outside of the run/level/last tables. In WMV8 and some versions of WMV9, the encoder and decoder use variants of run/level/last coding for frequency transform coefficients, with table selection based on contextual information, and also use other entropy encoding and decoding. U.S. Patent Application Publication Nos. 2003-0138150-A1 and 2003-0156648-A1 include description of entropy encoding and decoding in WMV8.
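The run/level/last representation can be sketched as a conversion from a scanned coefficient sequence to triples. This is an illustration only: it does not reproduce the actual WMV code tables or escape codes, the function names are hypothetical, and it assumes that "last" marks the final non-zero coefficient in the block.

```python
def to_run_level_last(coefficients):
    """Convert scanned coefficients to (run, level, last) triples."""
    nonzero = [i for i, c in enumerate(coefficients) if c != 0]
    triples = []
    previous = -1
    for n, position in enumerate(nonzero):
        run = position - previous - 1        # zeros before this coefficient
        last = n == len(nonzero) - 1         # no non-zero values follow
        triples.append((run, coefficients[position], last))
        previous = position
    return triples

def from_run_level_last(triples, block_size):
    """Rebuild the coefficient sequence, padding zeros after the last level."""
    out = []
    for run, level, _last in triples:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (block_size - len(out)))
    return out

coefficients = [12, 0, 0, -3, 1, 0, 0, 0]
triples = to_run_level_last(coefficients)
print(triples)  # [(0, 12, False), (2, -3, False), (0, 1, True)]
```

Because the last flag ends the block, the trailing zeros need no code of their own. In a joint scheme, each distinct triple would map to one entry in the entropy code table.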
Whatever the advantages of prior techniques and systems for lossless compression, they do not have the advantages of the present invention.