When a picture such as a frame of video or a still image is encoded, an encoder typically splits the visual data into blocks of sample values. The encoder performs a frequency transform such as a discrete cosine transform (DCT) to convert the block of sample values into a block of transform coefficients. The transform coefficient by convention shown at the upper left of the block is generally referred to as the DC coefficient, and the other coefficients are generally referred to as the AC coefficients. For most blocks of sample values, a frequency transform tends to group non-zero values of the transform coefficients towards the upper-left, lower frequency section of the block of transform coefficients.
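As an illustration of the frequency transform described above, the following is a minimal sketch of a two-dimensional DCT, assuming the orthonormal DCT-II variant and a square block held as a list of lists. The function name `dct_2d` and the sample block are hypothetical and are not taken from any particular encoder.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (illustrative sketch only)."""
    n = len(block)

    def scale(k):
        # Normalization factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

# Hypothetical input: a perfectly flat 8x8 block of sample values.
flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
dc = coeffs[0][0]  # upper-left entry, i.e. the DC coefficient
```

For a flat block such as this one, all of the energy lands in the DC coefficient and every AC coefficient is zero up to floating-point error, which is the extreme case of the upper-left grouping noted above.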
After the frequency transform, the encoder quantizes the transform coefficient values. The quantization generally reduces the number of possible values for the DC and AC coefficients. This usually reduces resolution as well as fidelity of the quantized values to the original coefficient values, but it makes subsequent entropy encoding more effective. The quantization also tends to “remove” the higher frequency coefficients (generally grouped in the lower right side of the block), when the higher frequency coefficients have low amplitudes that are quantized to zero.
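The effect of quantization can be sketched as follows, assuming a single uniform step size applied to every coefficient with rounding to the nearest level. Practical encoders often vary the step size by coefficient position. The names `quantize` and `dequantize`, the step size, and the coefficient values are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization, rounding half away from zero (assumed rule)."""
    def q(c):
        magnitude = int(abs(c) / step + 0.5)
        return magnitude if c >= 0 else -magnitude
    return [[q(c) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantized levels."""
    return [[level * step for level in row] for row in levels]

# Hypothetical 2x2 corner of a coefficient block.
coeffs = [[100.0, 3.0], [-7.0, 0.4]]
levels = quantize(coeffs, step=8)       # [[13, 0], [-1, 0]]
restored = dequantize(levels, step=8)   # [[104, 0], [-8, 0]]
```

The low-amplitude values 3.0 and 0.4 are quantized to zero, and the reconstructed values 104 and −8 differ from the originals 100.0 and −7.0, which illustrates both the "removal" of small coefficients and the loss of fidelity.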
FIG. 1 shows one example of an 8×8 block (100) of transform coefficients after quantization. In this exemplary block (100), the value 25 in the upper left corner of the block is the DC coefficient, and the other 63 values are the AC coefficients. Although the highest-amplitude coefficients in the block (100) are the low frequency coefficients in the upper left, the block also includes a cluster of non-zero coefficient values at higher frequencies along the right side.
After the transform coefficients have been quantized, the encoder entropy encodes the quantized transform coefficients. One common method of encoding a block of transform coefficients starts by reordering the block using a “zigzag” scan order (200) as shown in FIG. 2. In this method, the encoder maps the values of the transform coefficients from a two-dimensional array into a one-dimensional string according to the scan order (200). The scan order (200) begins in the top left of the block (100) with the DC coefficient, traverses the AC coefficients of the block (100) at positions 1 and 2, traverses the AC coefficients at positions 3, 4, and 5, and so on. The scanning continues diagonally across the block (100) according to the scan order (200), finishing in the lower right corner of the block (100) with the highest frequency AC coefficient at position 63. The quantization operation typically quantizes to zero a significant portion of the lower-value, higher-frequency coefficients, while preserving non-zero values for the higher-value, lower-frequency coefficients. As a result, zigzag scan reordering commonly places most of the remaining non-zero transform coefficients near the beginning of the one-dimensional string and a large number of zero values at the end of the string.
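The mapping from a two-dimensional block to a one-dimensional string can be sketched as follows. The sketch assumes the conventional zigzag traversal in which position 1 lies to the right of the DC coefficient and position 2 lies below it; the exact direction used in FIG. 2 is not restated in the text, so this direction is an assumption. The names `zigzag_order` and `scan` and the 3×3 sample block are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal; alternate the traversal direction on each diagonal.
    return sorted(
        coords,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def scan(block, order):
    """Read the block's values into a one-dimensional list along the order."""
    return [block[r][c] for r, c in order]

order8 = zigzag_order(8)        # 64 positions, numbered 0 (DC) to 63

# Hypothetical 3x3 block whose values equal their zigzag positions plus one.
small = [[1, 2, 6],
         [3, 5, 7],
         [4, 8, 9]]
string = scan(small, zigzag_order(3))
```

Under these assumptions, `order8` starts at (0, 0), visits two positions on the first diagonal and three on the second, and ends at (7, 7), matching the position numbering described above.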
FIG. 2 shows an exemplary one-dimensional string (250) that results from applying the scan order (200) to the block (100) of transform coefficients. In this example, the one-dimensional string (250) starts with the value 25 corresponding to the DC coefficient of the block (100). The scan order then reads the value 12, followed by two values of 0, a value of −52, and so on. The symbol “EOB” signifies “End of Block” and indicates that all of the remaining values in the block are 0.
The encoder then entropy encodes the one-dimensional vector of coefficient values using run-length coding or run-level coding. In run-level coding, the encoder traverses the one-dimensional vector, encoding each run of consecutive zero values as a run count, and encoding each non-zero value as a level. For simple encoding, the encoder assigns variable length codes such as Huffman codes to the run counts and level values.
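Run-level coding of a one-dimensional vector can be sketched as follows, assuming each non-zero level is paired with the count of zero values that precede it and that trailing zero values are replaced by a single end-of-block symbol. The variable length code assignment is omitted. The names `run_level_encode` and `run_level_decode` are hypothetical; the first five values of the sample vector follow the start of the string (250) described above, and the trailing zero values are shortened for illustration.

```python
EOB = "EOB"  # end-of-block marker: all remaining values are zero

def run_level_encode(values):
    """Convert a coefficient vector into (run, level) pairs plus optional EOB."""
    symbols = []
    run = 0
    for value in values:
        if value == 0:
            run += 1                       # extend the current run of zeros
        else:
            symbols.append((run, value))   # zeros skipped, then the level
            run = 0
    if run > 0:
        symbols.append(EOB)                # trailing zeros collapse into EOB
    return symbols

def run_level_decode(symbols, length):
    """Rebuild the coefficient vector; `length` is the original vector length."""
    values = []
    for symbol in symbols:
        if symbol == EOB:
            break
        run, level = symbol
        values.extend([0] * run)
        values.append(level)
    values.extend([0] * (length - len(values)))
    return values

vector = [25, 12, 0, 0, -52, 0, 0, 0]
symbols = run_level_encode(vector)   # [(0, 25), (0, 12), (2, -52), "EOB"]
decoded = run_level_decode(symbols, len(vector))
```

The round trip through `run_level_decode` reproduces the original vector, which reflects that this stage, unlike quantization, is lossless.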
One problem with the simple encoding is that the run count can vary from 0 to 64, requiring an alphabet of 65 codes just for the run count. If the encoder jointly encodes a run count with a subsequent non-zero level value (to take advantage of correlation between run count and level values), the size of the run count-level alphabet is much larger, which increases the complexity of the entropy encoding (e.g., due to code table sizes and lookup operations). Using escape codes for less frequent combinations helps control code table size but can decrease coding efficiency.
Another problem with run-level coding arises when the encoder uses the same possible code values for run-level combinations regardless of which AC coefficients are being encoded. If the chance of encountering a long run of zero values increases for higher frequency AC coefficients, using the same possible code values for run-level combinations hurts efficiency.
Finally, reordering using the zigzag scan order (200) shown in FIG. 2 can, in some cases, hurt encoding efficiency. In general, neighboring coefficient values within a block are correlated—if a transform coefficient value is zero, its neighbors are more likely to be zero, and if a transform coefficient value is non-zero, its neighbors are more likely to be non-zero. Reordering using the zigzag scan order (200) in some cases separates neighboring coefficient positions (e.g., positions 15 and 27) in the one-dimensional vector. For example, although the non-zero coefficients in the block (100) in FIG. 1 appear in two clusters, the non-zero coefficient values in the one-dimensional string (250) of FIG. 2 are interrupted 4 times by a series of one or more “0” values.
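The separation of neighboring positions can be checked with a short sketch, again assuming the conventional zigzag traversal in which position 1 lies to the right of the DC coefficient. The helper name `zigzag_order` and the variable names are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        coords,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

order = zigzag_order(8)
pos_a, pos_b = order[15], order[27]   # block coordinates of the two positions

# Adjacent in the block if they differ by one step in exactly one direction.
are_neighbors = abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1]) == 1

# Number of other coefficients scanned between the two neighbors.
values_between = 27 - 15 - 1
```

Under this assumed traversal, positions 15 and 27 sit side by side in the top row of the block, yet eleven other coefficient values fall between them in the one-dimensional vector.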
Given the critical importance of encoding and decoding to digital video, it is not surprising that video encoding and decoding are richly developed fields. Whatever the benefits of previous video encoding and decoding techniques, however, they do not have the advantages of the following techniques and tools.