When a picture such as a frame of video or a still image is encoded, an encoder typically splits the visual data into blocks of sample values. The encoder performs a frequency transform such as a discrete cosine transform (“DCT”) to convert the block of sample values (or motion-compensated prediction residual values) into a block of transform coefficients. The transform coefficient by convention shown at the upper left of the block is generally referred to as the DC coefficient, and the other coefficients are generally referred to as the AC coefficients. For most blocks of values, a frequency transform tends to group non-zero values of the transform coefficients towards the upper-left, lower frequency section of the block of transform coefficients.
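The transform step can be illustrated with a minimal sketch. It assumes an orthonormal two-dimensional DCT-II applied to a square block; the function name `dct_2d` and the 4x4 block of identical sample values are hypothetical choices made for illustration, and practical encoders use fast or integer approximations rather than this direct summation.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

# A perfectly flat block (made-up sample values) has energy only in the
# DC coefficient at the upper left; every AC coefficient is zero up to
# floating-point rounding.
flat_block = [[100] * 4 for _ in range(4)]
coeffs = dct_2d(flat_block)
```

For this flat block the DC coefficient is 400 and the AC coefficients vanish, which is the extreme case of the energy compaction toward the upper left described above.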
After the frequency transform, the encoder quantizes the transform coefficient values. The quantization reduces the number of possible values for the DC and AC coefficients. This usually reduces fidelity of the quantized values to the original coefficient values, but it makes subsequent entropy encoding more effective. Quantization also tends to “remove” higher frequency coefficients (by convention shown at the lower right of the block), when the higher frequency coefficients have low levels that are quantized to zero.
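As a rough sketch of this step, the following assumes uniform scalar quantization with a single step size and rounding to the nearest level. The function names, the step size of 16, and the coefficient values are all hypothetical; real codecs typically use frequency-dependent step sizes and other rounding rules.

```python
import math

def quantize(coeffs, step):
    """Map each coefficient to the nearest integer multiple of `step`."""
    def q(c):
        level = math.floor(abs(c) / step + 0.5)
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantized levels."""
    return [[level * step for level in row] for row in levels]

# Made-up transform coefficients: large values at the upper left,
# small values toward the lower right.
coeffs = [[400.0, -37.0,  9.0, 2.0],
          [ 52.0,  11.0, -3.0, 1.0],
          [ -8.0,   4.0,  1.0, 0.0],
          [  3.0,  -1.0,  0.0, 0.0]]

levels = quantize(coeffs, step=16)
reconstructed = dequantize(levels, step=16)
```

Here `levels` has only six non-zero entries, all near the upper left, and the reconstructed value for the coefficient -37.0 is -32, which shows both the loss of fidelity and the zeroing of low-level, higher frequency coefficients.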
After the transform coefficients have been quantized, the encoder entropy encodes the quantized transform coefficients. One common method of encoding a block of quantized transform coefficients starts by reordering the block using a zigzag scan order. The encoder maps the values of the transform coefficients from a two-dimensional array into a one-dimensional string according to the scan order. For example, the scan order begins in the top left of the block with the DC coefficient, traverses the lowest frequency AC coefficients of the block, and continues scanning along diagonal lines according to the scan order, finishing in the lower right corner of the block with the highest frequency AC coefficient. Quantization typically yields zero-value coefficients for a significant portion of the lower-value, higher-frequency coefficients, while preserving non-zero values for the higher-value, lower-frequency coefficients. Thus, zigzag scan reordering commonly results in most of the remaining non-zero transform coefficients being near the beginning of the one-dimensional string and a large number of zero values being at the end of the string.
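The reordering can be sketched as follows. This assumes the conventional zigzag pattern in which the scan moves from the DC coefficient to the coefficient on its right and then alternates direction along each anti-diagonal; the function names and the 4x4 block of quantized values are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal (row + col). On odd diagonals the scan moves
    # down and to the left (row increasing); on even diagonals it moves
    # up and to the right (column increasing).
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )

def zigzag_scan(block):
    """Map a square 2-D block into a 1-D list according to the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Made-up quantized coefficients with non-zero values near the upper left.
block = [[25, -2, 1, 0],
         [ 3,  1, 0, 0],
         [-1,  0, 0, 0],
         [ 0,  0, 0, 0]]

scanned = zigzag_scan(block)
```

For this block, `scanned` starts with the six non-zero values 25, -2, 3, -1, 1, 1 and ends with ten zeros, matching the pattern described above.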
The encoder then entropy encodes the one-dimensional string of coefficient values using run-length coding or run-level coding. In run-level coding, the encoder traverses the one-dimensional string, encoding each run of consecutive zero values as a run count, and encoding each non-zero value as a level. The encoder can then assign variable length codes (“VLCs”) to the run counts and level values.
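A minimal sketch of the traversal is shown below. It pairs each non-zero level with the count of zeros that precede it and reports the trailing zeros separately; handling the trailing zeros this way (for example, so that they can be signaled by an end-of-block marker) is an assumption of this sketch, as are the function name and the input values.

```python
def run_level_encode(coeffs):
    """Convert a 1-D coefficient list into (run, level) pairs.

    Returns the list of pairs and the number of trailing zeros that
    follow the last non-zero level.
    """
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, value))  # zeros seen so far, then the level
            run = 0
    return pairs, run

# Made-up scanned coefficients for a 4x4 block.
scanned = [25, 0, 0, -2, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pairs, trailing_zeros = run_level_encode(scanned)
```

For this input the result is the pairs (0, 25), (2, -2), (0, 3), (1, 1) followed by nine trailing zeros, so sixteen coefficients reduce to four run/level pairs plus an end indication.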
In a simple variable length encoding scheme for the results of run-level coding, the encoder assigns a VLC to each run count and assigns a VLC to each level value. One problem with such simple variable length coding is that it fails to exploit correlation between run count values and level values. In many encoding scenarios, certain level values are correlated with certain run count values, and exploiting such correlation could lead to more efficient entropy coding.
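The cost of ignoring such correlation can be illustrated with empirical entropy, which bounds the average code length per symbol from below. The sketch uses made-up toy data in which the run count fully determines the level value; the data and the helper name `entropy` are hypothetical and chosen only to make the effect visible.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy (run, level) pairs: a run of 0 always precedes level 3, and a run
# of 2 always precedes level 1, so the two values are perfectly correlated.
pairs = [(0, 3), (0, 3), (2, 1), (2, 1), (0, 3), (2, 1), (0, 3), (2, 1)]
runs = [run for run, _ in pairs]
levels = [level for _, level in pairs]

separate_bits = entropy(runs) + entropy(levels)  # independent codes
joint_bits = entropy(pairs)                      # one code per pair
```

With this toy data, coding runs and levels independently needs at least 2 bits per pair, while a code for the pair as a whole needs only 1 bit. Real coefficient data is far less extreme, but the joint entropy is never larger than the sum of the separate entropies.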
In an example joint encoding scheme for the results of run-level coding, the encoder assigns a VLC to a run count and a subsequent non-zero level value. Although assigning VLCs to run count/level value pairs can help exploit correlation between run count and level values, the size of the run count/level value pair alphabet can be very large, which increases the complexity of the entropy encoding (e.g., due to codebook sizes and lookup operations). Using escape codes for less frequent run count/level value combinations helps control codebook size but can decrease coding efficiency.
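The trade-off between codebook size and escape coding can be sketched as follows. Everything in this example is hypothetical: the small prefix-free codebook, the end-of-block and escape codes, and the fixed field widths are made up for illustration and are not taken from any particular codec.

```python
# Hypothetical prefix-free codebook for the most frequent run/level pairs.
CODEBOOK = {
    (0, 1): "01",
    (0, -1): "100",
    (1, 1): "101",
    (1, -1): "1100",
    (0, 2): "1101",
}
END_OF_BLOCK = "00"   # hypothetical marker for "only zeros remain"
ESCAPE = "111"        # hypothetical prefix for pairs not in the codebook
RUN_BITS = 6          # assumes run counts from 0 to 63
LEVEL_BITS = 8        # assumes levels from -128 to 127 (two's complement)

def encode_pairs(pairs):
    """Encode (run, level) pairs as a bit string, escaping rare pairs."""
    bits = []
    for run, level in pairs:
        code = CODEBOOK.get((run, level))
        if code is None:
            # Rare pair: escape code followed by fixed-length fields.
            level_field = level & ((1 << LEVEL_BITS) - 1)
            code = (ESCAPE
                    + format(run, f"0{RUN_BITS}b")
                    + format(level_field, f"0{LEVEL_BITS}b"))
        bits.append(code)
    bits.append(END_OF_BLOCK)
    return "".join(bits)

bits = encode_pairs([(0, 1), (1, -1), (3, 7)])
```

In this sketch the two pairs found in the codebook cost 2 and 4 bits, while the escaped pair (3, 7) costs 17 bits. The codebook stays small, but any pair that falls outside it is expensive, which is the efficiency penalty noted above.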
Another problem with run-level coding arises when the encoder uses the same possible code values for run-level combinations regardless of which AC coefficients are being encoded. For example, if the likelihood of encountering a long run of zero values changes for different frequencies of AC coefficients, using the same possible code values for run-level combinations can hurt efficiency.
In corresponding decoding, a decoder decodes VLCs to determine run counts and level values, and then reconstructs a one-dimensional string of quantized transform coefficients from the run counts and level values. The decoder maps the one-dimensional string back into a two-dimensional block according to the scan order, performs inverse quantization, and performs an inverse frequency transform to reconstruct the block of sample values (or motion-compensated prediction residuals).
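The decoder-side steps after VLC decoding can be sketched as follows, under the same kind of assumptions a simple encoder might make: a conventional zigzag scan, uniform quantization with one step size, and trailing zeros implied after the last pair. The function names and values are hypothetical, and the inverse frequency transform is omitted for brevity.

```python
def run_level_decode(pairs, length):
    """Expand (run, level) pairs into a 1-D list of `length` coefficients."""
    coeffs = []
    for run, level in pairs:
        coeffs.extend([0] * run)   # zeros preceding the level
        coeffs.append(level)
    coeffs.extend([0] * (length - len(coeffs)))  # implied trailing zeros
    return coeffs

def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )

def inverse_scan(coeffs, n):
    """Place 1-D coefficients back into an n x n block in zigzag order."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block

def dequantize(block, step):
    """Inverse uniform quantization with a single step size."""
    return [[level * step for level in row] for row in block]

# Made-up decoded run/level pairs for a 4x4 block.
pairs = [(0, 25), (0, -2), (0, 3), (1, 1), (0, 1)]
string = run_level_decode(pairs, length=16)
block = inverse_scan(string, n=4)
coeffs = dequantize(block, step=16)
```

The resulting `coeffs` block would then be passed to an inverse frequency transform to recover approximate sample values or prediction residuals.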
Aside from these simple variations of run-level coding/decoding, many other variations of run-level coding/decoding have been used for entropy coding/decoding of quantized transform coefficients. Other encoders and decoders use a different form of entropy coding/decoding such as adaptive Huffman coding/decoding or arithmetic coding/decoding.
Given the critical importance of encoding and decoding to digital video, it is not surprising that video encoding and decoding are richly developed fields. Whatever the benefits of previous video encoding and decoding techniques, however, they do not have the advantages of the following techniques and tools.