Engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
Intra-picture Compression and Inter-picture Compression
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress an individual picture without reference to other pictures that have been compressed and reconstructed. Inter-picture compression techniques compress a picture with reference to preceding and/or following picture(s) (often called reference or anchor pictures) that have already been compressed and reconstructed.
Most encoders use a frequency transform during intra-picture compression and inter-picture compression. For example, the encoder splits a picture into non-overlapping blocks of samples and applies a forward frequency transform to individual blocks. The frequency transform maps the sample values of a block to transform coefficients, which are coefficients of basis functions that correspond to frequency components. In particular, the lowest frequency coefficient—called the DC coefficient—indicates the average sample value for the block. The other coefficients—called AC coefficients—indicate patterns of changes in sample values of the block, from gradual low-frequency variations across the block to sharper high-frequency variations within the block. In many encoding scenarios, a relatively small number of frequency coefficients (e.g., the DC coefficient and lower frequency AC coefficients) capture much of the energy or signal content in the block. The encoder quantizes the transform coefficients, resulting in a block of quantized transform coefficients. The encoder further encodes the quantized transform coefficients, for example, using entropy coding, and outputs a bitstream of compressed video information.
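As a minimal sketch of the transform-and-quantize steps described above, the following Python code applies an orthonormal two-dimensional DCT-II to a block and quantizes the resulting coefficients with a single step size. The text does not name a specific frequency transform, so the DCT, the 8×8 block size, the step size of 16, and all function names are illustrative assumptions.

```python
import math

def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence (an assumed choice of transform)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out

def dct_2d(block):
    """Separable 2-D transform: transform every row, then every column."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    # col_pass is indexed [horizontal frequency][vertical frequency];
    # transpose so the result is indexed [vertical][horizontal].
    return [list(row) for row in zip(*col_pass)]

def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    def index(c):
        return int(math.copysign(math.floor(abs(c) / step + 0.5), c))
    return [[index(c) for c in row] for row in coeffs]

# A flat block: all signal content lands in the DC coefficient.
flat = [[100] * 8 for _ in range(8)]
# A horizontal ramp: only the first row of coefficients is non-zero.
ramp = [[10 * x for x in range(8)] for _ in range(8)]

flat_coeffs = dct_2d(flat)
ramp_coeffs = dct_2d(ramp)
flat_levels = quantize(flat_coeffs, step=16)
```

With this orthonormal scaling, the DC coefficient of an 8×8 block equals eight times the average sample value, so the flat block produces a DC coefficient of 800 and AC coefficients that are zero up to floating-point error.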
In corresponding decoding, a decoder reads the bitstream of compressed video information and performs operations to reconstruct the pictures that were encoded. When the encoding uses lossy compression (e.g., in quantization), the reconstructed pictures approximate the source pictures that were encoded but are not exactly the same. For example, to reconstruct a version of an original 8×8 block of an intra-compressed picture, the decoder reconstructs quantized transform coefficients using entropy decoding. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block.
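The lossy round trip can be sketched in one dimension as follows. This is an illustration under stated assumptions, not the procedure of any particular codec: the orthonormal DCT stands in for the unnamed frequency transform, entropy coding is omitted, and the sample values, step size, and function names are hypothetical.

```python
import math

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def forward_dct(samples):
    """Orthonormal DCT-II (assumed forward frequency transform)."""
    n = len(samples)
    return [_scale(k, n) * sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, s in enumerate(samples))
            for k in range(n)]

def inverse_dct(coeffs):
    """Inverse of forward_dct (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, c in enumerate(coeffs))
            for i in range(n)]

def quantize(coeffs, step):
    """Encoder side: coefficient -> integer index (nearest multiple of step)."""
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]

def dequantize(levels, step):
    """Decoder side: integer index -> reconstructed coefficient."""
    return [level * step for level in levels]

source = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical sample values
step = 10

levels = quantize(forward_dct(source), step)        # what the encoder would send
decoded = inverse_dct(dequantize(levels, step))     # what the decoder rebuilds
max_error = max(abs(a - b) for a, b in zip(source, decoded))
```

Because the transform here is orthonormal and each coefficient is off by at most half a step, the per-sample error cannot exceed sqrt(8) × step / 2; without the quantization stage the round trip would be exact up to floating-point error.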
Inter-picture compression techniques often use motion compensation to reduce bit rate by exploiting temporal redundancy in video. In general, motion compensation is a process of producing predictions from reference picture(s) (such as previously encoded/decoded picture(s)) using motion data. An encoder and decoder store previously coded/decoded pictures in a picture store. The reference pictures in the picture store can then provide motion-compensated predictor blocks for the blocks of a current picture being encoded. Often, the encoder does not find a perfect match. For this reason, the encoder computes the sample-by-sample differences between the current block and its motion-compensated prediction to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded. When reconstructing residuals, a decoder (and also the encoder) reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The decoder/encoder performs motion compensation to compute motion-compensated predictors, and combines the predictors with the residuals.
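The prediction-plus-residual idea can be sketched as below. The exhaustive search with a sum-of-absolute-differences cost is one common way to pick a motion vector and is an assumption here, as are the tiny picture, the 2×2 block size, and all function names. The transform, quantization, and entropy coding of the residual are left out, so this reconstruction is exact.

```python
def get_block(picture, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(current_block, reference, top, left, size, search_range):
    """Exhaustive search around (top, left); returns ((dy, dx), predictor)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                candidate = get_block(reference, y, x, size)
                cost = sad(current_block, candidate)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx), candidate)
    return best[1], best[2]

def residual(current_block, predictor):
    """Sample-by-sample difference between the block and its prediction."""
    return [[c - p for c, p in zip(rc, rp)]
            for rc, rp in zip(current_block, predictor)]

def reconstruct(predictor, res):
    """Combine the motion-compensated predictor with the residual."""
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(predictor, res)]

# Reference picture from the picture store (hypothetical sample values).
reference = [[10 * y + x for x in range(6)] for y in range(6)]
# Current block located at row 1, column 1; it resembles the reference
# block at row 2, column 3, but is not a perfect match.
current_block = [[23, 25], [33, 34]]

mv, predictor = motion_search(current_block, reference,
                              top=1, left=1, size=2, search_range=2)
res = residual(current_block, predictor)
rebuilt = reconstruct(predictor, res)
```

The residual here contains a single non-zero sample, which is typically far cheaper to encode than the block itself.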
Types of Quantization
According to one possible definition, quantization is a term used for an approximating non-reversible mapping function commonly used for lossy compression, in which there is a specified set of possible output values, and each member of the set of possible output values has an associated set of input values that result in the selection of that particular output value. A variety of quantization techniques have been developed, including scalar or vector, uniform or non-uniform, and adaptive or non-adaptive quantization.
According to one possible definition, a scalar quantizer is an approximating functional mapping x→Q[x] of an input value x to a quantized value Q[x], sometimes called a reconstructed value. FIG. 1 shows a “staircase” I/O function (100) for a scalar quantizer, along with example reconstruction points for inverse quantization. The horizontal axis is a number line for an input variable x, and the vertical axis indicates the corresponding quantized values Q[x]. The number line is partitioned by thresholds such as the threshold (110). Each value of x within a given range between a pair of adjacent thresholds is assigned the same quantized value Q[x]. For example, each value of x within the range (120) is assigned the same quantized value (130). (At a threshold, one of the two possible quantized values is assigned to an input x, depending on the system.) Overall, the quantized values Q[x] exhibit a discontinuous, staircase pattern. The placement of the thresholds on the number line may be uniformly spaced (as shown in FIG. 1) or non-uniformly spaced.
A scalar quantizer can be decomposed into two distinct stages. The first stage is the classifier stage, in which a classifier function mapping x→A[x] maps an input x to a quantization index A[x], which is often integer-valued. In essence, the classifier segments an input number line or data set, as in FIG. 1, by thresholds such as the threshold (110).
In the second stage, a reconstructor functional mapping k→β[k] maps each quantization index k to a reconstruction value β[k]. In essence, the reconstructor selects a value for reconstruction of each region determined by the classifier. The reconstructor functional mapping may be implemented, for example, using a lookup table. FIG. 1 shows (as open circles) example reconstruction points according to a midpoint reconstruction rule. Overall, the classifier relates to the reconstructor as follows:
Q[x] = β[A[x]]    (1).
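The two-stage decomposition can be sketched as follows for a uniform quantizer with a midpoint reconstruction rule. The step size of 4, the placement of thresholds at integer multiples of the step, and the function names are assumptions for illustration and are not taken from FIG. 1.

```python
import math

STEP = 4.0  # assumed uniform spacing between adjacent thresholds

def classify(x, step=STEP):
    """Classifier A[x]: index k of the interval [k*step, (k+1)*step) holding x."""
    return math.floor(x / step)

def reconstruct(k, step=STEP):
    """Reconstructor beta[k]: midpoint of interval k."""
    return (k + 0.5) * step

def quantize(x, step=STEP):
    """Q[x] = beta[A[x]], as in equation (1)."""
    return reconstruct(classify(x, step), step)

# Every input between the thresholds 4 and 8 maps to the same value, 6.0.
same_step = [quantize(x) for x in (4.0, 5.0, 7.99)]
```

In this sketch an input that falls exactly on a threshold is assigned to the interval above it; as the text notes, a system may make either choice.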
In common usage, the term “quantization” is often used to describe the classifier stage, which is performed during encoding. The term “inverse quantization” is similarly used to describe the reconstructor stage, whether performed during encoding or decoding.
A non-uniform quantizer has threshold values that are not uniformly spaced for all classifier regions. According to one possible definition, a dead zone plus uniform threshold quantizer (“DZ+UTQ”) is a quantizer with uniformly spaced threshold values for all classifier regions except the one containing the zero input value (which is called the dead zone (“DZ”)). In a general sense, a DZ+UTQ is a non-uniform quantizer, since the size of the DZ differs from the size of the other classifier regions.
FIG. 2 shows a staircase I/O function (200) for a DZ+UTQ, in which the DZ is wider than the other steps s. The number line is partitioned by thresholds such as the threshold (210), and each value of x within a given range between a pair of adjacent thresholds is assigned the same quantized value Q[x]. For example, each value of x within the range (220) is assigned the same quantized value (230). In FIG. 2, the DZ is twice as wide as the other classification zones. FIG. 2 shows (as open circles) example reconstruction points according to a midpoint reconstruction rule.
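A DZ+UTQ whose dead zone is twice as wide as the other zones can be sketched as below. The thresholds at ±s, ±2s, and so on, the choice of zero as the reconstruction point for the dead zone, and the function names are assumptions made for illustration.

```python
import math

def dz_classify(x, step):
    """Classifier with thresholds at +/-s, +/-2s, ...; the dead zone is (-s, s)."""
    k = math.floor(abs(x) / step)
    return k if x >= 0 else -k

def dz_reconstruct(k, step):
    """Midpoint rule: 0 for the dead zone, otherwise the middle of zone k."""
    if k == 0:
        return 0.0
    return math.copysign((abs(k) + 0.5) * step, k)

def dz_quantize(x, step):
    return dz_reconstruct(dz_classify(x, step), step)

s = 4.0
# Inputs anywhere in (-4, 4) collapse to zero; the zone [4, 8) maps to 6.
examples = [dz_quantize(x, s) for x in (-3.9, 3.9, 4.0, -9.0)]
```

The dead zone spans 2s while every other zone spans s, so small coefficients on either side of zero are all quantized to index 0.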
Adjusting Quantization
Quantization can affect the fidelity with which transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients (and produce more distortion) as the coefficients are more coarsely approximated. Bit rate also tends to decrease, however, when lossless compression (e.g., entropy encoding) can exploit the reduced complexity of the coarsely quantized coefficients. Conversely, finer quantization tends to preserve fidelity and quality (and produce less distortion) but results in higher bit rates.
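The trade-off can be illustrated numerically. In the sketch below, mean squared error stands in for distortion and the empirical entropy of the quantization indices stands in for the bit rate after entropy coding; both proxies, the coefficient values, and the three step sizes are assumptions chosen for illustration.

```python
import math
from collections import Counter

def quantize(values, step):
    """Index of the nearest multiple of `step` for each value."""
    return [int(math.copysign(math.floor(abs(v) / step + 0.5), v)) for v in values]

def mean_squared_error(values, levels, step):
    """Distortion proxy: average squared gap between input and reconstruction."""
    return sum((v - l * step) ** 2 for v, l in zip(values, levels)) / len(values)

def entropy_bits(levels):
    """Rate proxy: empirical entropy of the indices, in bits per index."""
    counts = Counter(levels)
    total = len(levels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

coefficients = list(range(-8, 8))   # hypothetical transform coefficients

report = {}
for step in (1, 4, 16):
    levels = quantize(coefficients, step)
    report[step] = (mean_squared_error(coefficients, levels, step),
                    entropy_bits(levels))
```

For this input, distortion grows and the entropy estimate shrinks as the step size increases from 1 to 4 to 16, which mirrors the relationship described above. Other inputs need not behave monotonically.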
Some encoders adjust quantization between pictures and/or within pictures to control where distortion is introduced. For a given bit rate/quality level, this allows an encoder to introduce more distortion where it will be less visible and/or avoid introducing distortion where it would be more visible. The allocation of available bits among pictures and within pictures plays an important role in how distortion is introduced and how the user perceives the quality of the video.
Different encoders typically apply different quantization rules, but there are some common principles. Quantization can produce visible artifacts that tend to be more artificial-looking and visually distracting than simple loss of fine detail. For example, the human visual system is more sensitive to distortion in relatively smooth content than to distortion in textured content. High texture levels tend to mask quality degradation and quantization artifacts. On the other hand, in regions with lower texture levels, distortion tends to be more visible. So, in smooth regions distortion may create a visible line, step or other flaw in the reconstructed image, while the same amount of distortion may not create noticeable flaws in textured areas due to masking effects of surrounding detail.
Thus, a common strategy is to allocate relatively more bits to smooth content and relatively fewer bits to textured content, so that less distortion is introduced in smooth content at the expense of more distortion in the textured content (where the distortion is not as perceptually noticeable). To identify textured content and non-textured content, various texture metrics and texture thresholds have been used. In some cases, an encoder varies quantization depending on texture. This allows the encoder to coarsen quantization when doing so will not dramatically increase perceptibility of the distortion and use finer quantization in other situations.
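One way an encoder might vary quantization with texture is sketched below. Sample variance is used as the texture metric purely for illustration; the text does not specify a metric, and the threshold, the two step sizes, and the function names are hypothetical.

```python
def texture_metric(block):
    """Variance of the samples in the block (an assumed texture measure)."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def choose_step(block, texture_threshold=100.0, smooth_step=4, textured_step=12):
    """Finer step for smooth blocks, coarser step for textured blocks."""
    if texture_metric(block) > texture_threshold:
        return textured_step
    return smooth_step

# Nearly flat content: distortion would be easy to see, so quantize finely.
smooth_block = [[128, 129, 128, 129] for _ in range(4)]
# Busy content: surrounding detail tends to mask distortion.
textured_block = [[90, 170, 100, 160],
                  [165, 95, 175, 85],
                  [100, 160, 90, 170],
                  [175, 85, 165, 95]]

steps = (choose_step(smooth_block), choose_step(textured_block))
```

A single threshold gives only a two-way split between textured and non-textured content; practical encoders may use more levels or different metrics.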
Standards and product specifications that focus only on achieving interoperability will often specify reconstruction values for inverse quantization without specifying a classification rule for quantization. In other words, some specifications may define the functional mapping k→β[k] for reconstruction without defining the functional mapping x→A[x] for classification. This allows a decoder built to comply with the standard/product to reconstruct information correctly. In contrast, encoders are often given the freedom to change the classifier. For classification, the thresholds can be defined so that certain input values will be mapped to more common (and hence, lower bit rate) indices, which makes the reconstruction values closer to optimal for some content. This also allows the encoder to adjust to expected distributions in values. For example, an encoder may define the DZ threshold to be wider or narrower for a quantizer. Or, more generally, the encoder may define other thresholds according to which values are quantized so as to quantize values more aggressively.
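The division of responsibility can be sketched as follows: the reconstruction rule is fixed, while the encoder is free to move its classification thresholds. The rule β[k] = k × step, the rounding-offset parameter used to widen the dead zone, and the function names are assumptions for illustration and do not come from any specific standard.

```python
import math

def decoder_reconstruct(k, step):
    """Reconstruction rule fixed by a (hypothetical) specification."""
    return k * step

def encoder_classify(x, step, rounding_offset=0.5):
    """Encoder-chosen classifier.

    An offset of 0.5 picks the nearest reconstruction value. A smaller
    offset moves every threshold outward, which widens the dead zone to
    (-(1 - offset) * step, (1 - offset) * step) and sends more inputs to
    index 0 and other lower-magnitude indices.
    """
    k = math.floor(abs(x) / step + rounding_offset)
    return k if x >= 0 else -k

step = 10.0
x = 6.0
nearest = encoder_classify(x, step)                            # index 1
aggressive = encoder_classify(x, step, rounding_offset=0.2)    # index 0

# The decoder applies the same rule regardless of how the index was chosen.
decoded = (decoder_reconstruct(nearest, step),
           decoder_reconstruct(aggressive, step))
```

Both indices are valid inputs to the same decoder; the encoder has simply traded some fidelity for an index that is usually cheaper to entropy code.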
The preceding adaptive quantization mechanisms help improve performance in many scenarios. In some scenarios, however, they fail to provide quantization control that is both usable and sufficiently fine-grained. For example, in some scenarios, previous adaptive quantization mechanisms provide insufficient control over how content is classified as textured or non-textured. As a result, encoding of smooth areas introduces an unacceptable amount of distortion. Another problem is that, in some scenarios, previous adaptive quantization mechanisms provide insufficient control over bit allocation for different types of non-textured content. Given the critical importance of video compression to digital video, it is not surprising that video compression is a richly developed field. Whatever the benefits of previous video compression techniques, however, they do not have the advantages of the following techniques and tools.