Digital video consumes large amounts of storage and transmission capacity. Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress an individual picture, and inter-picture compression techniques compress a picture with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.
I. Intra and Inter Compression.
FIG. 1 illustrates block-based intra compression in an example encoder. In particular, FIG. 1 illustrates intra compression of an 8×8 block (105) of samples by the encoder. The encoder splits a picture into non-overlapping 8×8 blocks of samples and applies a forward 8×8 frequency transform (110) (such as a discrete cosine transform (“DCT”)) to individual blocks such as the block (105). (In some cases, the encoder subtracts 128 from the 8-bit sample values before the frequency transform.) The frequency transform (110) maps the sample values to transform coefficients, which are coefficients of basis functions that correspond to frequency components. In typical encoding scenarios, a relatively small number of frequency coefficients capture much of the energy or signal content in video.
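The following is a minimal sketch of the forward transform step: an orthonormal 8×8 DCT-II in plain Python, including the optional subtraction of 128. The function name `dct_8x8` and the floating-point orthonormal scaling are assumptions made for illustration. Practical codecs typically use fast integer approximations of the transform instead.

```python
import math

N = 8  # block size used by the example encoder


def _scale(k):
    # Normalization that makes the DCT-II orthonormal (energy preserving).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_8x8(block):
    """Forward 2-D DCT-II of an 8x8 block of 8-bit samples (list of lists)."""
    # Optional level shift: center the 8-bit samples around zero.
    shifted = [[s - 128 for s in row] for row in block]
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = _scale(u) * _scale(v) * total
    return coeffs


# A flat block puts all of its energy into the DC (top left) coefficient.
flat = [[140] * N for _ in range(N)]
dc = dct_8x8(flat)[0][0]
```

For the flat block, `dc` is 96 up to floating-point rounding (the mean level shift of 12 times a scale factor of 8), and every AC coefficient is essentially zero. This is an extreme case of the energy compaction described above.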
The encoder quantizes (120) the transform coefficients (115), resulting in an 8×8 block of quantized transform coefficients (125). Through quantization, the encoder essentially trades off quality against bit rate. More specifically, quantization can affect the fidelity with which the transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients, because the coefficients are approximated more coarsely. Bit rate also tends to decrease, however, because lossless compression can exploit the reduced complexity of the coarsely approximated values. Conversely, finer quantization tends to preserve fidelity and quality but to result in higher bit rates. Different encoders use different parameters for quantization. In most encoders, a quantization step size is set for a block, picture, or other unit of video. Some encoders quantize coefficients differently within a given block, applying relatively coarser quantization to perceptually less important coefficients, and a quantization matrix can be used to indicate the relative quantization weights. Or, independent of the rules used to reconstruct quantized values, some encoders define the thresholds that determine how values are quantized, so as to quantize values more aggressively.
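The quality/bit rate trade-off can be seen with a minimal uniform scalar quantizer. The sketch below assumes round-to-nearest classification and reconstruction as `index * step`. The coefficient values and the names `quantize` and `dequantize` are hypothetical and chosen only for illustration.

```python
import math


def quantize(coeffs, step):
    """Uniform scalar quantization: index = round(coeff / step), ties away from zero."""
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct each coefficient as index * step."""
    return [q * step for q in indices]


coeffs = [96.0, -20.0, 7.0, 2.5, -1.2, 0.4]  # hypothetical transform coefficients
for step in (2, 8, 24):
    q = quantize(coeffs, step)
    nonzero = sum(1 for v in q if v != 0)
    max_err = max(abs(c - r) for c, r in zip(coeffs, dequantize(q, step)))
    print(f"step={step:2d} indices={q} nonzero={nonzero} max_error={max_err:.2f}")
```

As the step size grows from 2 to 8 to 24, the number of nonzero indices drops from 5 to 3 to 2, which makes them cheaper to entropy code. Over the same range, the largest reconstruction error grows from 1 to 4 to 7.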
Returning to FIG. 1, further encoding varies depending on whether a coefficient is the DC coefficient (the lowest frequency coefficient, shown as the top left coefficient in the block (125)), an AC coefficient in the top row or left column of the block (125), or another AC coefficient. The encoder typically encodes the DC coefficient (126) as a differential from the reconstructed DC coefficient (136) of a neighboring 8×8 block, and entropy encodes (140) the differential. The encoder can encode the left column or top row of AC coefficients as differentials from the AC coefficients in a corresponding left column or top row of a neighboring 8×8 block. The encoder then scans (150) the 8×8 block (145) of predicted, quantized AC coefficients into a one-dimensional array (155) and entropy encodes the scanned coefficients using a variation of run/level coding (160).
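The DC differential, the scan, and the run/level step can be sketched as follows. This sketch assumes a plain zigzag scan and plain (run, level) pairs. The example encoder uses "a variation" of run/level coding and may use other scan patterns. AC prediction and the final entropy coding of the pairs are omitted. The names `zigzag_order`, `run_level`, `encode_block`, and `predicted_dc` are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col indexes an anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()                      # even anti-diagonals run bottom-left to top-right
        order.extend(diag)
    return order


def run_level(values):
    """(run, level) pairs: run = number of zeros preceding each nonzero level."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs                                # trailing zeros stay implicit (end of block)


def encode_block(quantized, predicted_dc):
    """Return (DC differential, AC run/level pairs) for one quantized block."""
    scanned = [quantized[r][c] for r, c in zigzag_order(len(quantized))]
    dc_diff = scanned[0] - predicted_dc         # DC coded relative to a neighbor's DC
    return dc_diff, run_level(scanned[1:])      # AC coefficients follow DC in scan order


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[0][2] = 12, -3, 1, 5
dc_diff, ac_pairs = encode_block(block, predicted_dc=10)
```

Here `dc_diff` is 2. The AC coefficients become the three pairs (0, -3), (0, 1), and (2, 5), and the long run of trailing zeros costs nothing beyond an end-of-block indication.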
In corresponding decoding, a decoder produces a reconstructed version of the original 8×8 block. The decoder entropy decodes the quantized transform coefficients, scans them into a two-dimensional block, and performs AC prediction and/or DC prediction as needed. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform (such as an inverse DCT (“IDCT”)) to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block. (If the encoder subtracted 128 from the 8-bit sample values before the frequency transform, 128 is added back to the reconstructed sample values.) When a picture is used as a reference picture in subsequent motion compensation (see below), the encoder also reconstructs the picture.
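The final decoding steps (inverse quantization, inverse transform, and undoing the level shift) can be sketched as below. This sketch assumes the quantization indices have already been entropy decoded, inverse scanned, and DC/AC predicted. It also assumes reconstruction as `index * step`, an orthonormal floating-point IDCT, and clamping to the 8-bit range. The names `idct_8x8` and `reconstruct_block` are hypothetical.

```python
import math

N = 8


def _scale(k):
    # Same orthonormal normalization as the forward DCT-II.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def idct_8x8(coeffs):
    """Inverse of the orthonormal 2-D DCT-II (a 2-D DCT-III)."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = total
    return out


def reconstruct_block(indices, step):
    """Inverse quantize, inverse transform, add 128 back, and clamp to 0..255."""
    dequantized = [[q * step for q in row] for row in indices]
    samples = idct_8x8(dequantized)
    return [[min(255, max(0, round(s + 128))) for s in row] for row in samples]
```

For example, a block whose only nonzero index is a DC value of 12, decoded with a step size of 8, reconstructs to a flat block of 140s.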
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data, producing motion-compensated predictions.
Whereas the example encoder divides an intra-coded picture into non-overlapping 8×8 blocks, the encoder more generally divides an inter-coded picture into rectangular, non-overlapping blocks of N×M samples, where N and M can be 4 or 8, so block size is 4×4, 4×8, 8×4 or 8×8. For a current unit (e.g., 8×8 block) being encoded, the encoder computes the sample-by-sample difference between the current unit and its motion-compensated prediction to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded.
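The residual computation can be sketched for one block. This sketch assumes integer-sample motion vectors `(dy, dx)`, no padding at picture edges, and pictures stored as lists of rows. Here `h` and `w` stand for the block dimensions N and M. All function and parameter names are hypothetical.

```python
def motion_compensated_prediction(ref, top, left, mv, h, w):
    """Copy an h x w block from reference picture `ref`, displaced by mv = (dy, dx)."""
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(w)] for r in range(h)]


def compute_residual(cur, ref, top, left, mv, h=8, w=8):
    """Sample-by-sample difference between the current block and its prediction."""
    if h not in (4, 8) or w not in (4, 8):
        raise ValueError("block size must be 4x4, 4x8, 8x4, or 8x8")
    pred = motion_compensated_prediction(ref, top, left, mv, h, w)
    return [[cur[top + r][left + c] - pred[r][c] for c in range(w)] for r in range(h)]


# Current picture = reference shifted by one row and two columns.
ref = [[r * 16 + c for c in range(16)] for r in range(16)]
cur = [[(r + 1) * 16 + (c + 2) for c in range(16)] for r in range(16)]
```

With the matching motion vector `(1, 2)`, the residual for any interior block is all zeros. With a zero motion vector, every residual sample is 18, so the encoder has to spend bits on a signal that good motion estimation would have removed.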
If a predicted picture is used as a reference picture for subsequent motion compensation, the encoder reconstructs the predicted picture. When reconstructing residuals, the encoder reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The encoder performs motion compensation to compute the motion-compensated predictors, and combines the predictors with the residuals. During decoding, a decoder typically entropy decodes information and performs analogous operations to reconstruct residuals, perform motion compensation, and combine the predictors with the reconstructed residuals.
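The final combining step is the same in the encoder's reconstruction loop and in the decoder, which keeps the encoder's reference pictures consistent with the decoder's. This minimal sketch assumes that the sum is clamped to the valid sample range for a given bit depth. The name `reconstruct_inter_block` is hypothetical.

```python
def reconstruct_inter_block(prediction, residual, bit_depth=8):
    """Add the reconstructed residual to the motion-compensated prediction, then clamp."""
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

Clamping matters because quantization error in the residual can push the sum slightly outside the range of valid sample values.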
II. Lossy Compression and Quantization.
Lossless compression reduces the bit rate of information by removing redundancy from the information without any reduction in fidelity. Lossless compression techniques reduce bit rate at no cost to quality, but can only reduce bit rate up to a certain point. Decreases in bit rate are limited by the inherent amount of variability in the statistical characterization of the input data, which is referred to as the source entropy.
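As a rough illustration of the source entropy limit, the sketch below computes the zeroth-order empirical entropy of a symbol sequence. This simple model assumes that symbols are independent and identically distributed. Real sources have memory, so this figure is only the lossless limit under that particular model. The name `empirical_entropy` is hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zeroth-order entropy, in bits per symbol, of the observed symbol frequencies."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())
```

Under this model, no lossless code can average fewer than `len(symbols) * empirical_entropy(symbols)` bits. For example, four equally frequent symbols need at least 2 bits each, while a constant sequence needs essentially none.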
In contrast, with lossy compression, quality suffers somewhat but the achievable decrease in bit rate is more dramatic. Lossy compression techniques can be used to reduce bit rate more than lossless compression techniques, but some of the reduction in bit rate is achieved by reducing quality, and the lost quality cannot be completely recovered. Lossy compression is often used in conjunction with lossless compression, in a system design in which the lossy compression establishes an approximation of the information and lossless compression techniques are applied to represent the approximation.
In general, an encoder varies quantization to trade off quality and bit rate. A basic goal of lossy compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality/fidelity to the original video, an encoder attempts to provide the lowest bit rate encoded video. In practice, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness in quality/bit rate changes also affect decisions made in codec design as well as decisions made during actual encoding.
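One common way to make the rate-distortion trade-off concrete is Lagrangian optimization, which the text above does not itself describe: pick the option that minimizes J = D + λ·R. The sketch below applies it to the choice of a quantization step size. It assumes squared error for D and a crude rate proxy for R (zeroth-order entropy of the indices). These assumptions, along with all names, are for illustration only. Real encoders estimate rate with their actual entropy coder.

```python
import math
from collections import Counter


def quantize(c, step):
    """Uniform scalar quantization index, rounding to nearest with ties away from zero."""
    return int(math.copysign(math.floor(abs(c) / step + 0.5), c))


def estimated_bits(indices):
    """Hypothetical rate proxy: zeroth-order entropy of the indices times their count."""
    n = len(indices)
    return -sum(k * math.log2(k / n) for k in Counter(indices).values())


def choose_step(coeffs, steps, lam):
    """Return the step size minimizing J = D + lam * R over the candidate steps."""
    best_cost, best_step = math.inf, None
    for step in steps:
        idx = [quantize(c, step) for c in coeffs]
        distortion = sum((c - q * step) ** 2 for c, q in zip(coeffs, idx))  # squared error
        cost = distortion + lam * estimated_bits(idx)
        if cost < best_cost:
            best_cost, best_step = cost, step
    return best_step


coeffs = [96.0, -20.0, 7.0, 2.5, -1.2, 0.4]                  # hypothetical coefficients
fine_choice = choose_step(coeffs, (1, 2, 8, 24), lam=0.0)     # distortion only
coarse_choice = choose_step(coeffs, (1, 2, 8, 24), lam=1e6)   # rate dominates
```

The multiplier λ selects a point on the trade-off curve. When λ = 0, the finest step wins. When λ is very large, the coarsest step wins, because it yields the cheapest indices.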
According to one possible definition, quantization is a term used for an approximating non-reversible mapping function commonly used for lossy compression, in which there is a specified set of possible output values, and each member of the set of possible output values has an associated set of input values that result in the selection of that particular output value. A variety of quantization techniques have been developed, including scalar or vector, uniform or non-uniform, and adaptive or non-adaptive quantization.
A. Scalar Quantizers.
According to one possible definition, a scalar quantizer is an approximating functional mapping x→Q[x] of an input value x to a quantized value Q[x], sometimes called a reconstructed value. FIG. 2 shows a “staircase” I/O function (200) for a scalar quantizer. The horizontal axis is a number line for an input variable x, and the vertical axis indicates the corresponding quantized values Q[x]. The number line is partitioned by thresholds such as the threshold (210). Each value of x within a given range between a pair of adjacent thresholds is assigned the same quantized value Q[x]. For example, each value of x within the range (220) is assigned the same quantized value (230). (At a threshold, one of the two possible quantized values is assigned to an input x, depending on the system.) Overall, the quantized values Q[x] exhibit a discontinuous, staircase pattern. The distance the mapping continues along the number line depends on the system, typically ending after a finite number of thresholds. The placement of the thresholds on the number line may be uniformly spaced (as shown in FIG. 2) or non-uniformly spaced.
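The staircase mapping can be sketched as a single function. This sketch assumes uniformly spaced thresholds at (k + 0.5)·step and outputs at k·step, so zero is an output level. It also assumes that an input lying exactly on a threshold goes to the upper step, and that the staircase ends after a finite number of thresholds. The names `staircase_q` and `max_index` are hypothetical.

```python
import math


def staircase_q(x, step, max_index):
    """Q[x] for a uniform staircase quantizer.

    Thresholds sit at (k + 0.5) * step, so every x between two adjacent
    thresholds maps to the same output k * step. Inputs beyond the outermost
    thresholds saturate at +/- max_index * step.
    """
    k = math.floor(x / step + 0.5)           # an x exactly on a threshold goes to the upper step
    k = max(-max_index, min(max_index, k))   # finite extent of the staircase
    return k * step
```

With `step=1.0` and `max_index=3`, the inputs 0.4 and 2.4 map to 0.0 and 2.0. The threshold value 0.5 maps to 1.0, and any input beyond ±3.5 saturates at ±3.0.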
A scalar quantizer can be decomposed into two distinct stages. The first stage is the classifier stage, in which a classifier function mapping x→A[x] maps an input x to a quantization index A[x], which is often integer-valued. In essence, the classifier segments an input number line or data set. FIG. 3A shows a generalized classifier (300) and thresholds for a scalar quantizer. As in FIG. 2, a number line for a variable x is segmented by thresholds such as the threshold (310). Each value of x within a given range such as the range (320) is assigned the same quantization index.
In the second stage, a reconstructor functional mapping k→β[k] maps each quantization index k to a reconstruction value β[k]. In essence, the reconstructor selects a value for reconstruction of each region determined by the classifier. The reconstructor functional mapping may be implemented, for example, using a lookup table. FIG. 3B shows example classifier (350) thresholds for a scalar quantizer and also shows (as open circles) example reconstruction points according to a midpoint reconstruction rule. Overall, the classifier relates to the reconstructor as follows:
$$Q[x] = \beta\big[A[x]\big] \qquad (1)$$
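The two-stage decomposition can be sketched with arbitrary (possibly non-uniform) thresholds. In this sketch, the classifier A[x] finds the bin containing x. The reconstructor β[k] is a lookup table of bin midpoints, following the midpoint reconstruction rule of FIG. 3B. The indices here run from 0 over a finite, sorted threshold list, and out-of-range inputs are clamped to the outer bins. These choices and all names are assumptions for illustration.

```python
import bisect


def classify(x, thresholds):
    """Classifier A[x]: index k of the bin [t_k, t_(k+1)) containing x (outer inputs clamped)."""
    k = bisect.bisect_right(thresholds, x) - 1
    return max(0, min(len(thresholds) - 2, k))


def midpoint_table(thresholds):
    """Reconstructor beta[k] as a lookup table: the midpoint of each bin."""
    return [(lo + hi) / 2 for lo, hi in zip(thresholds, thresholds[1:])]


def quantize(x, thresholds, table):
    """Q[x] = beta[A[x]], as in equation (1)."""
    return table[classify(x, thresholds)]
```

Only `table` is needed to reconstruct values. The encoder needs `classify`, which is why the two stages can be specified separately.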
In common usage, the term “quantization” is often used to describe the classifier stage, which is performed during encoding. The term “inverse quantization” is similarly used to describe the reconstructor stage, whether performed during encoding or decoding.
B. Dead Zone+Uniform Threshold Quantizers.
A non-uniform quantizer has threshold values that are not uniformly spaced for all classifier regions. According to one possible definition, a dead zone plus uniform threshold quantizer (“DZ+UTQ”) is a quantizer with uniformly spaced threshold values for all classifier regions except the one containing the zero input value (which is called the dead zone (“DZ”)). In a general sense, a DZ+UTQ is a non-uniform quantizer, since the size of the DZ differs from the size of the other classifier regions.
FIG. 4 shows a staircase I/O function (400) for a DZ+UTQ, and FIG. 5A shows a generalized classifier (500) and thresholds for a DZ+UTQ. In FIG. 5A, the DZ is twice as wide as the other classification zones. FIG. 5B shows example classifier (550) thresholds for a DZ+UTQ and also shows (as open circles) example reconstruction points according to a midpoint reconstruction rule.
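A DZ+UTQ like the one in FIGS. 5A and 5B can be sketched as follows. This sketch assumes a step size `step`, a dead zone (-step, step) that is twice as wide as the other zones, and bins of width `step` outside it. It also assumes midpoint reconstruction, giving 0 for the dead zone and ±(|k| + 0.5)·step otherwise. The function names are hypothetical.

```python
import math


def dzutq_index(x, step):
    """Classifier: the dead zone (-step, step) maps to 0; outside it, bins of width step."""
    k = math.floor(abs(x) / step)
    return int(math.copysign(k, x)) if k else 0


def dzutq_reconstruct(k, step):
    """Midpoint reconstruction: 0 for the dead zone, +/-(|k| + 0.5) * step otherwise."""
    if k == 0:
        return 0.0
    return math.copysign((abs(k) + 0.5) * step, k)
```

With `step=1.0`, every input in (-1, 1) maps to index 0. The input 1.0 maps to index 1, which reconstructs as 1.5, and -2.3 maps to index -2, which reconstructs as -2.5.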
C. Adjusting Quantization.
In many systems, the extent of quantization is parameterized in terms of quantization step size, which is adapted to regulate quality and/or bit rate. Coarser quantization uses larger quantization step sizes. Finer quantization uses smaller quantization step sizes. Often, for purposes of signaling and reconstruction, quantization step sizes are parameterized as multiples of a smallest quantization step size.
Some standards and products also allow specification of a quantization matrix, or scaling matrix, that indicates different weights for different frequency coefficients in quantization. Frequency coefficients are then quantized and inverse quantized using weighted quantization step sizes. For example, a scaling matrix for an intra-coded block uses higher weights for high frequency coefficients and lower weights for low frequency coefficients, which tends to shift distortion that is introduced to high frequency coefficients where it is less apt to cause perceptible quantization artifacts.
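Weighted quantization with a scaling matrix can be sketched by giving each coefficient position its own step size. This sketch assumes that a weight of 16 leaves the base step unchanged, a normalization chosen for illustration. It uses a 2×2 block for brevity instead of 8×8, and all names are hypothetical. As in the example above, higher weights are placed on the higher-frequency positions.

```python
import math


def weighted_steps(base_step, weights, unity=16):
    """Per-position step sizes; a weight equal to `unity` keeps the base step."""
    return [[base_step * w / unity for w in row] for row in weights]


def quantize_weighted(coeffs, base_step, weights):
    """Round each coefficient to the nearest multiple of its own weighted step."""
    steps = weighted_steps(base_step, weights)
    return [[int(math.copysign(math.floor(abs(c) / s + 0.5), c)) for c, s in zip(crow, srow)]
            for crow, srow in zip(coeffs, steps)]


def dequantize_weighted(indices, base_step, weights):
    """Inverse quantize with the same weighted step sizes."""
    steps = weighted_steps(base_step, weights)
    return [[q * s for q, s in zip(qrow, srow)] for qrow, srow in zip(indices, steps)]


weights = [[16, 24], [24, 40]]          # low frequency top left, high frequency bottom right
coeffs = [[40.0, 12.0], [12.0, 12.0]]
indices = quantize_weighted(coeffs, 4.0, weights)
```

With a base step of 4, the DC coefficient and the mid-frequency coefficients reconstruct exactly in this example. The highest-frequency coefficient (12) reconstructs as 10, so the distortion lands where the matrix directs it.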
Some standards and products support selection between different reconstruction rules. For example, in some systems, a decoder can switch between a “uniform” quantizer reconstruction rule and a “non-uniform” quantizer reconstruction rule. Typically, for a given reconstruction rule, standards and products specify reconstruction values that correspond to midpoint reconstruction for the sake of simplicity. In FIGS. 3B and 5B, example reconstruction points according to midpoint reconstruction rules are superimposed as circles at the midpoints of the ranges that define quantization bins.
Standards and product specifications that focus only on achieving interoperability often specify reconstruction values without specifying a classification rule. In other words, some specifications define the functional mapping k→β[k] without defining the functional mapping x→A[x]. This is enough for a decoder built to comply with the standard/product to reconstruct information correctly. Encoders, in contrast, are often free to change the classifier. For classification, the thresholds can be defined so that certain input values are mapped to more common (and hence lower bit rate) indices, which makes the reconstruction values closer to optimal for some content. Defining quantization bin boundaries statically lets the encoder adjust, in a predetermined way, to expected distributions of values. For example, an encoder may set the DZ threshold of a quantizer to 1.2*QP, rather than the 1*QP that midpoint reconstruction might suggest. While changing how quantization thresholds are defined can improve performance, a static definition does not support content-adaptive behavior during quantization.
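Widening the dead zone in the encoder only can be sketched as follows. The decoder's reconstruction rule stays the same, while the encoder's classifier moves the DZ threshold from 1·QP to 1.2·QP. In this sketch, `qp` is treated as the step size of a DZ+UTQ with midpoint reconstruction. Only the dead-zone edge moves, and the other thresholds stay at multiples of `qp`. It assumes 0 < `dz_factor` < 2, and the names are hypothetical.

```python
import math


def classify(x, qp, dz_factor=1.0):
    """Encoder-side DZ+UTQ classifier.

    |x| < dz_factor * qp maps to index 0. Outside the dead zone, bin k >= 1
    covers [k * qp, (k + 1) * qp), except that bin 1 starts at the dead-zone edge.
    """
    a = abs(x)
    if a < dz_factor * qp:
        return 0
    k = max(1, math.floor(a / qp))
    return int(math.copysign(k, x))


def reconstruct(k, qp):
    """Decoder-side midpoint reconstruction, unaffected by the encoder's dz_factor."""
    return 0.0 if k == 0 else math.copysign((abs(k) + 0.5) * qp, k)


values = [0.3, -0.9, 1.1, -1.15, 1.5, 2.4, -3.7]   # hypothetical coefficient values
default_idx = [classify(v, 1.0) for v in values]
biased_idx = [classify(v, 1.0, dz_factor=1.2) for v in values]
```

With the 1.2 factor, the inputs 1.1 and -1.15 fall into the dead zone. The zero count rises from 2 to 4, while a compliant decoder reconstructs every index exactly as before. Because `dz_factor` is fixed in advance, this sketch illustrates the static, non-adaptive behavior described above.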
The preceding adaptive quantization mechanisms help improve performance in many scenarios. In some configurations, however, they fail to provide fine-grained control over quantization that is sufficiently adaptive.