Digital video consumes large amounts of storage and transmission capacity. Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress an individual picture, and inter-picture compression techniques compress a picture with reference to a preceding and/or following picture (often called a reference or anchor picture) or pictures.
I. Intra and Inter Compression.
FIG. 1 illustrates block-based intra compression in an example encoder. In particular, FIG. 1 illustrates intra compression of an 8×8 block (105) of samples by the encoder. The encoder splits a picture into non-overlapping 8×8 blocks of samples and applies a forward 8×8 frequency transform (110) (such as a discrete cosine transform (“DCT”)) to individual blocks such as the block (105). The frequency transform (110) maps the sample values to transform coefficients. In typical encoding scenarios, a relatively small number of frequency coefficients capture much of the energy or signal content in video.
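The energy compaction described above can be illustrated with a minimal sketch. The following Python fragment applies a direct (unoptimized) orthonormal 8×8 DCT to a smooth block of samples. The ramp-shaped block and all function names are assumptions made for illustration and are not taken from any particular encoder.

```python
import math

N = 8  # block size used by the example encoder

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * acc
    return coeffs

# Hypothetical smooth block: a gentle ramp of sample values.
block = [[100 + 2 * x + y for x in range(N)] for y in range(N)]
coeffs = dct_2d(block)

total_energy = sum(c * c for row in coeffs for c in row)
low_freq_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(f"DC coefficient: {coeffs[0][0]:.1f}")
print(f"energy share of the 2x2 low-frequency corner: {low_freq_energy / total_energy:.4f}")
```

For smooth content such as this, nearly all of the signal energy lands in a handful of low-frequency coefficients, which is what makes the later quantization and entropy coding stages effective.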
The encoder quantizes (120) the transform coefficients (115), resulting in an 8×8 block of quantized transform coefficients (125). Quantization can affect the fidelity with which the transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients as the coefficients are more coarsely approximated. Bit rate also decreases, however, when the decreased complexity of the quantized coefficients can be exploited with lossless compression. Conversely, finer quantization tends to preserve fidelity and quality but results in higher bit rates.
Returning to FIG. 1, further encoding varies depending on whether a coefficient is a DC coefficient (the lowest frequency coefficient shown as the top left coefficient in the block (125)), an AC coefficient in the top row or left column in the block (125), or another AC coefficient. The encoder typically encodes the DC coefficient (126) as a differential from the reconstructed DC coefficient (136) of a neighboring 8×8 block. The encoder entropy encodes (140) the differential. The entropy encoder can encode the left column or top row of AC coefficients as differentials from AC coefficients of a corresponding left column or top row of a neighboring 8×8 block. The encoder scans (150) the 8×8 block (145) of predicted, quantized AC coefficients into a one-dimensional array (155). The encoder then entropy encodes the scanned coefficients using a variation of run/level coding (160).
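The DC differential, scan, and run/level steps described above might be sketched as follows. The classic zigzag pattern is assumed here because the text does not specify the scan pattern, and the simple (run, level) pairing stands in for whatever variation of run/level coding a given encoder uses. The sample block and the neighboring DC value are hypothetical.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n x n block in classic zigzag order."""
    def key(pos):
        r, c = pos
        d = r + c
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (d, r if d % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level_pairs(scanned):
    """(run of zeros, nonzero level) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

# Hypothetical block of quantized coefficients, mostly zero.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 12                                   # DC coefficient
block[0][1], block[1][0], block[1][1], block[0][3] = 5, -3, 2, 1

neighbor_dc = 10                                   # assumed reconstructed DC of a neighbor
dc_differential = block[0][0] - neighbor_dc

scanned = [block[r][c] for r, c in zigzag_order()]
ac_pairs = run_level_pairs(scanned[1:])            # skip the DC position
print(dc_differential, ac_pairs)
```

Because the scan visits low-frequency positions first, the nonzero values cluster at the start of the one-dimensional array and the long tail of zeros costs almost nothing to represent.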
In corresponding decoding, a decoder produces a reconstructed version of the original 8×8 block. The decoder entropy decodes the quantized transform coefficients, scans the quantized coefficients into a two-dimensional block, and performs AC prediction and/or DC prediction as needed. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform (such as an inverse DCT (“IDCT”)) to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block. When a picture is used as a reference picture in subsequent motion compensation (see below), an encoder also reconstructs the picture.
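The inverse quantization and inverse transform steps of decoding can be sketched as follows, under simplifying assumptions: a single quantization step size, plain rounding as the encoder-side classification rule, and no entropy decoding or AC/DC prediction. The sample block, the step size, and the function names are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II basis matrix, C[k][n].
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward(block):        # encoder side: C * X * C^T
    return matmul(matmul(C, block), transpose(C))

def inverse(coeffs):       # decoder side: C^T * Y * C
    return matmul(matmul(transpose(C), coeffs), C)

def quantize(coeffs, step):
    return [[round(v / step) for v in row] for row in coeffs]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]

block = [[100 + (9 * x + 5 * y) % 32 for x in range(N)] for y in range(N)]
step = 4                   # hypothetical quantization step size

levels = quantize(forward(block), step)           # what the encoder would signal
recon = inverse(dequantize(levels, step))         # what the decoder rebuilds
sq_err = sum((recon[y][x] - block[y][x]) ** 2 for y in range(N) for x in range(N))
print(f"sum of squared reconstruction error: {sq_err:.2f}")
```

Because the transform in this sketch is orthonormal, the squared error in the sample domain equals the squared quantization error in the coefficient domain, so it cannot exceed 64 × (step/2)².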
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in a video sequence. Motion estimation is a process for estimating motion between pictures. In general, motion compensation is a process of reconstructing pictures from reference picture(s) using motion data, producing motion-compensated predictions.
Whereas the example encoder divides an intra-coded picture into non-overlapping 8×8 blocks, the encoder more generally divides an inter-coded picture into rectangular, non-overlapping blocks of N×M samples, where N and M can be 4 or 8, so block size is 4×4, 4×8, 8×4 or 8×8. For a current unit (e.g., 8×8 block) being encoded, the encoder computes the sample-by-sample difference between the current unit and its motion-compensated prediction to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded.
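As a rough illustration of motion estimation and residual computation, the following sketch performs an exhaustive block-matching search using the sum of absolute differences (SAD). Full-search SAD matching is only one common approach and is assumed here; the text does not prescribe a search method. The synthetic pictures, the 4×4 block size, and the ±2 sample search range are likewise assumptions.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))

def motion_search(cur, ref, top, left, size=4, search=2):
    """Return (cost, dy, dx) of the best match within +/- search samples."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height and
                      0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def residual(cur, ref, top, left, dy, dx, size=4):
    """Sample-by-sample difference between the current block and its prediction."""
    return [[cur[top + y][left + x] - ref[top + dy + y][left + dx + x]
             for x in range(size)] for y in range(size)]

H = W = 12
ref = [[16 * y + x for x in range(W)] for y in range(H)]          # synthetic reference picture
cur = [[ref[y][(x - 1) % W] for x in range(W)] for y in range(H)]  # content moved right by one sample

cost, dy, dx = motion_search(cur, ref, top=4, left=4)
res = residual(cur, ref, 4, 4, dy, dx)
print(cost, (dy, dx))
```

When the prediction is good, the residual is all zeros or close to it, and the subsequent transform, quantization, and entropy coding stages spend very few bits on it.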
If a predicted picture is used as a reference picture for subsequent motion compensation, the encoder reconstructs the predicted picture. When reconstructing residuals, the encoder reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The encoder performs motion compensation to compute the motion-compensated predictors, and combines the predictors with the residuals. During decoding, a decoder typically entropy decodes information and performs analogous operations to reconstruct residuals, perform motion compensation, and combine the predictors with the reconstructed residuals.
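The reason the encoder reconstructs the picture itself can be seen in a small sketch: by repeating the decoder's operations, the encoder ends up with exactly the reference samples the decoder will have. For brevity the frequency transform is omitted and the residual is quantized directly, which is a simplification and not what the example encoder does. The step size and sample values are hypothetical.

```python
STEP = 4  # hypothetical quantization step size

def quantize(residual):
    return [round(r / STEP) for r in residual]

def dequantize(levels):
    return [level * STEP for level in levels]

def encode_unit(current, predictor):
    """Return the quantized residual and the encoder-side reconstruction."""
    residual = [c - p for c, p in zip(current, predictor)]
    levels = quantize(residual)
    # Mirror the decoder: predictor plus reconstructed residual.
    recon = [p + r for p, r in zip(predictor, dequantize(levels))]
    return levels, recon

def decode_unit(levels, predictor):
    return [p + r for p, r in zip(predictor, dequantize(levels))]

predictor = [100, 102, 104, 106]   # motion-compensated prediction (assumed)
current = [101, 105, 103, 111]     # samples of the unit being encoded

levels, encoder_recon = encode_unit(current, predictor)
decoder_recon = decode_unit(levels, predictor)
print(levels, encoder_recon, decoder_recon)
```

The reconstruction differs from the original samples because quantization is lossy, but the encoder and decoder agree on it, so later predictions made from it stay in step.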
II. Lossy Compression and Quantization.
Lossless compression reduces the bit rate of information by removing redundancy from the information without any reduction in fidelity. Lossless compression techniques reduce bit rate at no cost to quality, but can only reduce bit rate up to a certain point. Decreases in bit rate are limited by the inherent amount of variability in the statistical characterization of the input data, which is referred to as the source entropy.
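As an illustration of the entropy limit, the following sketch computes the empirical entropy of two symbol sequences. It assumes the symbols are independent and identically distributed, so it estimates only the zeroth-order entropy; a coder that models dependencies between symbols may do better on real data. The sequences are hypothetical.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(symbols):
    """Empirical zeroth-order entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

skewed = [0] * 12 + [1] * 2 + [2] * 2     # mostly zeros, as after coarse quantization
uniform = [0, 1, 2, 3] * 4                # all four symbols equally likely

h_skewed = entropy_bits_per_symbol(skewed)
h_uniform = entropy_bits_per_symbol(uniform)
print(f"skewed: {h_skewed:.3f} bits/symbol, uniform: {h_uniform:.3f} bits/symbol")
```

The skewed sequence has the lower entropy, which is why quantization that produces many repeated values (especially zeros) makes the subsequent lossless stage more effective.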
In contrast, with lossy compression, quality suffers somewhat but the achievable decrease in bit rate is more dramatic. Lossy compression techniques can be used to reduce bit rate more than lossless compression techniques, but some of the reduction in bit rate is achieved by reducing quality, and the lost quality cannot be completely recovered. Lossy compression is often used in conjunction with lossless compression, in a system design in which the lossy compression establishes an approximation of the information and lossless compression techniques are applied to represent the approximation.
In general, an encoder varies quantization to trade off quality and bit rate. A basic goal of lossy compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality/fidelity to the original video, an encoder attempts to provide the lowest bit rate encoded video. In practice, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness in quality/bit rate changes also affect decisions made in codec design as well as decisions made during actual encoding.
According to one possible definition, quantization is a term used for an approximating non-reversible mapping function commonly used for lossy compression, in which there is a specified set of possible output values, and each member of the set of possible output values has an associated set of input values that result in the selection of that particular output value. A variety of quantization techniques have been developed, including scalar or vector, uniform or non-uniform, and adaptive or non-adaptive quantization.
According to one possible definition, a scalar quantizer is an approximating functional mapping of an input value x to a quantized value Q[x], sometimes called a reconstructed value. Each value of x within a given range between a pair of adjacent thresholds is assigned the same quantized value Q[x]. (At a threshold, one of the two possible quantized values is assigned to an input x, depending on the system.) The placement of the thresholds on the number line may be uniformly spaced or non-uniformly spaced.
A scalar quantizer can be decomposed into two distinct stages. The first stage is the classifier stage, in which a classifier functional mapping maps an input x to a quantization index A[x], which is often integer-valued. In essence, the classifier segments an input number line or data set. Each value of x within a given range is assigned the same quantization index. In the second stage, a reconstructor functional mapping maps each quantization index k to a reconstruction value β[k]. In essence, the reconstructor selects a value for reconstruction of each region determined by the classifier. Overall, the classifier relates to the reconstructor as follows: Q[x]=β[A[x]].
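The two-stage decomposition can be sketched directly. In the following fragment the thresholds and reconstruction values are hypothetical, and an input that falls exactly on a threshold is assigned to the upper region, which is one of the two choices the text allows.

```python
import bisect

THRESHOLDS = [-6.0, -2.0, 2.0, 6.0]           # hypothetical; four thresholds give five regions
RECON_VALUES = [-8.0, -4.0, 0.0, 4.0, 8.0]    # beta[k], one value per region

def classify(x):
    """A[x]: index of the region that contains x."""
    return bisect.bisect_right(THRESHOLDS, x)

def reconstruct(k):
    """beta[k]: reconstruction value for quantization index k."""
    return RECON_VALUES[k]

def quantize(x):
    """Q[x] = beta[A[x]]."""
    return reconstruct(classify(x))

for x in (-7.0, 1.9, 2.0, 3.3, 5.9, 100.0):
    print(x, classify(x), quantize(x))
```

Only the index returned by `classify` needs to be transmitted; the decoder holds the table of reconstruction values and applies `reconstruct` on its own.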
In common usage, the term “quantization” is often used to describe the classifier stage, which is performed during encoding. The term “inverse quantization” is similarly used to describe the reconstructor stage, whether performed during encoding or decoding.
A non-uniform quantizer has threshold values that are not uniformly spaced for all classifier regions. According to one possible definition, a dead zone plus uniform threshold quantizer is a quantizer with uniformly spaced threshold values for all classifier regions except the one containing the zero input value (which is called the dead zone (“DZ”)). In a general sense, a dead zone plus uniform threshold quantizer is a non-uniform quantizer, since the DZ size is different than the size for other classifier regions.
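A classifier for a dead zone plus uniform threshold quantizer might look like the following sketch. The parameter names and the specific step and dead zone sizes are assumptions chosen for illustration.

```python
import math

def dz_utq_classify(x, step, dz_half_width):
    """Quantization index for a dead zone plus uniform threshold quantizer.

    The dead zone is the open interval (-dz_half_width, dz_half_width).
    Every other classifier region has width `step`.
    """
    magnitude = abs(x)
    if magnitude < dz_half_width:
        return 0
    k = math.floor((magnitude - dz_half_width) / step) + 1
    return k if x > 0 else -k

# Dead zone of width 8 (twice the step), other regions of width 4.
for x in (3.9, 4.0, 7.9, 8.0, -5.0):
    print(x, dz_utq_classify(x, step=4, dz_half_width=4))
```

Setting `dz_half_width` to half the step makes the zero region the same width as every other region, in which case the thresholds are uniformly spaced throughout.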
In many systems, the extent of quantization is parameterized in terms of quantization step size, which is adapted to regulate quality and/or bit rate. Coarser quantization uses larger quantization step sizes. Finer quantization uses smaller quantization step sizes. Often, for purposes of signaling and reconstruction, quantization step sizes are parameterized as multiples of a smallest quantization step size for a picture, macroblock or other unit of video.
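The effect of step size can be seen in a short sketch in which step sizes are expressed as multiples of a smallest step. The base step of 0.5, the chosen multiples, and the coefficient values are all hypothetical, and plain rounding is assumed as the classification rule.

```python
def quantize(values, step):
    return [round(v / step) for v in values]

def dequantize(indices, step):
    return [k * step for k in indices]

BASE_STEP = 0.5   # hypothetical smallest quantization step size
coeffs = [41.3, -12.7, 6.2, -3.1, 1.4, 0.8, -0.6, 0.2]

results = []
for multiple in (1, 4, 16):
    step = multiple * BASE_STEP
    indices = quantize(coeffs, step)
    recon = dequantize(indices, step)
    nonzero = sum(1 for k in indices if k != 0)
    mean_abs_err = sum(abs(a - b) for a, b in zip(coeffs, recon)) / len(coeffs)
    results.append((multiple, nonzero, mean_abs_err))
    print(f"step={step:4.1f}  nonzero indices={nonzero}  mean abs error={mean_abs_err:.4f}")
```

As the multiple grows, fewer indices remain nonzero (lowering bit rate after entropy coding) while the approximation error rises, which is the quality/bit rate trade-off that step size adaptation regulates.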
Some standards and products also allow specification of a quantization matrix, or scaling matrix, that indicates different weights for different frequency coefficients of a block, so as to apply relatively coarser quantization to perceptually less important coefficients. Frequency coefficients are then quantized and inverse quantized using weighted quantization step sizes. For example, a scaling matrix for an intra-coded block uses higher weights for high frequency coefficients and lower weights for low frequency coefficients, which tends to shift the introduced distortion toward high frequency coefficients, where it is less apt to cause perceptible quantization artifacts.
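Weighted quantization with a scaling matrix can be sketched as follows. The weights, which here simply grow with the sum of the row and column frequency indices, are hypothetical and are not taken from any standard, as are the base step size and the flat coefficient block.

```python
N = 8
# Hypothetical scaling matrix: weight increases with frequency.
WEIGHTS = [[1.0 + 0.25 * (u + v) for v in range(N)] for u in range(N)]

def quantize_weighted(coeffs, base_step):
    return [[round(coeffs[u][v] / (base_step * WEIGHTS[u][v])) for v in range(N)]
            for u in range(N)]

def dequantize_weighted(levels, base_step):
    return [[levels[u][v] * base_step * WEIGHTS[u][v] for v in range(N)]
            for u in range(N)]

base_step = 4.0
coeffs = [[7.0] * N for _ in range(N)]    # same magnitude at every frequency, for comparison

levels = quantize_weighted(coeffs, base_step)
recon = dequantize_weighted(levels, base_step)
print("lowest frequency :", levels[0][0], recon[0][0])
print("highest frequency:", levels[N - 1][N - 1], recon[N - 1][N - 1])
```

With identical input magnitudes, the low-frequency coefficient survives with a small error while the highest-frequency coefficient is quantized to zero, so the distortion is concentrated where the weights are largest.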
Some standards and products support selection between different reconstruction rules. For example, in some systems, a decoder can switch between a “uniform” quantizer reconstruction rule and a “non-uniform” quantizer reconstruction rule. Typically, for a given reconstruction rule, standards and products specify reconstruction values that correspond to midpoint reconstruction for the sake of simplicity. (Such reconstruction points are halfway between notional thresholds for quantization bins.)
Standards and product specifications that focus only on achieving interoperability will often specify reconstruction values without specifying a classification rule. In other words, some specifications may define the reconstructor functional mapping without defining the classifier functional mapping. This allows a decoder built to comply with the standard/product to reconstruct information correctly. In contrast, encoders are often given the freedom to change the classifier. For classification, the thresholds can be defined so that certain input values will be mapped to more common (and hence, lower bit rate) indices, which makes the reconstruction values closer to optimal for some content. When an encoder defines quantization bin boundaries, this allows the encoder to adjust to distributions in values. For example, for a given quantization parameter (“QP”), an encoder may define the DZ threshold to be 1.2*QP for a quantizer (rather than 1*QP as might be expected given midpoint reconstruction).
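The freedom to move classifier thresholds while keeping the reconstruction rule fixed might be sketched as follows. The sketch assumes reconstruction values at integer multiples of 2*QP, so that the midpoint between zero and the first reconstruction value is 1*QP; this is an interpretation of the example above, not a rule taken from any specification. Only the dead zone threshold is adjustable in this sketch, and the input values are hypothetical.

```python
def classify(x, qp, dz_factor=1.0):
    """Encoder-side index selection with an adjustable dead zone threshold.

    Assumes reconstruction points at k * 2 * qp. Thresholds other than the
    dead zone threshold stay at the midpoints between reconstruction points.
    """
    magnitude = abs(x)
    if magnitude < dz_factor * qp:
        return 0
    k = max(1, int((magnitude + qp) // (2 * qp)))
    return k if x > 0 else -k

def reconstruct(k, qp):
    """Decoder-side rule; identical no matter which classifier the encoder used."""
    return k * 2 * qp

qp = 5
values = [0.5, 3.0, 5.5, -5.8, 6.5, 9.0, -14.0, 16.0]

midpoint = [classify(v, qp, dz_factor=1.0) for v in values]
widened = [classify(v, qp, dz_factor=1.2) for v in values]
print("DZ threshold 1.0*QP:", midpoint)
print("DZ threshold 1.2*QP:", widened)
```

Widening the dead zone maps more small inputs to the zero index, which is typically the cheapest index to entropy code, and the decoder needs no change because `reconstruct` is unaffected.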
III. Animation Content.
Animation content appears as TV shows and short cartoons, commercials, and full-length feature movies. In many respects, typical animation content differs from typical natural video content. Backgrounds are usually simpler and more static for animation content, and motion is usually less complex. In addition, lines between objects are typically sharp in animation content.
As a result of these and other differences, encoding animation content with a general-purpose video encoder can provide unsatisfactory rate-distortion performance. For a given bit rate, perceptual quality can be relatively poor when distortion that would be imperceptible in natural video is visible in the animation video.
One approach to ensuring quality for animation content is to losslessly encode the content. While this preserves quality, the bit rate for the encoded content may be prohibitively high.
Another approach to improving encoding performance for animation content is to develop and use an animation-only encoder. Such an encoder might, for example, consider information about the animation models used to create the animation content to effectively encode the content. While this improves performance, it requires access to the animation models and can require an equally specialized decoder for decoding. Many decoding devices, however, lack the resources to support an extra animation-only decoder or different decoders for different types of animation content. Moreover, in some cases, the animation models used to create the animation content are not available at the time of encoding.
While previous approaches to encoding animation content provide acceptable performance in some scenarios, they do not have the advantages of the techniques and tools described below for encoding animation content.