Engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression. Intra-picture compression techniques compress an individual picture (without reference to other pictures that have been compressed and reconstructed). Inter-picture compression techniques compress a picture with reference to preceding and/or following picture(s) (often called reference or anchor pictures) that have already been compressed and reconstructed. A “key” picture is an intra-picture compressed picture that can be used as a reference picture for other pictures.
Intra-Picture and Inter-Picture Compression
To illustrate basic principles of intra-picture compression and inter-picture compression, consider an example block-based encoder and corresponding decoder. Real-world implementations of encoders and decoders are much more complex, of course, but these simplified examples show some of the ways that intra-picture compression typically differs from inter-picture compression.
The encoder performs intra-picture compression of an 8×8 block of samples for a key picture. The encoder splits the key picture into non-overlapping 8×8 blocks of samples and applies a forward 8×8 frequency transform to individual blocks. The frequency transform maps the sample values of a block to transform coefficients. In typical encoding scenarios, a relatively small number of transform coefficients capture much of the energy or signal content in the block.
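The forward transform step can be sketched as follows. The text does not name a specific frequency transform, so this sketch assumes an orthonormal two-dimensional DCT-II, which is a common choice. The helper names (dct_matrix, forward_transform) and the smooth test block are illustrative assumptions.

```python
import math

N = 8  # block size used in the example above

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    """Separable 2-D transform: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

# Hypothetical smooth block (a gentle gradient), typical of low-detail content.
block = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]
coeffs = forward_transform(block)

sample_energy = sum(v * v for row in block for v in row)
total_energy = sum(v * v for row in coeffs for v in row)
largest_four = sorted((v * v for row in coeffs for v in row), reverse=True)[:4]
energy_share = sum(largest_four) / total_energy
print(f"share of energy in the 4 largest coefficients: {energy_share:.4f}")
```

Because the assumed transform is orthonormal, the total energy of the coefficients equals the total energy of the samples. For a smooth block like this one, a handful of low-frequency coefficients carry almost all of it.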
The encoder quantizes the transform coefficients, resulting in an 8×8 block of quantized transform coefficients. Quantization can affect the fidelity with which the transform coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients because the coefficients are more coarsely approximated. Bit rate also decreases, however, when the reduced complexity of the quantized coefficients can be exploited with lossless compression. Conversely, finer quantization tends to preserve fidelity and quality but results in higher bit rates. The encoder further encodes the quantized transform coefficients, for example, using entropy coding, and outputs a bitstream of compressed video information.
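The effect of quantization step size can be illustrated with a minimal sketch. It assumes uniform scalar quantization with a single step size per block, and it uses the count of nonzero quantized coefficients as a crude stand-in for bit rate. Both are simplifying assumptions, and the coefficient values are synthetic.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [[q * step for q in row] for row in levels]

def nonzero_count(levels):
    return sum(1 for row in levels for q in row if q != 0)

def mean_squared_error(a, b):
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

# Synthetic 8x8 coefficients whose magnitude decays with frequency (illustrative only).
coeffs = [[400.0 / (1 + u + v) ** 2 for u in range(8)] for v in range(8)]

results = {}
for step in (2, 16):  # finer step, then coarser step
    levels = quantize(coeffs, step)
    results[step] = (nonzero_count(levels),
                     mean_squared_error(coeffs, dequantize(levels, step)))
    print(step, results[step])
```

With the coarser step, more coefficients quantize to zero (fewer values to entropy code) and the approximation error grows, which is the trade-off described above.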
In corresponding decoding, a decoder reads the bitstream of compressed video information and performs operations to reconstruct the pictures that were encoded. When the encoding uses lossy compression (e.g., in quantization), the reconstructed pictures approximate the source pictures that were encoded but are not exactly the same.
For example, to reconstruct a version of the original 8×8 block of the key picture, the decoder reconstructs quantized transform coefficients, for example, using entropy decoding. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform to the de-quantized transform coefficients, producing the reconstructed version of the original 8×8 block. Since the key picture is used as a reference picture in subsequent motion compensation, the encoder also reconstructs the key picture.
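The decoder-side steps (inverse quantization followed by an inverse frequency transform) can be sketched as below. Entropy coding is omitted, the transform is assumed to be an orthonormal DCT-II, and quantization is assumed to be uniform with one step size. All function names and the test block are hypothetical.

```python
import math

N = 8

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def inverse_transform(coeffs):
    # For an orthonormal basis the inverse is C^T * coeffs * C.
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def quantize(coeffs, step):
    return [[int(round(v / step)) for v in row] for row in coeffs]

def decode_block(levels, step):
    """Inverse quantize, then inverse transform."""
    dequantized = [[q * step for q in row] for row in levels]
    return inverse_transform(dequantized)

# Hypothetical 8-bit source block with some texture.
source = [[(37 * x + 91 * y + 13 * x * y) % 256 for x in range(N)] for y in range(N)]
STEP = 8
levels = quantize(forward_transform(source), STEP)
reconstructed = decode_block(levels, STEP)

mse = sum((s - r) ** 2 for rs, rr in zip(source, reconstructed)
          for s, r in zip(rs, rr)) / (N * N)
print(f"mean squared reconstruction error: {mse:.3f}")
```

The reconstructed block approximates the source block but does not match it exactly, because the quantization step discards information. Under the orthonormal-transform assumption, the mean squared error cannot exceed (STEP / 2) squared.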
Inter-picture compression techniques often use motion estimation and motion compensation to reduce bit rate by exploiting temporal redundancy in video. Motion estimation is a process for estimating motion between pictures. In general, motion compensation is a process of producing predictions from reference picture(s) (such as previously encoded/decoded key picture(s)) using motion data. An encoder and decoder store previously coded/decoded pictures in a picture store. The reference pictures in the picture store can then provide motion-compensated predictor blocks for the blocks of a current picture being encoded.
The encoder generally divides an inter-coded picture into rectangular, non-overlapping blocks of N×M samples. For a current block being encoded, the encoder attempts to find a matching block in a reference picture. The reference picture's block is then used as a motion-compensated prediction for the current block. The reference picture's block can be at the same spatial location as the current block being encoded, or it can be at a different location, as indicated with a motion vector or some other form of motion data. Typically, the encoder does not find a perfect match. For this reason, the encoder computes the sample-by-sample difference between the current block and its motion-compensated prediction to determine a residual (also called error signal). The residual is frequency transformed, quantized, and entropy encoded. When motion compensation works well, the number of bits used to encode motion-compensation residuals is small.
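The block-matching search and residual computation described above can be sketched as follows. The sketch assumes an exhaustive (full) search over integer sample offsets and uses the sum of absolute differences (SAD) as the matching criterion. Neither choice is specified in the text, and the frames are synthetic.

```python
import random

def get_block(frame, x, y, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(current, reference, bx, by, size, search_range):
    """Full search: return (best_sad, dx, dy) for the block at (bx, by)."""
    target = get_block(current, bx, by, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # candidate block falls outside the reference picture
            cost = sad(target, get_block(reference, x, y, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic reference picture, and a current picture whose content has
# moved 2 samples right and 1 sample down relative to the reference.
rng = random.Random(0)
SIZE, SHIFT_X, SHIFT_Y = 24, 2, 1
reference = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
current = [[reference[y - SHIFT_Y][x - SHIFT_X]
            if 0 <= y - SHIFT_Y < SIZE and 0 <= x - SHIFT_X < SIZE else 0
            for x in range(SIZE)] for y in range(SIZE)]

best_sad, mv_x, mv_y = motion_search(current, reference, 8, 8, 8, 4)
prediction = get_block(reference, 8 + mv_x, 8 + mv_y, 8)
target = get_block(current, 8, 8, 8)
residual = [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target, prediction)]
print("motion vector:", (mv_x, mv_y), "SAD:", best_sad)
```

In this contrived case the match is perfect, so the residual is all zeros. With real video the match is usually imperfect and the residual carries the remaining difference.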
If a predicted picture is itself used as a reference picture for subsequent motion compensation, the encoder reconstructs the predicted picture. When reconstructing residuals, the encoder reconstructs transform coefficients that were quantized and performs an inverse frequency transform. The encoder performs motion compensation to compute motion-compensated predictors, and combines the predictors with the residuals. During decoding, a decoder typically entropy decodes information and performs analogous operations to reconstruct residuals, perform motion compensation, and combine the predictors with the reconstructed residuals.
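The reconstruction of a predicted block (predictor plus reconstructed residual) can be sketched as below. For brevity, this sketch quantizes the residual directly in the sample domain and omits the frequency transform, which is a simplification of the process described above. The sample values and the step size are hypothetical.

```python
def quantize_residual(residual, step):
    return [[int(round(r / step)) for r in row] for row in residual]

def reconstruct_block(prediction, levels, step):
    """Predictor plus dequantized residual, clipped to the 8-bit sample range."""
    return [[max(0, min(255, p + q * step)) for p, q in zip(prow, qrow)]
            for prow, qrow in zip(prediction, levels)]

STEP = 4
# Hypothetical motion-compensated predictor and the block actually being encoded.
prediction = [[100 + 3 * x + 5 * y for x in range(4)] for y in range(4)]
current = [[p + ((7 * x + 3 * y) % 11 - 5) for x, p in enumerate(row)]
           for y, row in enumerate(prediction)]

residual = [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]
levels = quantize_residual(residual, STEP)

# The encoder and the decoder both run this same step on the same inputs,
# so their stored reference pictures stay identical.
reconstructed = reconstruct_block(prediction, levels, STEP)
max_error = max(abs(c - r) for crow, rrow in zip(current, reconstructed)
                for c, r in zip(crow, rrow))
print("largest sample error:", max_error)
```

Because the encoder reconstructs from the quantized residual rather than reusing the source samples, it predicts later pictures from exactly what the decoder will have available.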
In general, an encoder varies quantization to trade off quality and bit rate. A basic goal of lossy compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality/fidelity to the original video, an encoder attempts to provide the lowest bit rate encoded video. In practice, considerations such as encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness in quality/bit rate changes can also affect decisions made in codec design as well as decisions made during actual encoding.
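The two formulations of the rate-distortion goal can be illustrated with a toy selection of quantization step size. The sketch assumes that rate is approximated by the number of nonzero quantized coefficients and that distortion is mean squared error. These proxies, the candidate step sizes, and the synthetic coefficients are all illustrative assumptions, not a description of an actual rate control method.

```python
def quantize_stats(coeffs, step):
    """Return (rate_proxy, distortion) for uniform quantization at a given step."""
    levels = [int(round(c / step)) for c in coeffs]
    rate_proxy = sum(1 for q in levels if q != 0)  # crude stand-in for bits
    distortion = sum((c - q * step) ** 2 for c, q in zip(coeffs, levels)) / len(coeffs)
    return rate_proxy, distortion

def best_step_for_rate(coeffs, steps, max_rate):
    """Lowest distortion among step sizes whose rate proxy fits the budget."""
    feasible = []
    for s in steps:
        rate, dist = quantize_stats(coeffs, s)
        if rate <= max_rate:
            feasible.append((dist, s))
    return min(feasible)[1] if feasible else None

def best_step_for_quality(coeffs, steps, max_distortion):
    """Lowest rate proxy among step sizes that meet the distortion ceiling."""
    feasible = []
    for s in steps:
        rate, dist = quantize_stats(coeffs, s)
        if dist <= max_distortion:
            feasible.append((rate, s))
    return min(feasible)[1] if feasible else None

# Synthetic coefficients with decaying magnitude (illustrative only).
coeffs = [400.0 / (1 + k) ** 2 for k in range(64)]
steps = [2, 4, 8, 16, 32]

step_a = best_step_for_rate(coeffs, steps, max_rate=10)
step_b = best_step_for_quality(coeffs, steps, max_distortion=1.0)
print("step for rate budget:", step_a, "step for quality target:", step_b)
```

The first function holds the rate fixed and seeks the best quality, and the second holds the quality fixed and seeks the lowest rate. A practical encoder would weigh the other considerations listed above as well.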
As to the goal of smoothness in quality changes, many encoders seek to maintain a constant or relatively constant quality level from picture to picture. Such encoders usually adjust quantization or other parameters within the encoder to regulate the quality of the reconstructed pictures. Other encoders, under the assumption that allocating additional bits to key pictures may improve the quality of motion-compensated predictions using those key pictures (and hence improve the quality of non-key pictures), seek to encode key pictures at higher quality than non-key pictures.
Key Picture Popping Effects
In some scenarios, encoding results in key picture “popping” effects between key pictures and non-key pictures. For example, during playback of decoded video, key picture popping effects are perceptible as changes in quality between key pictures encoded using intra-picture compression and non-key pictures encoded using inter-picture compression.
FIG. 1 illustrates a simplified example of key picture popping effects. A series of video pictures includes the six video pictures (101 to 106) shown in FIG. 1. Each of the six video pictures (101 to 106) includes star shapes whose jagged edges add texture detail. (The stars are meant to depict objects containing an amount of spatial detail, such as the points of the stars.) Using typical encoder settings, one video picture (103) is encoded as a key picture using intra-picture compression. The remaining video pictures (101, 102, 104, 105, 106) are encoded using inter-picture compression.
After the encoding and decoding, the reconstructed video pictures (141 to 146) exhibit key picture popping effects. In particular, there are noticeable quality changes at the transitions to and from the reconstructed key picture (143). The key picture (143), encoded using intra-picture compression, maintains the spatial detail from the corresponding source video picture (103). Spatial detail was lost during encoding, however, for the other reconstructed video pictures (141, 142, 144, 145, 146). (The loss of detail is depicted by smoothing out the jagged edges and points of the stars.) The perceptual effects of key picture popping can be quite disruptive, as details that are clear in one picture (e.g., key picture 143) are blurred or missing in later pictures (e.g., picture 144). When key pictures are regularly spaced among non-key pictures, key picture popping effects can be periodic and particularly noticeable.
Key picture popping effects can be expected when an encoder deliberately seeks to encode key pictures at higher quality than non-key pictures. Even when encoders seek to encode key pictures at the same quality as non-key pictures, however, key picture popping effects can surface. Intra-compressed pictures tend to retain higher spatial frequency information content than inter-compressed pictures, even when the same quantization is applied. The discrepancy in the amount of spatial detail retained becomes worse at higher quantization levels, and noticeable popping effects accordingly become worse.
While previous approaches to regulating quality from picture to picture in encoding provide acceptable performance in some scenarios, they do not have the advantages of the techniques and tools described below for reducing key picture popping effects.