Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits. Thus, the number of bits per second, or bitrate, of a typical raw digital video sequence can be 5 million bits/second or more.
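The bitrate arithmetic above can be sketched as follows. The 24 bits per pixel and 15 frames per second come from the text; the 176×144 frame size is an assumption chosen only for illustration, and the function name is hypothetical.

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Return the raw (uncompressed) bitrate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# 176x144 is an assumed frame size (about 25 thousand pixels);
# 24 bits/pixel and 15 frames/second are the figures given in the text.
bits_per_second = raw_bitrate(176, 144, 24, 15)
print(bits_per_second)
```

Under these assumptions the result is a little over 9 million bits/second, consistent with the "5 million bits/second or more" figure.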
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bitrate of digital video. Compression can be lossless, in which case the quality of the video does not suffer but decreases in bitrate are limited by the complexity of the video. Or compression can be lossy, in which case the quality of the video suffers but decreases in bitrate are more dramatic. Decompression reverses compression.
In general, video compression techniques include intraframe compression and interframe compression. Intraframe compression techniques compress individual frames, typically called I-frames or key frames. Interframe compression techniques compress frames with reference to preceding and/or following frames, which are typically called predicted frames, P-frames, or B-frames.
Microsoft Corporation's Windows Media Video Versions 8 [“WMV8”] and 9 [“WMV9”] each include a video encoder and a video decoder. The encoders use intraframe and interframe compression, and the decoders use intraframe and interframe decompression. There are also several international standards for video compression and decompression, including the Moving Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.26x standards. Like WMV8 and WMV9, these standards use a combination of intraframe and interframe compression and decompression.
I. Block-Based Intraframe Compression and Decompression
Many prior art encoders use block-based intraframe compression. To illustrate, suppose an encoder splits a video frame into 8×8 blocks of pixels and applies an 8×8 Discrete Cosine Transform [“DCT”] to individual blocks. The DCT converts a given 8×8 block of pixels (spatial information) into an 8×8 block of DCT coefficients (frequency information). The DCT operation itself is lossless or nearly lossless. The encoder quantizes the DCT coefficients, resulting in an 8×8 block of quantized DCT coefficients. Quantization is lossy, resulting in loss of precision, if not complete loss of the information for the coefficients. The encoder then prepares the 8×8 block of quantized DCT coefficients for entropy encoding and performs the entropy encoding, which is a form of lossless compression.
A corresponding decoder performs a corresponding decoding process. For a given block, the decoder performs entropy decoding, inverse quantization, an inverse DCT, etc., resulting in a reconstructed block. Due to the quantization, the reconstructed block is not identical to the original block. In fact, there may be perceptible errors within reconstructed blocks or at the boundaries between reconstructed blocks.
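The intraframe path described above (DCT, quantization, inverse quantization, inverse DCT) can be sketched as follows. This is a minimal illustration, not the method of any particular codec: the orthonormal DCT-II, the single uniform quantization step of 16, the test pattern, and all function names are assumptions, and entropy coding is omitted because it is lossless.

```python
import math

N = 8  # block size used in the text


def _scale(k):
    # Normalization that makes the DCT orthonormal, so the inverse is exact.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(n, k):
    return _scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 8x8 DCT-II: pixels (spatial) -> coefficients (frequency)."""
    return [[sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 8x8 DCT: coefficients -> pixels."""
    return [[sum(coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    """Uniform scalar quantization: the lossy step."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back up."""
    return [[q * step for q in row] for row in levels]


def max_abs_error(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))


# Arbitrary test pattern (an assumption, not data from the text).
block = [[10 * x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
lossless = idct_2d(coeffs)
lossy = idct_2d(dequantize(quantize(coeffs, 16), 16))
print(max_abs_error(block, lossless))  # floating-point rounding only
print(max_abs_error(block, lossy))     # larger: loss introduced by quantization
```

Without quantization the round trip reproduces the block up to floating-point rounding, which matches the statement that the DCT itself is lossless or nearly lossless. With quantization the reconstructed block differs from the original.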
II. Block-Based Interframe Compression and Decompression
Many prior art encoders use block-based motion-compensated prediction coding followed by transform coding of residuals. To illustrate, suppose an encoder splits a predicted frame into 8×8 blocks of pixels. Groups of four 8×8 luminance blocks and two co-located 8×8 chrominance blocks form macroblocks. Motion estimation approximates the motion of the macroblock relative to a reference frame, for example, a previously coded, preceding frame. The encoder computes a motion vector for the macroblock. In motion compensation, the motion vector is used to compute a prediction macroblock for the macroblock using information from the reference frame. The prediction is rarely perfect, so the encoder usually encodes blocks of pixel differences (also called the error or residual blocks) between the prediction and the original macroblock. The encoder applies a DCT to the error blocks, resulting in blocks of coefficients. The encoder quantizes the DCT coefficients, prepares the blocks of quantized DCT coefficients for entropy encoding, and performs the entropy encoding.
A corresponding decoder performs a corresponding decoding process. The decoder performs entropy decoding, inverse quantization, an inverse DCT, etc., resulting in reconstructed error blocks. In a separate motion compensation path, the decoder computes a prediction using motion vector information relative to a reference frame. The decoder combines the prediction with the reconstructed error blocks. Again, the reconstructed video is not identical to the corresponding original, and there may be perceptible errors within reconstructed blocks or at the boundaries between reconstructed blocks.
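Motion estimation, motion compensation, and residual computation can be sketched as follows. This is a minimal sketch under stated assumptions: full-search block matching with a sum-of-absolute-differences cost is one common approach but is not named in the text, and the block size of 4, the search range of 3, the ramp test frames, and all function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and a displaced block in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def estimate_motion(cur, ref, bx, by, size, search):
    """Full search over +/- `search` pixels; returns the (dx, dy) with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def predict(ref, bx, by, mv, size):
    """Motion compensation: fetch the displaced block from the reference frame."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] for i in range(size)] for j in range(size)]


def residual(cur, pred, bx, by, size):
    """Pixel differences between the original block and its prediction."""
    return [[cur[by + j][bx + i] - pred[j][i] for i in range(size)] for j in range(size)]


# Toy frames: a 16x16 ramp whose content moves right by 2 and down by 1.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]

mv = estimate_motion(cur, ref, bx=8, by=8, size=4, search=3)
pred = predict(ref, 8, 8, mv, 4)
res = residual(cur, pred, 8, 8, 4)
print(mv)   # displacement into the reference frame
print(res)  # residual block that would go on to the DCT and quantization
```

For this toy input the prediction is exact, so the residual is all zeros. With real video the residual is usually nonzero and is transform coded as described above.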
III. Blocking Artifacts and Ringing Artifacts
Lossy compression can result in noticeable errors in video after reconstruction. The heavier the lossy compression and the higher the quality of the original video, the more likely it is for perceptible errors to be introduced in the reconstructed video. Two common kinds of errors are blocking artifacts and ringing artifacts.
Block-based compression techniques have benefits such as ease of implementation, but introduce blocking artifacts, which are perhaps the most common and annoying type of distortion in digital video today. Blocking artifacts are visible discontinuities around the edges of blocks in reconstructed video. Quantization and truncation (e.g., of transform coefficients from a block-based transform) cause blocking artifacts, especially when the compression ratio is high. When blocks are quantized independently, for example, one block may be quantized less or more than an adjacent block. Upon reconstruction, this can result in blocking artifacts at the boundary between the two blocks. Or, blocking artifacts may result when high-frequency coefficients are quantized, if the overall content of the blocks differs and the high-frequency coefficients are necessary to reconstruct transition detail across block boundaries.
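A one-dimensional sketch can illustrate how independent, heavy quantization of adjacent blocks produces a discontinuity at their boundary. It assumes the extreme case in which only the DC coefficient of each block survives, which is equivalent to replacing each block with its mean. The ramp signal and the function name are hypothetical.

```python
def dc_only(samples, block_size):
    """Reconstruct each block from its DC term alone, i.e. the block mean."""
    out = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


# A smooth ramp spanning two 8-sample blocks (assumed test signal).
ramp = [4 * n for n in range(16)]
recon = dc_only(ramp, 8)

jump_before = abs(ramp[8] - ramp[7])    # step across the boundary in the original
jump_after = abs(recon[8] - recon[7])   # step across the boundary after reconstruction
print(jump_before, jump_after)
```

The original signal changes gradually across the block boundary, while the reconstruction is flat within each block and jumps at the boundary, which is the visible discontinuity described above.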
Ringing artifacts are caused by quantization or truncation of high-frequency transform coefficients, whether the transform coefficients are from a block-based transform or from a wavelet-based transform. Both such transforms essentially represent an area of pixels as a sum of regular waveforms, where the waveform coefficients are quantized, encoded, etc. In some cases, the contributions of high-frequency waveforms counter distortion introduced by a low-frequency waveform. If the high-frequency coefficients are heavily quantized, the distortion may become visible as a wave-like oscillation at the low frequency. For example, suppose an image area includes sharp edges or contours, and high-frequency coefficients are heavily quantized. In a reconstructed image, the quantization may cause ripples or oscillations around the sharp edges or contours.
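The ripples around a sharp edge can be reproduced with a one-dimensional sketch. It assumes an orthonormal DCT-II over 16 samples and models heavy quantization as setting all but the four lowest-frequency coefficients to zero. The edge signal and function names are hypothetical.

```python
import math


def _basis(n, k, size):
    scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * size))


def dct_1d(samples):
    """Orthonormal 1-D DCT-II."""
    size = len(samples)
    return [sum(samples[n] * _basis(n, k, size) for n in range(size))
            for k in range(size)]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    size = len(coeffs)
    return [sum(coeffs[k] * _basis(n, k, size) for k in range(size))
            for n in range(size)]


def drop_high_frequencies(coeffs, keep):
    """Zero every coefficient at index `keep` or above (stand-in for heavy quantization)."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]


# A sharp edge between two flat regions (assumed test signal).
edge = [0.0] * 8 + [100.0] * 8
recon = idct_1d(drop_high_frequencies(dct_1d(edge), keep=4))
print([round(v, 1) for v in recon])
```

The reconstructed values dip below 0 on the dark side of the edge and rise above 100 on the bright side, oscillating around the original levels instead of staying flat.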
IV. Post-Processing Filtering
Blocking artifacts and ringing artifacts can be reduced using de-blocking and de-ringing techniques. These techniques are generally referred to as post-processing techniques, since they are typically applied after video has been decoded. Post-processing usually enhances the perceived quality of reconstructed video.
The WMV8 and WMV9 decoders use specialized filters to reduce blocking and ringing artifacts during post-processing. For additional information, see Annex A of U.S. Provisional Patent Application Ser. No. 60/341,674, filed Dec. 17, 2001 and Annex A of U.S. Provisional Patent Application Ser. No. 60/488,710, filed Jul. 18, 2003. Similarly, software implementing several of the MPEG and H.26x standards mentioned above has de-blocking and/or de-ringing filters. For example, see (1) the MPEG-4 de-blocking and de-ringing filters as tested in the verification model and described in Annex F, Section 15.3 of MPEG-4 draft N2202, (2) the H.263+ post-processing filter as tested in the Test Model Near-term, and (3) the H.264 JM post-processing filter. In addition, numerous publications address post-processing filtering techniques (as well as corresponding pre-processing techniques, in some cases). For example, see (1) Kuo et al., “Adaptive Postprocessor for Block Encoded Images,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 5, No. 4 (August 1995), (2) O'Rourke et al., “Improved Image Decompression for Reduced Transform Coding Artifacts,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 5, No. 6 (1995), and (3) Segall et al., “Pre- and Post-Processing Algorithms for Compressed Video Enhancement,” Proc. 34th Asilomar Conf. on Signals and Systems (2000).
FIG. 1 is a generalized diagram of post-processing filtering according to the prior art. A video encoder (110) accepts source video (105), encodes it, and produces a video bitstream (115). The video bitstream (115) is delivered via a channel (120), for example, by transmission as streaming media over a network. A video decoder (130) receives and decodes the video bitstream (115), producing decoded video (135). A post-processing filter (140) such as a de-ringing and/or de-blocking filter is used on the decoded video (135), producing decoded, post-processed video (145).
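The post-processing filter stage of FIG. 1 can be sketched in one dimension as follows. This is a hypothetical smoothing filter for illustration only, not the filter of WMV8, WMV9, or any standard: the weights, the `max_step` threshold, and the decoded test signal are all assumptions.

```python
def deblock_1d(samples, block_size, max_step=64):
    """Smooth the two samples that straddle each block boundary.

    Boundaries with a step larger than `max_step` are left alone, on the
    assumption that a very large step is a real edge in the content.
    """
    out = list(samples)
    for b in range(block_size, len(samples), block_size):
        left, right = samples[b - 1], samples[b]
        if abs(right - left) > max_step:
            continue
        out[b - 1] = (3 * left + right) / 4
        out[b] = (left + 3 * right) / 4
    return out


# Decoded output with a blocking artifact at the boundary between two 8-sample blocks.
decoded = [14.0] * 8 + [46.0] * 8
filtered = deblock_1d(decoded, 8)
print(abs(decoded[8] - decoded[7]), abs(filtered[8] - filtered[7]))
```

The filter runs only on the decoder output, so it changes what is displayed without affecting the bitstream or the decoding process itself.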
Strictly speaking, post-processing filtering techniques are not needed to decode the video bitstream (115). Codec (enCOder/DECoder) engineers may decide whether to apply such techniques when designing a codec. The decision can depend, for example, on whether CPU cycles are available for a software decoder, or on the additional cost for a hardware decoder. Because post-processing filtering techniques usually enhance video quality significantly, most video decoders today apply them. Post-processing filters are sometimes designed independently from a video codec, so the same de-blocking and de-ringing filters may be applied to different codecs.
In prior systems, post-processing filtering is applied automatically to an entire video sequence. The assumption is that post-processing filtering will always at least improve video quality, and thus post-processing filtering should always be on. From system to system, filters may have different strengths according to the capabilities of the decoder. Moreover, some filters selectively disable or change the strength of filtering depending on decoder-side evaluation of the content of reconstructed video, but this adaptive processing is still automatically performed. There are several problems with these approaches.
First, the assumption that post-processing filtering always at least improves video quality is incorrect. For high quality video that is compressed without much loss, post-processing de-blocking and de-ringing may eliminate texture details and noticeably blur video images, actually decreasing quality. This sometimes occurs for high definition video encoded at high bitrates.
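A small numeric sketch can illustrate how smoothing may reduce quality when the decoded video is already close to the original. It assumes a fine alternating texture, a near-lossless decode with an error of at most 1 per sample, and a generic 3-tap low-pass filter standing in for a de-blocking or de-ringing filter. All names and values are hypothetical.

```python
def smooth(samples):
    """3-tap [1, 2, 1] / 4 low-pass filter with edge replication."""
    n = len(samples)
    return [(samples[max(i - 1, 0)] + 2 * samples[i] + samples[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Fine texture in the original, and a decode that is off by 1 at every fourth sample.
original = [100, 140] * 8
decoded = [v + (1 if i % 4 == 0 else 0) for i, v in enumerate(original)]
filtered = smooth(decoded)

print(mse(original, decoded))   # error before post-processing
print(mse(original, filtered))  # error after post-processing
```

In this case the filter removes the texture along with the small coding error, so the filtered output is further from the original than the unfiltered decode.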
Second, there is no information in the video bitstream that guides post-processing filtering. The author is not allowed to control or adapt post-processing filtering by introducing information in the video bitstream to control the filtering.
V. In-Loop Filtering
Aside from post-processing filtering, several prior art systems use in-loop filtering. In-loop filtering involves filtering (e.g., de-blocking filtering) on reconstructed reference frames during motion compensation in the encoding and decoding processes (whereas post-processing is applied after the decoding process). By reducing artifacts in reference frames, the encoder and decoder improve the quality of motion-compensated prediction from the reference frames. For example, see (1) section 4.4 of U.S. Provisional Patent Application Ser. No. 60/341,674, filed Dec. 17, 2001, (2) section 4.9 of U.S. Provisional Patent Application Ser. No. 60/488,710, filed Jul. 18, 2003, (3) section 3.2.3 of the H.261 standard (which describes conditional low-pass filtering of macroblocks), (4) section 3.4.8 and Annex J of the H.263 standard, and (5) the relevant sections of the H.264 standard.
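The difference from post-processing is that an in-loop filter sits inside the prediction loop, so the encoder and decoder must both apply it to each reconstructed frame before using that frame as a reference. The toy one-dimensional sketch below assumes zero-motion prediction, uses rounding of the residual as a stand-in for quantization, and uses a generic 3-tap low-pass filter as a stand-in loop filter. None of this reflects the filters of the cited standards, and all names are hypothetical.

```python
def smooth(frame):
    """3-tap [1, 2, 1] / 4 low-pass with edge replication (stand-in loop filter)."""
    n = len(frame)
    return [(frame[max(i - 1, 0)] + 2 * frame[i] + frame[min(i + 1, n - 1)]) / 4
            for i in range(n)]


def identity(frame):
    """No loop filter."""
    return list(frame)


def encode(frames, loop_filter):
    """Return one residual per frame, predicted from the filtered previous reconstruction."""
    ref = [0.0] * len(frames[0])
    residuals = []
    for frame in frames:
        res = [round(c - p) for c, p in zip(frame, ref)]  # rounding is the lossy step
        residuals.append(res)
        recon = [p + r for p, r in zip(ref, res)]
        ref = loop_filter(recon)  # filter inside the loop, before reuse as reference
    return residuals


def decode(residuals, loop_filter):
    """Mirror of the encoder's reconstruction loop."""
    ref = [0.0] * len(residuals[0])
    out = []
    for res in residuals:
        recon = [p + r for p, r in zip(ref, res)]
        out.append(recon)
        ref = loop_filter(recon)
    return out


def max_error(a, b):
    return max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


frames = [[0, 0, 0, 0, 80, 80, 80, 80]] * 3
residuals = encode(frames, smooth)
matched = decode(residuals, smooth)       # decoder applies the same loop filter
mismatched = decode(residuals, identity)  # decoder skips it: prediction drifts
print(max_error(frames, matched), max_error(frames, mismatched))
```

When the decoder applies the same filter as the encoder, the reconstruction error stays within the rounding error of the residual. When the decoder omits the filter, its reference frames no longer match the encoder's and the error grows, which is why in-loop filtering is part of the decoding process rather than an optional step after it.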
In particular, the H.264 standard allows an author to turn in-loop filtering on and off, and even modify the strength of the filtering, on a scene-by-scene basis. The H.264 standard does not, however, allow the author to adapt loop filtering for regions within a frame. Moreover, the H.264 standard applies only one kind of in-loop filter.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.