Video coding standards employ block-based transforms (for example, the ubiquitous discrete cosine transform, or DCT) and motion compensation to achieve compression efficiency. Coarse quantization of the transform coefficients and the use of different reference locations or different reference pictures by neighboring blocks in motion-compensated prediction can give rise to visually disturbing artifacts such as distortion around edges, textures or block discontinuities. In the state-of-the-art International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), an adaptive de-blocking filter is introduced to combat the artifacts arising along block boundaries.
More general de-artifacting approaches have been proposed to combat artifacts not only at block discontinuities but also around image singularities (e.g., edges and/or textures), wherever these may appear. In a first prior art approach, an overcomplete set of 4×4 DCTs is utilized to provide sparse decompositions of the noisy reconstructed signal in relatively low resolution video sources, such as quarter common intermediate format (QCIF) and common intermediate format (CIF). However, a small DCT may not be efficient in the coding of high resolution video content (e.g., 720p and 1080p video content), and transforms of larger sizes or with different basis functions are needed to scale well to the increased spatial resolution. Moreover, the filter parameters (including the threshold, the number of iterations, and so forth) are very important to the filtering performance and should adapt to the transform as well.
Deblocking Filter in the MPEG-4 AVC Standard
Within the state-of-the-art MPEG-4 AVC Standard, an in-loop deblocking filter has been adopted. The filter acts to attenuate artifacts arising along block boundaries. Such artifacts are caused by coarse quantization of the transform (DCT) coefficients as well as by motion compensated prediction. By adaptively applying low-pass filters to the block edges, the deblocking filter can improve both subjective and objective video quality. The filter operates by analyzing the samples around a block edge and adapting the filtering strength to attenuate small intensity differences attributable to blocky artifacts while preserving the generally larger intensity differences pertaining to the actual image content. Several block coding modes and conditions also serve to indicate the strength with which the filters are applied. These include inter/intra prediction decisions, the presence of coded residuals, and motion differences between adjacent blocks. Besides being adaptive at the block level, the deblocking filter is also adaptive at the slice level and the sample level. At the slice level, filtering strength can be adjusted to the individual characteristics of the video sequence. At the sample level, filtering can be turned off at each individual sample depending on the sample value and quantizer-based thresholds.
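The sample-level adaptivity described above can be illustrated with a minimal sketch in Python. It is loosely modeled on a normal-strength luma edge filter and is not a conforming implementation. The function name `filter_edge_sample` and the values passed for `alpha`, `beta`, and `tc` are hypothetical; in an actual codec such thresholds would be derived from the quantizer and slice-level settings rather than supplied by hand.

```python
def clip(lo, hi, v):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def filter_edge_sample(p1, p0, q0, q1, alpha, beta, tc):
    """Filter one line of samples across a block edge: p1 p0 | q0 q1.

    alpha, beta and tc are assumed inputs for illustration only.
    Returns the (possibly modified) pair (p0, q0).
    """
    # Sample-level switch: a large step across the edge, or strong activity
    # on either side, is taken to be real image content and left untouched.
    is_blocky = (
        abs(p0 - q0) < alpha
        and abs(p1 - p0) < beta
        and abs(q1 - q0) < beta
    )
    if not is_blocky:
        return p0, q0

    # Small step attributed to a blocking artifact: apply a clipped
    # low-pass correction that pulls p0 and q0 toward each other.
    delta = clip(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3)
    return clip(0, 255, p0 + delta), clip(0, 255, q0 - delta)


# A small step across the edge is smoothed; a large one is preserved.
smoothed = filter_edge_sample(100, 100, 104, 104, alpha=10, beta=5, tc=2)
preserved = filter_edge_sample(100, 100, 180, 180, alpha=10, beta=5, tc=2)
```

In this sketch the comparison against `alpha` and `beta` plays the role of the quantizer-based thresholds, and the clipping bound `tc` stands in for the filtering strength selected at the block level.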
The blocky artifacts removed by the MPEG-4 AVC Standard deblocking filter are not the only artifacts that are present in compressed video. Coarse quantization is also responsible for other artifacts such as, for example, ringing, edge distortion, and texture corruption. The deblocking filter cannot reduce artifacts caused by quantization errors which appear inside a block. Moreover, the low-pass filtering techniques employed in deblocking assume a smooth image model and are not suited for processing image singularities such as, for example, edges and textures.
Sparsity-Based De-Artifacting
Inspired by sparsity-based de-noising techniques, a nonlinear in-loop filter has been proposed for compression de-artifacting, as noted above with respect to the first prior art approach. The first prior art approach uses a set of de-noised estimates provided by an overcomplete set of transforms. The implementation of the first prior art approach generates the overcomplete set of transforms by using all possible translations Hi of a given two dimensional (2D) orthonormal transform H, such as wavelets or DCT. Thus, given an image I, a series of different transformed versions Yi of the image I is created by applying the various transforms Hi. Each transformed version Yi is then subjected to a de-noising procedure, typically involving a thresholding operation, producing the series Y′i. The transformed and thresholded coefficients Y′i are then inverse transformed back into the spatial domain, giving rise to the de-noised estimates I′i. In overcomplete settings, it is expected that some of the de-noised estimates will provide better performance than others and that the final filtered version I′ will benefit from a combination, via averaging, of such de-noised estimates. The de-noising filter of the first prior art approach uses a weighted average of the de-noised estimates I′i, where the weights are optimized to emphasize the best de-noised estimates based on signal sparsity.
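The procedure described above can be sketched in Python under several stated assumptions. The translations Hi are taken to be shifted block grids of a 2D DCT, the de-noising step is hard thresholding with a single fixed threshold, and each de-noised block is weighted by the reciprocal of its number of surviving coefficients. That weight is one simple sparsity-based choice assumed here for illustration; the text above does not specify the weight formula. The function names and parameter values are likewise hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def denoise_overcomplete(image, block=4, threshold=10.0):
    """Weighted average of de-noised estimates over all block-grid shifts."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    d = dct_matrix(block)
    acc = np.zeros_like(img)    # weighted sum of de-noised estimates
    wsum = np.zeros_like(img)   # sum of weights per pixel

    for dy in range(block):
        for dx in range(block):
            # One translation Hi: the block grid shifted by (dy, dx).
            for y in range(dy, h - block + 1, block):
                for x in range(dx, w - block + 1, block):
                    patch = img[y:y + block, x:x + block]
                    coeffs = d @ patch @ d.T              # forward transform (Yi)
                    kept = np.abs(coeffs) >= threshold    # hard thresholding
                    coeffs = np.where(kept, coeffs, 0.0)  # Y'i
                    estimate = d.T @ coeffs @ d           # inverse transform (I'i)
                    # Assumed weight: sparser blocks receive more weight.
                    weight = 1.0 / max(int(kept.sum()), 1)
                    acc[y:y + block, x:x + block] += weight * estimate
                    wsum[y:y + block, x:x + block] += weight

    # Fall back to the input wherever no block covered a pixel.
    safe = np.where(wsum > 0, wsum, 1.0)
    return np.where(wsum > 0, acc / safe, img)


# Usage: a flat region corrupted by mild noise is pulled back toward flat.
rng = np.random.default_rng(0)
clean = np.full((16, 16), 100.0)
noisy = clean + rng.normal(0.0, 2.0, clean.shape)
filtered = denoise_overcomplete(noisy, block=4, threshold=10.0)
```

The `block` and `threshold` arguments correspond to the transform size and the filter threshold discussed in this section. The sketch keeps them as independent inputs, so nothing ties the threshold to the transform that is chosen.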
The set of orthonormal transforms {Hi} is expected to provide sparse decompositions of the image I. For instance, the DCT of block size 4×4 has been used in the first prior art approach for QCIF content. With the growing popularity of high definition (HD) content, a small block size DCT may no longer be efficient, as it does not scale well to the increased resolution, especially when the encoding procedure utilizes larger transforms and quantizes the coefficients on a larger block scale. In this regard, transforms of a larger size (e.g., 8×8 or 16×16) or with different basis functions are introduced in de-noising to better exploit the spatial correlation within larger block units.
On the other hand, the choice of filter parameters, such as the threshold, is of great importance to the performance of the de-artifacting filter. The threshold is essential to the de-noising capacity of the filter in terms of both the accuracy of the de-noised estimates and the averaging weights that emphasize the best de-noised estimates. Inadequate threshold selection may result in over-smoothed reconstructed pictures or may allow artifacts to persist. In the first prior art approach, the thresholds for each pixel class, which are based on quantization parameter (QP) and coding mode information, are stored at both the encoder and the decoder and are not adapted to the transform.
With different transform sizes or different basis functions, the noise or artifact behavior of a video sequence under the same QP or coding mode can be very different, which calls for different filter parameters.