Many hybrid video coding technologies employ motion compensation and block-based transforms (e.g., discrete cosine transforms (DCTs)) to reduce correlation in the spatial and temporal domains. Coarse quantization of transform coefficients and the absence of visual quality constraints in rate-distortion (RD) based optimization may give rise to visual artifacts.
In the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), an in-loop deblocking filter is employed to reduce blocky artifacts arising along coded block boundaries. Such artifacts are caused by coarse quantization of the transform (DCT) coefficients as well as motion compensated prediction. By applying low-pass filters to the block edges, the deblocking filter can improve both subjective and objective video quality.
Turning to FIG. 1, a video encoder capable of performing video encoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 100.
The video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185. An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125. An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150. An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190. An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, a first input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
An output of the SEI inserter 130 is connected in signal communication with a second non-inverting input of the combiner 190.
A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110. A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120.
An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119. An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165. An output of the deblocking filter 165 is connected in signal communication with a first input of the reference picture buffer 180. An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175 and a third input of the motion compensator 170. A first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170. A second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145.
An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197. An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197. An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197. The third input of the switch 197 is a control input that determines whether the “data” input of the switch is provided by the motion compensator 170 or by the intra prediction module 160. The output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and an inverting input of the combiner 185.
A first input of the frame ordering buffer 110 and an input of the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture. Moreover, a second input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata. An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
Turning to FIG. 2, a video decoder capable of performing video decoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 200.
The video decoder 200 includes an input buffer 210 having an output connected in signal communication with a first input of an entropy decoder 245. A first output of the entropy decoder 245 is connected in signal communication with a first input of an inverse transformer and inverse quantizer 250. An output of the inverse transformer and inverse quantizer 250 is connected in signal communication with a second non-inverting input of a combiner 225. An output of the combiner 225 is connected in signal communication with a second input of a deblocking filter 265 and a first input of an intra prediction module 260. A second output of the deblocking filter 265 is connected in signal communication with a first input of a reference picture buffer 280. An output of the reference picture buffer 280 is connected in signal communication with a second input of a motion compensator 270.
A second output of the entropy decoder 245 is connected in signal communication with a third input of the motion compensator 270 and a first input of the deblocking filter 265. A third output of the entropy decoder 245 is connected in signal communication with an input of a decoder controller 205. A first output of the decoder controller 205 is connected in signal communication with a second input of the entropy decoder 245. A second output of the decoder controller 205 is connected in signal communication with a second input of the inverse transformer and inverse quantizer 250. A third output of the decoder controller 205 is connected in signal communication with a third input of the deblocking filter 265. A fourth output of the decoder controller 205 is connected in signal communication with a second input of the intra prediction module 260, a first input of the motion compensator 270, and a second input of the reference picture buffer 280.
An output of the motion compensator 270 is connected in signal communication with a first input of a switch 297. An output of the intra prediction module 260 is connected in signal communication with a second input of the switch 297. An output of the switch 297 is connected in signal communication with a first non-inverting input of the combiner 225.
An input of the input buffer 210 is available as an input of the decoder 200, for receiving an input bitstream. A first output of the deblocking filter 265 is available as an output of the decoder 200, for outputting an output picture.
The above-mentioned deblocking filter operates by analyzing the samples around a block edge and adapting the filtering strength so as to attenuate small intensity differences attributable to blocky artifacts while preserving the generally larger intensity differences pertaining to the actual image content. Several block coding modes and conditions also serve to indicate the strength with which the filters are applied. These include inter/intra prediction decisions, the presence of coded residuals, and motion differences between adjacent blocks. Besides being adaptable at the block level, the deblocking filter is also adaptable at the slice level and at the sample level. At the slice level, filtering strength can be adjusted to the individual characteristics of the video sequence. At the sample level, filtering can be turned off at each individual sample depending on the sample value and quantizer-based thresholds.
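By way of illustration only, the block-level and sample-level decisions described above can be sketched as follows. This is a simplified, non-normative sketch: the function names and argument names are hypothetical, the strength rules are condensed, and the thresholds alpha and beta, which in the MPEG-4 AVC Standard are derived from the quantization parameter and slice-level offsets, are simply passed in as parameters.

```python
def boundary_strength(p_intra, q_intra, on_macroblock_edge,
                      has_coded_residual, different_reference,
                      mv_diff_quarter_pel):
    """Simplified block-level strength: 0 means no filtering, 4 is strongest."""
    if p_intra or q_intra:
        # Intra-coded neighbors get the strongest filtering.
        return 4 if on_macroblock_edge else 3
    if has_coded_residual:
        return 2
    if different_reference or mv_diff_quarter_pel >= 4:
        # Motion differs noticeably between the two adjacent blocks.
        return 1
    return 0


def filter_sample_line(p1, p0, q0, q1, bs, alpha, beta):
    """Sample-level switch for one line of samples across an edge.

    p1, p0 lie on one side of the edge and q0, q1 on the other.
    alpha and beta are assumed to be supplied by the caller.
    """
    return (bs > 0
            and abs(p0 - q0) < alpha    # small step across the edge
            and abs(p1 - p0) < beta     # smooth on the p side
            and abs(q1 - q0) < beta)    # smooth on the q side
```

In this sketch, a large step across the edge (for example, p0 = 22 and q0 = 200 with alpha = 10) is treated as actual image content and left unfiltered, whereas a small step is treated as a blocky artifact and filtered.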
However, the blocky artifacts removed by the MPEG-4 AVC Standard deblocking filter are not the only artifacts that are present in compressed video. Coarse quantization is also responsible for other artifacts such as ringing, edge distortion, and texture corruption. The deblocking filter cannot reduce artifacts caused by quantization errors which appear inside a block. Moreover, the low-pass filtering techniques employed in deblocking assume a smooth image model and are not suited for processing image singularities such as edges or textures.
Recently, studies have been performed relating to the application of sparsity-based denoising approaches to images and videos. Some of these approaches have involved a sparse matrix. A sparse matrix is a matrix populated primarily with zeros.
Relating to the aforementioned studies and, in particular, based on discoveries in neuroscience and image processing, it has been determined that natural images or videos share sparse characteristics which distinguish them from random noise signals. For image and video denoising, this sparsity characteristic means that the image and video signals can be sparsely decomposed over some set of bases. Many algorithms have been developed based on this sparsity characteristic. Sparsity-based denoising methods typically assume that the true signal can be well approximated by a linear combination of a few basic elements. That is, the signal is sparsely represented in a transform domain. Hence, by preserving the few high-magnitude transform coefficients that have a high probability of conveying the true-signal energy and discarding the remaining coefficients, which are numerous and most likely due to noise, the true signal can be effectively estimated.
A sparsity-based denoising operation typically involves the following three basic steps: transform; shrinkage (or thresholding); and inverse transform. One popular approach for exploiting a sparse image model is to use an over-complete set of linear transforms and a thresholding operation. For example, a first prior art approach referred to as the “k-SVD approach” involves a basis pursuit approach, where the transform bases are trained based on an image or video database by minimizing an energy function.
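The three basic steps can be sketched, for purposes of illustration, on a one-dimensional signal. The sketch assumes an orthonormal DCT as the transform and hard thresholding as the shrinkage rule; both are choices made here for brevity, and the function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1D sequence (step 1: transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() above (step 3: inverse transform)."""
    n = len(c)
    out = []
    for i in range(n):
        value = math.sqrt(1.0 / n) * c[0]
        value += sum(math.sqrt(2.0 / n) * c[k]
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        out.append(value)
    return out


def hard_threshold(coeffs, threshold):
    """Step 2: keep high-magnitude coefficients, zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def denoise(signal, threshold):
    """Transform, shrink, inverse transform."""
    return idct(hard_threshold(dct(signal), threshold))
```

For a nearly flat input such as [10.1, 9.9, 10.1, 9.9] with a threshold of 0.5, only the DC coefficient survives, so the small alternating perturbation is removed and each output sample is approximately 10.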
In contrast to this basis pursuit approach, other prior art approaches adapt the signal to standard bases instead of pursuing a suitable basis for the signal. For example, in a second prior art approach, a sliding-window transform-domain denoising method is presented, where the basic idea is to apply shrinkage in a local (windowed) transform domain obtained through a standard transform, such as the Fast Fourier Transform (FFT) or the Discrete Cosine Transform (DCT). The overlap between successive windows accounts for the over-completeness. In a third and a fourth prior art approach, kNN (k nearest neighbors) patch-based denoising approaches are proposed. Instead of using overlapped spatial neighbors as in the second prior art approach, the third and fourth prior art approaches search for similar d-dimensional regions (patches or areas) in a non-local adaptive way and then apply a (d+1)-dimensional transform to the “grouped” regions (patches or areas), followed by similar shrinkage (thresholding) and an inverse transform. The final denoised pixel is the weighted average of all estimates of that pixel.
Without loss of generality, all of the above approaches except for k-SVD can be thought of as kNN region-based denoising approaches, where, for the two-dimensional (2D) case, the spatially neighboring patches can be considered an ad hoc way of performing kNN.
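The sliding-window idea can be sketched in one dimension as follows. The sketch assumes a windowed orthonormal DCT, hard thresholding, and plain (unweighted) averaging of the overlapping estimates; the prior art approaches may differ in each of these respects, and the function names are hypothetical.

```python
import math


def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
            for k in range(n)]


def sliding_window_denoise(signal, window, threshold):
    """Shrink every overlapping window, then average the estimates."""
    n = len(signal)
    if window > n:
        raise ValueError("window must not exceed the signal length")
    basis = dct_basis(window)
    acc = [0.0] * n      # sum of estimates per sample
    count = [0] * n      # number of estimates per sample
    for start in range(n - window + 1):
        block = signal[start:start + window]
        # Forward transform of the current window.
        coeffs = [sum(b * v for b, v in zip(row, block)) for row in basis]
        # Shrinkage by hard thresholding.
        kept = [c if abs(c) > threshold else 0.0 for c in coeffs]
        # Inverse transform, accumulated at the window's position.
        for offset in range(window):
            value = sum(kept[k] * basis[k][offset] for k in range(window))
            acc[start + offset] += value
            count[start + offset] += 1
    return [a / c for a, c in zip(acc, count)]
```

Because successive windows overlap, interior samples receive several estimates, one from each window covering them, which is the source of the over-completeness noted above.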
Turning to FIG. 3, the general framework of the kNN region-based sparsity denoising approach is indicated generally by the reference numeral 300. Region (patch or area) clustering is performed on a region cluster 305 based on a similarity criterion or metric in order to “pack” 310 the region cluster 305 and obtain a “packed” region cluster representation 315. The region dimension and size can be 2D or 3D. Then, a selected transform 320 (e.g., FFT or DCT) is applied to the packed region cluster representation 315 to obtain a sparse representation 325 therefor. The dimension of the transform depends on the region and region cluster dimensions. In the transform domain, a shrinkage or thresholding operation 330 is often applied to the sparse representation 325 for noise removal to obtain a processed (transform domain) signal 335 representing a post-shrinkage result. Then, an inverse transformation 340 is applied to take the processed (transform domain) signal 335 back to the intensity domain, thus providing a processed (intensity domain) signal 345. Finally, the region cluster is “unpacked” 350 and each region (patch or area) inside is restored to its original location from the processed (intensity domain) signal 345 to obtain an unpacked region cluster 355. Looping is performed over all processing pixel locations and, because of the overlapping of the patches, each pixel can have multiple estimates. These multiple estimates can then be fused (combined, sometimes using a weighting algorithm) to obtain the final denoised pixel. For the best denoising effect, the choice of threshold is very important.
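The “pack” and “unpack” steps of this framework can be sketched as follows, using a one-dimensional signal for brevity. The sketch assumes the sum of squared differences as the similarity metric and plain averaging as the fusion rule, and it omits the transform, shrinkage, and inverse transform that would be applied between the two steps. The function names are hypothetical.

```python
def pack(signal, ref_start, size, k):
    """Return start positions of the k regions most similar to the reference.

    Similarity is measured by the sum of squared differences (assumed metric).
    """
    ref = signal[ref_start:ref_start + size]

    def ssd(start):
        region = signal[start:start + size]
        return sum((a - b) ** 2 for a, b in zip(ref, region))

    candidates = range(len(signal) - size + 1)
    return sorted(candidates, key=ssd)[:k]


def unpack_and_fuse(signal, groups):
    """Restore processed regions to their locations and fuse overlaps.

    groups is a list of (start, processed_region) pairs. Samples covered by
    several regions are averaged; uncovered samples keep their input value.
    """
    acc = [0.0] * len(signal)
    count = [0] * len(signal)
    for start, region in groups:
        for offset, value in enumerate(region):
            acc[start + offset] += value
            count[start + offset] += 1
    return [a / c if c else original
            for a, c, original in zip(acc, count, signal)]
```

For example, in the signal [1, 2, 3, 9, 9, 9, 1, 2, 3, 0], packing with a reference region of size 3 at position 0 and k = 2 selects positions 0 and 6, where the same pattern recurs.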
Inspired by the sparsity-based denoising techniques, a nonlinear in-loop filter has been proposed in the literature for compression de-artifacting. This technique uses a set of denoised estimates provided by an over-complete set of transforms. Specifically, the implementation of the second prior art approach generates an over-complete set of transforms by using all possible translations Hi of a given 2D orthonormal transform H, such as wavelets or DCT. Thus, given an image I, a series of different transformed versions Yi of the image I is created by applying the various transforms Hi. Each transformed version Yi is then subjected to a denoising procedure, typically including a thresholding operation, to produce the series of Y′i. The transformed and thresholded coefficients Y′i are then inverse transformed back into the spatial domain, giving rise to the denoised estimates I′i. In over-complete settings, it is expected that some of the denoised estimates will provide better performance than others and that the final filtered version I′ will benefit from a combination, via averaging, of such denoised estimates. The denoising filter of the second prior art approach uses the weighted average of the denoised estimates I′i, where the weights are optimized to emphasize the best denoised estimates. To handle de-artifacting more efficiently and remove the constraint posed by the second prior art approach, a fifth prior art approach proposes exploiting different sub-lattice samplings of the picture to be filtered in order to extend the directions of analysis beyond the vertical and horizontal components. Furthermore, the direction-adaptive de-artifacting filter excludes from the weighted combination the denoised estimates originating from transforms which are similar or closely aligned to the transforms used in coding the residue.
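The weighted combination of the denoised estimates I′i can be sketched as follows. The weighting rule shown here, in which an estimate that retained fewer non-zero coefficients after thresholding receives a larger weight, is an assumption made for illustration; the text above states only that the weights emphasize the best estimates. The sketch also uses one weight per estimate, whereas an actual filter may assign weights per pixel. The function name is hypothetical.

```python
def fuse_estimates(estimates, nonzero_counts):
    """Weighted average of several denoised estimates of the same samples.

    estimates: list of equally long lists of sample values.
    nonzero_counts: number of coefficients each estimate kept after
    thresholding; sparser estimates get larger weights (assumed rule).
    """
    weights = [1.0 / max(n, 1) for n in nonzero_counts]
    total = sum(weights)
    length = len(estimates[0])
    return [sum(w * est[i] for w, est in zip(weights, estimates)) / total
            for i in range(length)]
```

With two estimates [10, 10] and [14, 14] that retained 1 and 3 coefficients respectively, the weights are 1 and 1/3, and the fused value is 11 for each sample, closer to the sparser estimate.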
For de-artifacting, the choice of filtering threshold is of great importance. The applied threshold plays a crucial part in controlling the denoising capacity of the filter as well as in computing the averaging weights used in emphasizing the better denoising estimates. Inadequate threshold selection may result in over-smoothed reconstructed pictures or may allow the persistence of artifacts. The method proposed in a sixth prior art approach improves performance over the fifth prior art approach by adaptively selecting filtering thresholds consistent with quantization noise statistics, local encoding conditions, compression requirements, and the original signal. Thresholds are both spatially and temporally adapted to optimize video quality and/or coding cost. In particular, a filtering map is created to handle different threshold classes. Selected thresholds per class are encoded and transmitted as side information to the decoder. However, in practice, the optimal selection of such a threshold is not easy. For example, in the sixth prior art approach, an exhaustive search is used to find the threshold which provides the highest peak signal-to-noise ratio (PSNR).
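An exhaustive threshold search of the kind mentioned above can be sketched as follows. The sketch assumes that the denoising filter is supplied by the caller as a function of the decoded samples and a threshold, and that the candidate thresholds are given as a finite list; the function names are hypothetical. Because the search compares against the original signal, it can be performed only at the encoder, which is why the selected threshold is transmitted as side information.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / mse)


def best_threshold(original, decoded, candidates, denoise):
    """Try every candidate threshold and keep the one with the highest PSNR.

    denoise(decoded, threshold) is a caller-supplied filter (assumption).
    Returns a (threshold, psnr) pair.
    """
    best = None
    for threshold in candidates:
        score = psnr(original, denoise(decoded, threshold))
        if best is None or score > best[1]:
            best = (threshold, score)
    return best
```

The cost of this search grows linearly with the number of candidate thresholds, since the filter must be run once per candidate, and it grows further when a separate threshold is searched for each class in the filtering map.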
Finally, spatial Wiener filtering is used in video compression to improve coding quality. For example, in a seventh prior art approach, Wiener filters, which are trained at the encoder based on local spatial variances, are used as a post-filter to remove quantization noise. However, in the seventh prior art approach, the filter coefficients are explicitly sent in the bitstream, thus increasing transmission overhead.
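Training a Wiener filter at the encoder can be sketched as a least-squares fit of filter coefficients that map decoded samples to original samples. The sketch below is a simplification: it fits a single one-dimensional filter with an odd number of taps over the whole signal, rather than adapting to local spatial variances as in the seventh prior art approach, and the function names are hypothetical. It assumes the decoded signal is varied enough for the normal equations to be solvable.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def train_wiener(original, decoded, taps=3):
    """Least-squares filter coefficients mapping decoded to original."""
    half = taps // 2
    rows = [decoded[i - half:i + half + 1]
            for i in range(half, len(decoded) - half)]
    targets = original[half:len(original) - half]
    # Normal equations: autocorrelation matrix and cross-correlation vector.
    r = [[sum(row[p] * row[q] for row in rows) for q in range(taps)]
         for p in range(taps)]
    v = [sum(row[p] * t for row, t in zip(rows, targets))
         for p in range(taps)]
    return solve(r, v)


def apply_wiener(decoded, coeffs):
    """Filter interior samples; border samples are left unchanged."""
    half = len(coeffs) // 2
    out = list(decoded)
    for i in range(half, len(decoded) - half):
        out[i] = sum(c * decoded[i - half + j] for j, c in enumerate(coeffs))
    return out
```

In this sketch, the list returned by train_wiener is what would have to be conveyed to the decoder so that it can call apply_wiener with the same coefficients, which corresponds to the overhead noted above.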