The block-based discrete transform is a fundamental component of many image and video compression standards including, for example, the Joint Photographic Experts Group (JPEG) Standard, the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 Recommendation (hereinafter the “H.263 Recommendation”), the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-1 (MPEG-1) Standard, the ISO/IEC MPEG-2 Standard, and the ISO/IEC MPEG-4 Part 10 Advanced Video Coding (AVC) Standard/ITU-T H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”). Further, the block-based discrete transform is used in a wide range of applications.
The discrete cosine transform (DCT) is the most extensively used block transform. The DCT scheme takes advantage of the local spatial correlation property of the image/frame by dividing the image/frame into blocks of pixels (usually 4×4 or 8×8), transforming each block from the spatial domain to the frequency domain using the discrete cosine transform, and quantizing the DCT coefficients. Since blocks of pixels are treated as single entities and coded separately, correlation among spatially adjacent blocks is not taken into account in coding, which results in block boundaries being visible when the decoded image is reconstructed. For example, a smooth change of luminance across a border can result in a step in the decoded frame if neighboring samples fall into different quantization intervals. Such so-called “blocking” artifacts are often very disturbing, especially when the transform coefficients are subject to coarse quantization.
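The mechanism described above can be sketched as follows. This is an illustrative example only (not part of any standard): a smooth luminance ramp is split into 8×8 blocks, each block is transformed with an orthonormal 2-D DCT, the coefficients are coarsely quantized with an assumed uniform step, and the reconstruction exhibits a discontinuity at the block boundary that the original ramp did not have. All function names and the quantization step are assumptions for illustration.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix: C[u, x] = c(u) * cos(pi*(2x+1)*u / (2n))
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def block_dct_codec(img, q, n=8):
    # Transform each n x n block, quantize coefficients with uniform step q,
    # then reconstruct -- a toy model of block-based transform coding.
    C = dct_matrix(n)
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for y in range(0, h, n):
        for x in range(0, w, n):
            blk = img[y:y + n, x:x + n].astype(float)
            coef = C @ blk @ C.T                # forward 2-D DCT
            coef = np.round(coef / q) * q       # coarse uniform quantization
            out[y:y + n, x:x + n] = C.T @ coef @ C  # inverse 2-D DCT
    return out

# A smooth horizontal luminance ramp: coarse quantization puts neighboring
# blocks into different quantization intervals, creating a visible step at
# the block boundary (a "blocking" artifact).
ramp = np.tile(np.linspace(100, 120, 16), (8, 1))
rec = block_dct_codec(ramp, q=40)
step = abs(rec[0, 8] - rec[0, 7])  # discontinuity across the block boundary
```

With a very fine quantization step the ramp is reconstructed almost exactly; with the coarse step the discontinuity across the boundary exceeds the ramp's per-sample increment, which is precisely the boundary step the passage describes.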
The detection of block artifacts allows for the improvement of video visual quality. Frame regions with blocking artifacts can be targeted to alleviate the problem. This can be done in a video encoder setting, for example, in accordance with the MPEG-4 AVC Standard.
Turning to FIG. 1, a video encoder capable of performing video encoding in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 100.
The video encoder 100 includes a frame ordering buffer 110 having an output in signal communication with a non-inverting input of a combiner 185. An output of the combiner 185 is connected in signal communication with a first input of a transformer and quantizer 125. An output of the transformer and quantizer 125 is connected in signal communication with a first input of an entropy coder 145 and a first input of an inverse transformer and inverse quantizer 150. An output of the entropy coder 145 is connected in signal communication with a first non-inverting input of a combiner 190. An output of the combiner 190 is connected in signal communication with a first input of an output buffer 135.
A first output of an encoder controller 105 is connected in signal communication with a second input of the frame ordering buffer 110, a second input of the inverse transformer and inverse quantizer 150, an input of a picture-type decision module 115, a first input of a macroblock-type (MB-type) decision module 120, a second input of an intra prediction module 160, a second input of a deblocking filter 165, a first input of a motion compensator 170, a first input of a motion estimator 175, and a second input of a reference picture buffer 180.
A second output of the encoder controller 105 is connected in signal communication with a first input of a Supplemental Enhancement Information (SEI) inserter 130, a second input of the transformer and quantizer 125, a second input of the entropy coder 145, a second input of the output buffer 135, and an input of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140.
An output of the SEI inserter 130 is connected in signal communication with a second non-inverting input of the combiner 190.
A first output of the picture-type decision module 115 is connected in signal communication with a third input of the frame ordering buffer 110. A second output of the picture-type decision module 115 is connected in signal communication with a second input of the macroblock-type decision module 120.
An output of the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) inserter 140 is connected in signal communication with a third non-inverting input of the combiner 190.
An output of the inverse transformer and inverse quantizer 150 is connected in signal communication with a first non-inverting input of a combiner 119. An output of the combiner 119 is connected in signal communication with a first input of the intra prediction module 160 and a first input of the deblocking filter 165. An output of the deblocking filter 165 is connected in signal communication with a first input of the reference picture buffer 180. An output of the reference picture buffer 180 is connected in signal communication with a second input of the motion estimator 175 and a third input of the motion compensator 170. A first output of the motion estimator 175 is connected in signal communication with a second input of the motion compensator 170. A second output of the motion estimator 175 is connected in signal communication with a third input of the entropy coder 145.
An output of the motion compensator 170 is connected in signal communication with a first input of a switch 197. An output of the intra prediction module 160 is connected in signal communication with a second input of the switch 197. An output of the macroblock-type decision module 120 is connected in signal communication with a third input of the switch 197. The third input of the switch 197 determines whether the “data” input of the switch (as compared to the control input, i.e., the third input) is provided by the motion compensator 170 or by the intra prediction module 160. The output of the switch 197 is connected in signal communication with a second non-inverting input of the combiner 119 and an inverting input of the combiner 185.
A first input of the frame ordering buffer 110 and an input of the encoder controller 105 are available as inputs of the encoder 100, for receiving an input picture. Moreover, a second input of the Supplemental Enhancement Information (SEI) inserter 130 is available as an input of the encoder 100, for receiving metadata. An output of the output buffer 135 is available as an output of the encoder 100, for outputting a bitstream.
In an environment such as that corresponding to FIG. 1, the blocks having blocking artifacts can be filtered with a deblocking filter, re-encoded with different coding parameters, quantized with finer quantization steps, and so forth.
Objective quality assessment is directed to automatically predicting perceived image or video quality without the use of a human observer. Objective quality assessment methods can be classified into three broad categories, namely reduced-reference methods, no-reference methods, and full-reference methods.
Reduced-reference methods have access to partial information regarding the “perfect version” or “original version”. The partial information can be made available to the quality assessment algorithm through a side-channel (referred to as a reduced-reference (RR) channel).
No-reference methods have access to only the distorted signal and must estimate the quality of the signal without any knowledge of the “original version”.
Full-reference methods have access to an “original version” of the image or video against which a respective full-reference method can compare a “distorted version”.
Blocking artifact detection (BAD) is performed by computing a blocking artifact metric, which is an objective quality assessment measure. The surveyed BAD algorithms are mostly no-reference methods.
[1] MSDS (Mean square difference of slope): In short, MSDS measures the slope between the pixel values in neighboring blocks. Turning to FIG. 2, an exemplary one-dimensional visualization of a block boundary obtained using the mean square difference of slope method of blocking artifact detection is indicated generally by the reference numeral 200. The one-dimensional visualization 200 is shown on a graph with the vertical axis thereof corresponding to intensity value and the horizontal axis thereof corresponding to the horizontal location of a particular macroblock on the kth horizontal line in an input image. With respect to the one-dimensional visualization 200, let N−2, N−1, 0, 1, be the pixel locations spanning two adjacent blocks. Let s1 be defined as the slope between the (N−1)th and 0th pixels (the slope across the boundary) and let s2 be the sum of the slopes between (N−2) and (N−1) and between 0 and 1. Then, the MSDS for one direction is defined as the squared difference between the boundary slope and the average of the adjacent slopes, i.e., (s1−s2/2)². The overall MSDS is found by summing the individual MSDS over all 4 neighbors.
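The MSDS computation described above can be sketched as follows. This is an illustrative sketch, not the exact published implementation: it uses the squared-difference form implied by the metric's name (boundary slope versus the average of the two adjacent intra-block slopes), and the function names and edge handling are assumptions.

```python
import numpy as np

def msds_1d(left, right):
    # MSDS across one block boundary of a 1-D pixel row/column.
    # `left` holds the last two pixels of the left block (indices N-2, N-1);
    # `right` holds the first two pixels of the right block (indices 0, 1).
    s1 = right[0] - left[1]                            # slope across the boundary
    s2 = (left[1] - left[0]) + (right[1] - right[0])   # sum of the adjacent slopes
    return (s1 - s2 / 2.0) ** 2                        # squared deviation from their average

def msds_block(img, y, x, n=8):
    # Sum the one-directional MSDS over the four neighbors (West, East,
    # North, South) of the n x n block with top-left corner (y, x).
    # Boundaries that would fall outside the image are skipped.
    total = 0.0
    h, w = img.shape
    for r in range(y, y + n):                          # West / East boundaries
        if x >= 2:
            total += msds_1d(img[r, x - 2:x], img[r, x:x + 2])
        if x + n + 1 < w:
            total += msds_1d(img[r, x + n - 2:x + n], img[r, x + n:x + n + 2])
    for c in range(x, x + n):                          # North / South boundaries
        if y >= 2:
            total += msds_1d(img[y - 2:y, c], img[y:y + 2, c])
        if y + n + 1 < h:
            total += msds_1d(img[y + n - 2:y + n, c], img[y + n:y + n + 2, c])
    return total
```

On a perfectly linear ramp the boundary slope equals the average of the adjacent slopes, so the MSDS is zero; a hard intensity step at a block boundary yields a large value.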
[2] PIQE-B (Psychovisual image quality evaluator, a DC-coefficient-of-DCT-based metric): The blockiness index (B) is found as follows:
- Take the difference between the DC coefficient of the center macroblock and that of each of its 8 neighbors.
- Sum the differences over the 8 neighbors. Let this sum be s_p.
- Do the same with the original frame and obtain the sum s_or.
- The blockiness index is the sum of the absolute difference s_p − s_or over all the blocks in the image.
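A sketch of this computation follows. It is an assumption-laden illustration, not the published PIQE-B implementation: the DC coefficient is taken as the block sum divided by 8 (its value under an orthonormal 2-D DCT), absolute differences against the 8 neighbors are assumed, and `np.roll` wrap-around at the image edges is a simplification.

```python
import numpy as np

def dc_map(img, n=8):
    # DC coefficient of each n x n block; for an orthonormal 2-D DCT the
    # DC term equals the block sum divided by n.
    h, w = img.shape
    blocks = img[:h - h % n, :w - w % n].reshape(h // n, n, w // n, n)
    return blocks.sum(axis=(1, 3)) / n

def neighbour_diff_sum(dc):
    # For every block, sum |DC(center) - DC(neighbor)| over its 8 neighbors.
    # np.roll wraps at the edges -- a simplification for this sketch.
    s = np.zeros_like(dc)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(dc, dy, axis=0), dx, axis=1)
            s += np.abs(dc - shifted)
    return s

def piqe_b(processed, original, n=8):
    # Blockiness index: sum over all blocks of |s_p - s_or|.
    s_p = neighbour_diff_sum(dc_map(processed, n))
    s_or = neighbour_diff_sum(dc_map(original, n))
    return np.abs(s_p - s_or).sum()
```

An undistorted frame scores zero against itself; shifting the mean of a single block (as coarse DC quantization would) raises the index.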
[3] GBIM (Generalized Block-edge Impairment Metric): This metric also uses the differences between the pixel values at the block boundaries. However, human visual system (HVS) principles are also incorporated into the metric. The pixel differences are weighted by a weighting matrix whose values depend on the spatial characteristics of the image. In general, greater weight is given to those boundaries that are more visible. The luminance masking effect is used in determining the weights: noise in extremely bright areas, as well as in extremely dark areas, is less visible than noise at luminance values between 70 and 90 (in an 8-bit grayscale image).
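The weighting idea can be sketched as follows. This is a simplified illustration, not the actual GBIM weighting matrix: an assumed Gaussian-shaped weight centered near mid-gray stands in for the luminance-masking function, and only vertical block boundaries are measured. The parameter values are assumptions.

```python
import numpy as np

def luminance_weight(mu, peak=80.0, spread=60.0):
    # Illustrative luminance-masking weight (an assumption, not the exact
    # GBIM function): differences near mid-gray (around 70-90 in 8-bit) are
    # most visible and receive the largest weight; very dark and very bright
    # regions are down-weighted.
    return np.exp(-((mu - peak) ** 2) / (2.0 * spread ** 2))

def gbim_horizontal(img, n=8):
    # Weighted sum of absolute differences across vertical block boundaries
    # (between column n-1 of one block and column 0 of the next).
    total = 0.0
    h, w = img.shape
    for x in range(n, w, n):
        left, right = img[:, x - 1], img[:, x]
        mu = (left + right) / 2.0           # local mean luminance at the boundary
        total += np.sum(luminance_weight(mu) * np.abs(right - left))
    return total
```

A flat image scores zero, and the same 10-level boundary step contributes more to the metric when it sits near mid-gray than when it sits in a very dark region, reflecting the luminance masking effect described above.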
[4] FLATS: This metric works on the luminance values of an image. Each 8×8 block is classified based on the homogeneity of its pixel values. If the luminance is constant over the entire block, i.e., constant in both the horizontal and the vertical direction, then the block is chosen for further consideration. The average luminance of each of the North, South, East, and West neighboring blocks is then compared to the average luminance value of the center block. If the minimum of these differences, divided by the average luminance of the pixels within the surrounding 24×24 block, is greater than a given threshold, then the block is considered flat.
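The classification steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the threshold value is arbitrary, the 24×24 region is taken as the 3×3 neighborhood of blocks around the candidate, and blocks at the image border are simply rejected.

```python
import numpy as np

def is_flat_block(img, y, x, n=8, threshold=0.1):
    # FLATS-style test for the n x n block at (y, x); `threshold` is an
    # illustrative assumption.
    blk = img[y:y + n, x:x + n]
    # Step 1: only perfectly homogeneous blocks are candidates.
    if not np.all(blk == blk[0, 0]):
        return False
    center = blk.mean()
    # Step 2: absolute difference of the average luminance of the North,
    # South, West, and East neighboring blocks against the center block.
    h, w = img.shape
    diffs = []
    for dy, dx in ((-n, 0), (n, 0), (0, -n), (0, n)):
        ny, nx = y + dy, x + dx
        if 0 <= ny and ny + n <= h and 0 <= nx and nx + n <= w:
            diffs.append(abs(img[ny:ny + n, nx:nx + n].mean() - center))
    if len(diffs) < 4:          # border block: neighbors missing, reject
        return False
    # Step 3: normalize the minimum difference by the average luminance of
    # the surrounding 24 x 24 region (3 x 3 neighborhood of blocks).
    y0, x0 = y - n, x - n
    avg = img[y0:y0 + 3 * n, x0:x0 + 3 * n].mean()
    if avg == 0:
        return False
    return min(diffs) / avg > threshold
```

A homogeneous block surrounded by identically valued neighbors is not flagged (the minimum difference is zero), while a homogeneous block whose neighbors differ noticeably in mean luminance is.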
These and other blocking artifact detection (BAD) methods can be found in the prior art, but they are of the previously described “no-reference” type.