Assessment of quality of images is important in various contexts including image transcoding and compression. Transcoding of images is crucial in the dissemination of rich multimedia content comprising text, voice, still and animated graphics, photos, video clips, in heterogeneous networks composed of mobile terminals, cell phones, computers and other electronic devices. Image quality can be assessed by measuring similarity between an original image (often referred to as a reference image) and an image obtained after image processing (often referred to as a distorted image). Such an assessment of quality can be used for example to determine the effectiveness of an image processing technique.
Image quality assessment has an important role in developing various image and video processing applications, including multimedia applications, so that they provide acceptable outputs for human end users. Image quality is best evaluated subjectively by human viewers. However, subjective assessment of quality is time consuming, expensive, and cannot be done for real-time applications. Thus, it is necessary to define an objective criterion that can measure the difference between the undistorted original image and the distorted image signals. Ideally, such an objective measure should correlate well with the perceived difference between two image signals and should vary linearly with the subjective quality. Subjective image quality is concerned with how processed images are perceived by a human viewer and designates that viewer's opinion of quality.
Objective methods are usually classified based on the availability of the reference images. If the reference image is available, the measure of quality or the quality metric is considered as a full-reference (FR) assessment measure. The peak signal-to-noise ratio (PSNR) is the oldest and most widely used FR assessment measure and has a number of attractive characteristics. It is simple, has clear physical meaning, is parameter free, and performs superbly in various optimization contexts as described by Z. Wang, and A. C. Bovik, in “Mean squared error: Love it or leave it? A new look at signal fidelity measures”, IEEE Signal Processing Mag., vol. 26, no. 1, pp. 98-117, January 2009. This measure is defined as:
$$\mathrm{PSNR}(X, Y) = 10 \cdot \log_{10}\left(\frac{X_{\max}^{2}}{\mathrm{MSE}(X, Y)}\right)$$

$$\mathrm{MSE}(X, Y) = \frac{1}{N_P} \cdot \sum_{m,n} \left(X(m,n) - Y(m,n)\right)^{2}$$

where $X$ and $Y$ denote the reference (original) and distorted images respectively, $X_{\max}$ is the maximum possible pixel value of the reference image $X$ (the minimum pixel value is assumed to be zero), and $N_P$ is the number of pixels in each of the images.
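The two definitions above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not part of the cited work: the function names `mse` and `psnr` and the default `x_max=255.0` (8-bit images) are assumptions made for the example.

```python
import numpy as np


def mse(x, y):
    """Mean squared error between reference x and distorted y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    # np.mean divides the summed squared differences by the pixel count N_P.
    return float(np.mean((x - y) ** 2))


def psnr(x, y, x_max=255.0):
    """PSNR in dB; x_max is the maximum possible pixel value (assumed 8-bit)."""
    err = mse(x, y)
    if err == 0.0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(x_max ** 2 / err))


# Hypothetical 4x4 example: every pixel is off by 5, so MSE = 25.
reference = np.zeros((4, 4))
distorted = np.full((4, 4), 5.0)
print(round(psnr(reference, distorted), 2))  # 34.15
```

Returning infinity for identical images is one possible convention; an implementation could equally clamp the result to a large finite value.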
The conventional PSNR, however, cannot sufficiently reflect the human perception of image fidelity; that is, a large PSNR gain may result in only a small improvement in visual quality. Thus, a number of other quality measures have been developed by researchers. Generally speaking, the FR assessment of image signals involves two types of approaches, a bottom-up approach and a top-down approach, which are discussed by Z. Wang, and A. C. Bovik, in “Modern Image Quality Assessment”, USA: Morgan & Claypool, 2006.
In the bottom-up approach, the perceptual measures of quality are best estimated by quantifying the visibility of errors. In order to quantify errors according to human visual system (HVS) features, techniques in this category try to model the functional properties of different stages of the HVS as characterized by both psychophysical and physiological experiments. This is usually accomplished in several stages of preprocessing, frequency analysis, contrast sensitivity analysis, luminance masking, contrast masking, and error pooling as described by Z. Wang, and A. C. Bovik, in “Modern Image Quality Assessment”, USA: Morgan & Claypool, 2006 and by A. C. Bovik, in “The Essential Guide to Image Processing”, USA: Academic Press, 2009, ch. 21. Most HVS-based quality assessment techniques use multi-channel models which assume that each band of spatial frequencies is handled by an independent channel. With the visible difference predictor (VDP) model, discussed by S. Daly, in “The visible difference predictor: An algorithm for the assessment of image fidelity”, Proc. SPIE, vol. 1616, February 1992, pp. 2-15, the image is decomposed into five spatial levels followed by six orientation levels using a variation of Watson's Cortex transform. Then, for each channel, a threshold map is computed from the contrast in that channel. In Lubin's model, which is also known as the Sarnoff visual discrimination model (VDM) presented in “A visual discrimination model for image system design and evaluation”, Visual Models for Target Detection and Recognition, Singapore: World Scientific, 1995, pp. 207-220, the images are decomposed into seven resolutions after low-pass filtering and resampling. P. C. Teo and D. J. Heeger use the steerable pyramid transform to decompose the image into several spatial frequency levels, within which each level is further divided into a set of (six) orientation bands. Their approach is described in “Perceptual Image distortion”, in Proc. IEEE. Int. Conf. Image Processing, vol. 2, November 1994, pp. 982-986. VSNR, discussed by D. M. Chandler, and S. S. Hemami, in “VSNR: A wavelet-based visual signal-to-noise ratio for natural images”, IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284-2298, September 2007, is another advanced HVS-based metric that, after image preprocessing, decomposes both the reference image and the errors between the reference and distorted images into five levels by using a discrete wavelet transform and 9/7 biorthogonal filters. Then, it computes the contrast detection threshold to assess the ability to detect the distortions for each subband produced by the wavelet decomposition. Other known methods based on the bottom-up approach, which exploit the Fourier transform rather than multiresolution decomposition, include Weighted Signal to Noise Ratio (WSNR), discussed by N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, in “Image quality assessment based on a degradation model”, IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636-650, April 2000 and Picture Quality Scale (PQS) described by M. Miyahara, K. Kotani, and V. R. Algazi, in “Objective Picture Quality Scale (PQS) for image coding”, IEEE Transactions on Communications, vol. 46, no. 9, pp. 1215-1225, September 1998. Methods based on the bottom-up approach have several important limitations, which are discussed by Z. Wang, and A. C. Bovik, in “Modern Image Quality Assessment”, USA: Morgan & Claypool, 2006 and by Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, in “Image quality assessment: From error visibility to structural similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004. Moreover, the error-based techniques, such as WSNR, Noise Quality Measure (NQM), described by N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, in “Image quality assessment based on a degradation model”, IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636-650, April 2000 and VSNR discussed by D. M. Chandler, and S. S. Hemami in “VSNR: A wavelet-based visual signal-to-noise ratio for natural images”, IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284-2298, September 2007 are less simple to use, as they require sophisticated procedures to compute the human visual system (HVS) parameters.
With the techniques based on the top-down approach, the overall functionality of the HVS is considered as a black box, and the input/output relationship is the focus of attention. Thus, techniques following the top-down approach do not require any calibration parameters from the HVS or viewing configuration. Two main strategies in this category use a structural approach and an information-theoretic approach.
The most important method using the structural approach is the Structural Similarity (SSIM) index described by Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, in “Image quality assessment: from error visibility to structural similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004. As discussed by H. R. Sheikh, M. F. Sabir, and A. C. Bovik, in “A statistical evaluation of recent full reference image quality assessment algorithms”, IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, November 2006, SSIM gives an accurate score with acceptable computational complexity compared to other measures of quality. SSIM has attracted a great deal of attention in recent years, and has been considered for a range of applications. As described by Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, in “Image quality assessment: from error visibility to structural similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004, the principal idea underlying the SSIM approach is that the HVS is highly adapted to extract structural information from visual scenes, and, therefore, a measurement of structural similarity (or distortion) should provide a good approximation of perceptual image quality. Some approaches have tried to improve the SSIM index. The multi-scale SSIM discussed by Z. Wang, E. P. Simoncelli, and A. C. Bovik, in “Multi-scale structural similarity for image quality assessment”, Proc. IEEE Asilomar Conf. Signals, Systems, Computers, vol. 2, November 2003, pp. 1398-1402, attempts to increase SSIM assessment accuracy by incorporating image details at five different resolutions by applying successive low-pass filtering and downsampling. In “Understanding and simplifying the structural similarity metric”, Proc. IEEE International Conference on Image Processing, San Diego, October 2008, pp. 1188-1191, D. M. Rouse, and S. S. Hemami investigate ways to understand and simplify the SSIM metric, and C.-L. Yang, W.-R. Gao, and L.-M. Po, in “Discrete wavelet transform-based structural similarity for image quality assessment”, Proc. IEEE International Conference on Image Processing, San Diego, October 2008, pp. 377-380, propose to compute it in the discrete wavelet domain using subbands at different levels. Five-level decomposition using the Daubechies 9/7 wavelet is applied to both original and distorted images, and then SSIM is computed between corresponding subbands. Finally, the similarity measure is obtained by computing a weighted mean of all SSIMs. To determine the weights, a large number of experiments have been performed for measuring the sensitivity of the human eye to different frequency bands. Z. Wang, E. P. Simoncelli, in “Translation insensitive image similarity in complex wavelet domain”, Proc. IEEE International Conference on Acoustics, Speech, Signal Processing, vol. 2, March 2005, pp. 573-576 and M. P. Sampat, Z. Wang, S. Gupta, A. C. Bovik, and M. K. Markey, in “Complex wavelet structural similarity: A new image similarity index”, IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385-2401, November 2009 discuss Complex Wavelet Structural Similarity (CW-SSIM), which uses a complex version of a 6-scale, 16-orientation steerable pyramid decomposition, and propose a measure that is resistant to small geometrical distortions.
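For orientation, the structural comparison underlying SSIM can be sketched as follows. This is a deliberately simplified illustration: it evaluates the standard SSIM expression once over whole-image statistics, whereas the published index computes it within a local sliding window and averages the resulting map. The function name `global_ssim` is hypothetical; the constants K1 = 0.01 and K2 = 0.03 are the values commonly used with the index.

```python
import numpy as np


def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM expression evaluated on whole-image statistics (simplification)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Small constants that stabilize the ratio when means/variances are near 0.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # Luminance term times combined contrast/structure term.
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Hypothetical test pattern: an 8x8 ramp compared with itself and its negative.
ramp = np.arange(64, dtype=np.float64).reshape(8, 8)
same_score = global_ssim(ramp, ramp)            # identical images
inverted_score = global_ssim(ramp, 63.0 - ramp)  # structure is reversed
```

Identical images score 1, while the inverted ramp scores well below 1 because the covariance term changes sign even though the mean and variance are unchanged.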
With the information-theoretic approach, visual quality assessment is viewed as an information fidelity problem. An information fidelity criterion (IFC) for image quality measurement that is based on natural scene statistics is presented by H. R. Sheikh, A. C. Bovik, and G. de Veciana, in “An information fidelity criterion for image quality assessment using natural scene statistics”, IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117-2128, December 2005. In the IFC, the image source is modeled by using a Gaussian scale mixture (GSM) while the image distortion process is modeled as an error-prone communication channel. The information shared between the images being compared is quantified by using the mutual information that is a statistical measure of information fidelity. Another information-theoretic quality metric is the “Visual Information Fidelity (VIF) index” discussed by H. R. Sheikh, and A. C. Bovik, in “Image information and visual quality”, IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, February 2006. The computation of the VIF index follows the same procedure as the IFC, except that, in the determination of the VIF index both the image distortion process and the visual perception process are modeled as error-prone communication channels. For the VIF measure, the HVS distortion channel is modeled with an additive white Gaussian noise. The VIF index is the most accurate image measure of quality according to the performance evaluation of prominent image quality assessment algorithms presented by H. R. Sheikh, M. F. Sabir, and A. C. Bovik, in “A statistical evaluation of recent full reference image quality assessment algorithms”, IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, November 2006.
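The channel view described above can be illustrated with a scalar Gaussian toy model. This sketch is not the IFC or VIF computation of the cited papers, which operate on Gaussian scale mixture models of wavelet subbands; it only assumes a single Gaussian source of variance `signal_var`, a distortion channel with gain `gain` and additive noise of variance `distortion_var`, and an HVS channel with additive white Gaussian noise of variance `hvs_noise_var`. All function and parameter names are hypothetical.

```python
import math


def gaussian_channel_information(signal_var, gain, noise_var):
    """Mutual information (bits) of a scalar Gaussian channel y = gain*s + n."""
    return 0.5 * math.log2(1.0 + (gain ** 2) * signal_var / noise_var)


def fidelity_ratio(signal_var, gain, distortion_var, hvs_noise_var):
    """Information surviving distortion + HVS noise, relative to HVS noise alone."""
    # Reference path: the source passes only through the HVS noise channel.
    reference_info = gaussian_channel_information(signal_var, 1.0, hvs_noise_var)
    # Distorted path: the source is scaled, then both noise terms are added.
    distorted_info = gaussian_channel_information(
        signal_var, gain, distortion_var + hvs_noise_var
    )
    return distorted_info / reference_info


# Hypothetical values: an undistorted signal keeps all of its information,
# while added distortion noise or attenuation lowers the ratio.
clean = fidelity_ratio(100.0, 1.0, 0.0, 1.0)
noisy = fidelity_ratio(100.0, 1.0, 25.0, 1.0)
blurred = fidelity_ratio(100.0, 0.5, 0.0, 1.0)
```

In this toy setting the ratio equals 1 when there is no distortion and falls below 1 as distortion noise grows or the gain shrinks, which mirrors the intent of an information fidelity measure.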
Review of existing literature reveals a number of shortcomings of the prior art methods. The limitations of these prior art methods include the following.
First, the computational complexity of the existing assessment techniques for accurately determining the measures of quality is very high. Some image/video processing applications, like identifying the best quantization parameters for each frame in video encoding described by S. L. P. Yasakethu, W. A. C. Fernando, S. Adedoyin, and A. Kondoz in a paper entitled “A rate control technique for off line H.264/AVC video coding using subjective quality of video”, IEEE Transactions on Consumer Electronics, vol. 54, no. 3, pp. 1465-1472, August 2008, could be used more efficiently if an accurate low-complexity technique for determining the measure of quality (quality metric) were used.
Second, the bottom-up approach used by prior art methods requires that the associated techniques apply a multiresolution transform and decompose the input image into a large number of resolutions (five or more). Because the HVS is a complex system that is not completely understood, combining the different bands into a final metric is difficult. Similarly, in top-down methods such as the multi-scale and multi-level SSIMs discussed by Z. Wang, E. P. Simoncelli, and A. C. Bovik, in “Multi-scale structural similarity for image quality assessment”, Proc. IEEE Asilomar Conf. Signals, Systems, Computers, vol. 2, November 2003, pp. 1398-1402 and by C.-L. Yang, W.-R. Gao, and L.-M. Po, in “Discrete wavelet transform-based structural similarity for image quality assessment”, Proc. IEEE Int. Conf. Image Processing, San Diego, October 2008, pp. 377-380, both cited earlier, determining the sensitivity of the HVS to different scales or subbands requires many experiments. Moreover, if the wavelet or filter is changed, the computed weights and parameters are no longer optimum and may not even be valid.
Third, top-down methods, such as SSIM, gather local statistics within a square sliding window and may not always be very accurate.
Fourth, the large number of decomposition levels, as discussed by C.-L. Yang, W.-R. Gao, and L.-M. Po, in “Discrete wavelet transform-based structural similarity for image quality assessment”, Proc. IEEE Int. Conf. Image Processing, San Diego, October 2008, pp. 377-380, cited earlier, would make the size of the approximation subband, which holds the main image contents, very small, so that the subband would no longer be able to help in the effective extraction of image statistics.
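The shrinkage of the approximation subband can be made concrete with a short calculation. The sketch below assumes a dyadic decomposition in which each level halves both dimensions (rounding up) and ignores any filter-dependent boundary extension; the function name and the 512x512 image size are hypothetical.

```python
def approximation_subband_shape(height, width, levels):
    """Size of the approximation subband after `levels` dyadic decompositions."""
    for _ in range(levels):
        # Each level downsamples rows and columns by a factor of two.
        height = (height + 1) // 2
        width = (width + 1) // 2
    return height, width


# Hypothetical 512x512 image decomposed to five levels.
shape = approximation_subband_shape(512, 512, 5)
print(shape)  # (16, 16)
```

Under these assumptions, five levels leave only 256 approximation coefficients out of 262,144 pixels, which gives little data from which to estimate local image statistics.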
Fifth, previous SSIM methods use the mean of the SSIM quality map to determine the measure of quality for the image (or the overall image quality score). However, distortions in various image areas have different impacts on the HVS.
Thus, there is a further need for the development of an improved measure of quality for images, which would avoid or mitigate the disadvantages of the prior art.