Image quality assessment is important in various domains, including image compression and transcoding. Transcoding of images is becoming increasingly important as rich multimedia content comprising text, voice, still and animated graphics, photos, and video clips is being delivered in heterogeneous networks composed of mobile terminals, cell phones, computers and other electronic devices. Image quality can be assessed by measuring the similarity between an original image and an image obtained after image processing. Such an assessment of quality can be used to determine the effectiveness of an image processing technique.
Full-reference (FR) quality assessment of images generally involves two categories of approach, bottom-up and top-down, as described by Z. Wang and A. C. Bovik in “Modern Image Quality Assessment”, Morgan & Claypool, United States, 2006. In the bottom-up approaches, scores for quality of images are estimated by quantifying the visibility of errors. These prior art methods have several important limitations, which are described in the same reference. In the top-down approaches, the whole Human Visual System (HVS) is considered as a black box, and the hypothesized functionality of the overall HVS is simulated rather than mathematically modeled. By contrast, in a typical mathematical model, each functional perceptual component needs to be modeled individually, and all component models, which serve as basic building blocks, are integrated into an overall system model.
One of the main methods in the top-down category described in the literature is the Structural SIMilarity (SSIMW&B) index, which gives an accurate score for image quality with acceptable computational complexity in comparison to other quality metrics, as described by H. R. Sheikh, M. F. Sabir, and A. C. Bovik, in “A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3441-3452, November 2006. SSIMW&B has attracted a great deal of attention in recent years, and has been considered for a wide range of applications. The idea underlying SSIMW&B is that the HVS adapts to structural information extracted from visual scenes, and, therefore, a measurement of structural similarity (or distortion) should provide a good approximation of image quality. Some approaches have tried to improve the SSIM index. The Multi-scale SSIM described by Z. Wang, E. P. Simoncelli, and A. C. Bovik, in “Multi-Scale Structural Similarity for Image Quality Assessment,” 37th IEEE Asilomar Conference on Signals, Systems and Computers, pp. 1398-1402, November 2003, attempts to increase the accuracy of SSIM assessment by incorporating image details at different resolutions in the pixel domain. In the paper by D. M. Rouse and S. S. Hemami, “Understanding and Simplifying the Structural Similarity Metric,” IEEE International Conference on Image Processing, San Diego, pp. 1188-1191, October 2008, the authors investigate ways to simplify the computation of the SSIMW&B index in the pixel domain. A method to compute it using subbands at different levels in the discrete wavelet domain is proposed by C-L. Yang, W-R. Gao, and L-M. Po, in “Discrete Wavelet Transform-based Structural Similarity for Image Quality Assessment,” IEEE International Conference on Image Processing, San Diego, pp. 377-380, October 2008.
In that method, five-level wavelet decomposition using the Daubechies 9/7 filter is applied to both the original and the distorted images, and then the SSIMW&B index is computed between corresponding subbands. Finally, the similarity score is obtained by computing the weighted mean of all SSIM indices. To determine the weights, however, a large number of experiments need to be performed to measure the sensitivity of the human eye to different frequency bands.
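The pooling step of such a multi-level scheme can be illustrated with a minimal sketch. The function names `pooled_score` and `subband_count` are hypothetical and are not taken from the cited paper; the weights passed in are placeholders, since the experimentally determined weights are not given here.

```python
import numpy as np

def subband_count(levels):
    """Number of subbands after `levels` of 2D DWT decomposition.

    Each level contributes three detail subbands; the final level also
    keeps one approximation subband.
    """
    return 3 * levels + 1

def pooled_score(subband_scores, weights):
    """Weighted mean of per-subband similarity scores.

    `weights` are illustrative placeholders; in the scheme described
    above they would come from perceptual experiments.
    """
    scores = np.asarray(subband_scores, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    if scores.shape != w.shape:
        raise ValueError("one weight is required per subband score")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return float(np.sum(w * scores) / np.sum(w))
```

For a five-level decomposition, `subband_count(5)` gives 16, so sixteen weights would have to be determined, and they would have to be determined again whenever the wavelet or filter changes.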
Before explaining the present invention, a brief discussion of the wavelet transformation technique used for multiresolution decomposition of images is presented. A two-dimensional discrete wavelet transform (DWT) is applied to a digital image with discrete values to separate the low frequency content of the image from its high frequency content. Coefficients obtained after applying a discrete wavelet transform make up a discrete wavelet domain. To extract this content, the DWT uses two types of filters: a low-pass filter and a high-pass filter. In a one level DWT, the discrete wavelet transform is applied only once to an image. In two dimensions, the one level DWT is typically obtained by applying separable one-dimensional (1D) filters (one low-pass filter and one high-pass filter) horizontally and vertically. The combinations of horizontal and vertical applications of the low-pass and the high-pass filters yield four different output images. Therefore, when DWT is applied to an image for one level decomposition, four subbands (images) are obtained: one approximation subband and three detail subbands including a horizontal subband, a vertical subband, and a diagonal subband as shown in FIG. 1.
Block diagram 100 presented in FIG. 1 shows one level multiresolution decomposition using discrete wavelet transform of an image 102 according to the prior art. The decomposition results in four subbands: an approximation subband 104, a horizontal subband 106, a vertical subband 108 and a diagonal subband 110. Each of the subbands is a quarter of the size of the image 102, that is, half its resolution in each dimension. The approximation subband 104 contains the main content (low frequency content) of the image 102. The detail subbands include fine edges and textures of the image 102. For example, the horizontal subband 106 contains horizontal edges of the image 102. The vertical subband 108 and the diagonal subband 110 likewise contain the vertical and diagonal edges of the image 102, respectively. FIG. 2 presents a diagram 200 displaying the result of applying the one level DWT decomposition to a sample image Lena 202 resulting in four subbands: an approximation subband LenaA 204, a horizontal detail subband LenaH 206, a vertical detail subband LenaV 208 and a diagonal detail subband LenaD 210. As discussed earlier, LenaA 204 contains the main content whereas the fine edges are captured in the three detail subbands: LenaH 206, LenaV 208 and LenaD 210.
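The one level decomposition described above can be sketched in a few lines. The sketch assumes the Haar wavelet, chosen only because its filters are two taps long and easy to follow; the text does not prescribe a particular wavelet for FIG. 1, and the function name `haar_dwt2_one_level` is hypothetical. Even image dimensions are also assumed.

```python
import numpy as np

def haar_dwt2_one_level(image):
    """One level 2D Haar DWT of a 2D array.

    Returns (approximation, horizontal, vertical, diagonal) subbands,
    each with half the rows and half the columns of the input.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("this sketch assumes a 2D image with even dimensions")
    s = np.sqrt(2.0)
    # Horizontal pass: low-pass and high-pass over adjacent column pairs,
    # which also downsamples the columns by two.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Vertical pass over adjacent row pairs, downsampling the rows by two.
    approximation = (lo[0::2, :] + lo[1::2, :]) / s  # low/low
    horizontal = (lo[0::2, :] - lo[1::2, :]) / s     # horizontal edges
    vertical = (hi[0::2, :] + hi[1::2, :]) / s       # vertical edges
    diagonal = (hi[0::2, :] - hi[1::2, :]) / s       # diagonal detail
    return approximation, horizontal, vertical, diagonal
```

Applied to an 8×8 image, each returned subband is 4×4, matching the quarter-size relationship between the image 102 and the subbands 104 to 110.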
Although assessment of image quality has received considerable attention from researchers, the existing prior art methods have numerous shortcomings that include the following.
First, an SSIM map-based method described by Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, in “Image Quality Assessment: From Error Visibility to Structural Similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004, computes local statistics within a local square window in the pixel domain, even though the statistics of blocks in the wavelet domain are more accurate. The SSIM map gives the visual quality/distortion within each local window.
Second, the multi-scale and multi-level SSIMs discussed by C-L. Yang, W-R. Gao, and L-M. Po, in “Discrete Wavelet Transform-based Structural Similarity for Image Quality Assessment,” IEEE International Conference on Image Processing, San Diego, pp. 377-380, October 2008, require many experiments to determine the sensitivity of the HVS to different subbands. Moreover, if the wavelet or filter is changed, the computed weights and parameters are no longer optimum and may not even be valid.
Third, the five-level decomposition of images, as in the paper by Yang et al. mentioned in the previous paragraph, makes the approximation subband very small, because each level halves both dimensions (for example, a 512×512 image is reduced to roughly 16×16 coefficients); the subband is then no longer useful for the effective extraction of image statistics.
Fourth, prior art methods use the mean of the SSIM maps to generate the score for image quality. However, distortions in various image areas have different impacts on the HVS, so a uniform mean may not reflect the perceived quality.
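The pixel-domain SSIM map and the mean pooling referred to in the first and fourth points can be sketched as follows. This is a simplified illustration and not the exact procedure of Wang et al.: it assumes a uniform square sliding window instead of a weighted one, an 8-bit dynamic range, and the commonly used stabilizing constants 0.01 and 0.03. The function names `ssim_map` and `mean_ssim` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim_map(reference, distorted, window=8, dynamic_range=255.0):
    """SSIM value for every `window` x `window` block position (stride 1)."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(distorted, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    xw = sliding_window_view(x, (window, window))
    yw = sliding_window_view(y, (window, window))
    # Local statistics within each square window (uniform weighting assumed).
    mu_x = xw.mean(axis=(-2, -1))
    mu_y = yw.mean(axis=(-2, -1))
    var_x = xw.var(axis=(-2, -1))
    var_y = yw.var(axis=(-2, -1))
    cov_xy = (xw * yw).mean(axis=(-2, -1)) - mu_x * mu_y
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mean_ssim(reference, distorted, **kwargs):
    """Uniform mean pooling: every window counts equally."""
    return float(ssim_map(reference, distorted, **kwargs).mean())
```

Because `mean_ssim` weights every window equally, a severe distortion confined to a small, visually important region changes the score no more than the same distortion in an unimportant region, which is the limitation noted in the fourth point.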
Therefore, there is a need in the industry for an improved method that accurately assesses the quality of an image and has a complexity low enough for use in real-time applications. Such a method would need to avoid or mitigate the above-mentioned drawbacks of the prior art.