With the development and popularity of digital imaging devices and communication technologies, digital images have become increasingly important for information representation and communication. During its life cycle, a digital image can be degraded at various stages, and such quality degradation may lead to failures in subsequent applications. It is therefore important to maintain and monitor image quality in numerous image and video processing systems; a primary goal of Image Quality Assessment (IQA) is to predict visual quality as perceived by a human viewer. Image quality measures can be used to characterize perceived distortion as a function of parameters such as transmission rate, and to select the optimal parameters of image enhancement methods. Although subjective tests can be carried out in laboratory settings to perform IQA, such tests are expensive and time-consuming, and cannot be used in real-time or automated systems. Developing objective IQA metrics that measure image quality automatically and efficiently is therefore of great interest.
Full-Reference IQA (FR-IQA) models utilize information from both the distorted image and a corresponding pristine reference image to estimate visual quality. Conventional FR-IQA metrics such as the Mean Squared Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR) directly measure the pixel-by-pixel differences between the distorted and the reference images in the spatial domain. Such metrics measure signal fidelity but often correlate poorly with human perception, especially when the noise is not additive.
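To make the pixel-by-pixel nature of these conventional metrics concrete, the following minimal sketch (an illustration, not code from this work) computes MSE and PSNR between a reference and a distorted image using NumPy; the function names and the 8-bit peak value of 255 are assumptions for the example.

```python
import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean squared error: average of squared pixel-wise differences."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, distorted: np.ndarray,
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / err)

# Example: a uniform +10 intensity shift yields MSE = 100 regardless of content,
# even though such a shift is often barely visible to a human observer.
ref = np.full((64, 64), 100.0)
dist = ref + 10.0
```

Note that the uniform-shift example already hints at the limitation discussed above: PSNR penalizes this distortion as heavily as visually disruptive noise of the same energy.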
Two types of approaches have been taken towards developing perceptual visual quality metrics (PVQMs) that align better with human perception: bottom-up and top-down approaches. Bottom-up approaches attempt to model various processing stages in the visual pathway of the human visual system (HVS) by simulating relevant psychophysical and physiological properties, including contrast sensitivity, luminance adaptation, and various masking effects. However, given our limited knowledge of these properties and their combined influence on final perception, the HVS is too complicated to be modeled accurately in this way.
More recent research efforts have been directed to top-down frameworks, which model the input-output relationship by incorporating knowledge from various sources such as the statistical properties of natural images, and data on how image distortions appear to be handled by the HVS. Most state-of-the-art FR-IQA methods fall into this category, and some, such as the Structural SIMilarity (SSIM) index and its variants (including the Multi-Scale SSIM (MS-SSIM) and the Information Weighted SSIM (IW-SSIM)), the Feature SIMilarity (FSIM) index and the Gradient Magnitude Similarity Deviation (GMSD), have had a measure of success, suggesting that low-level visual features such as mean intensity, standard deviation of intensity, phase congruency and gradient magnitude are effective quality indicators. However, these low-level cues may not work uniformly well across different distortion categories. As a result, the performance of the corresponding FR measures can vary considerably across distortion types.
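As an illustration of how such top-down metrics compare low-level statistics rather than raw pixels, the following sketch computes a simplified, single-window SSIM over whole images using the standard formula with the usual stabilizing constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2. The real SSIM index averages this comparison over local sliding windows; the global form here is an assumption made to keep the example short.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    """Simplified single-window SSIM: compares mean intensity (luminance),
    variance (contrast), and covariance (structure) of the two images.
    The standard index computes this per local window and averages."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

# An undistorted image scores exactly 1; any distortion lowers the score
# towards 0, with structural damage penalized more than a mean shift.
```

The design choice worth noting is that the comparison is carried out on statistics (means, variances, covariance) rather than pixel differences, which is precisely what lets SSIM-family metrics tolerate distortions that preserve structure.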
There is therefore a need for improved methods of assessing image quality that align well with human perception across different types of distortions while remaining objective, data-driven, and efficient. Some efforts have been made towards applying learning-based approaches employing convolutional neural networks (ConvNets), but these have been limited to situations where reference images are not available for quality estimation, i.e., No-Reference IQA (NR-IQA). There therefore remains a need to explore and develop the application of such methods to FR-IQA, where corresponding pairs of reference and distorted images are available for analysis.