In image and video manipulation, apart from on-line and off-line visual quality evaluation, the way distortion is gauged also plays a decisive role in shaping most algorithms, such as enhancement, reconstruction, data hiding, compression, and joint source/channel coding. Visual quality control within an encoder and distortion assessment for the decoded signal are of particular interest due to the widespread application of H.26x/MPEG-x compression and coding. Since the human eye is the final receiver of most decoded images and video, it is desirable to develop visual quality metrics that correlate better with human visual perception than conventional pixel-wise error measures such as mean squared error (MSE) and peak signal-to-noise ratio (PSNR).
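For reference, the conventional pixel-wise measures named above can be sketched as follows. This is a minimal illustration assuming 8-bit greyscale frames held as NumPy arrays; the function names and the default `peak` value of 255 are choices made for this example only.

```python
import numpy as np

def mse(reference, decoded):
    """Mean squared error between two frames of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    dec = np.asarray(decoded, dtype=np.float64)
    return float(np.mean((ref - dec) ** 2))

def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes 8-bit samples."""
    err = mse(reference, decoded)
    if err == 0.0:
        return float("inf")  # identical frames: no error to measure
    return float(10.0 * np.log10(peak ** 2 / err))
```

Both measures treat every pixel error alike, regardless of where it falls in the picture, which is why they may correlate poorly with perceived quality.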
Perceptual models based upon human vision characteristics have been proposed. In one such proposed metric, the colour-transformed original and decoded sequences are divided into blocks and transformed with the Discrete Cosine Transform (DCT). The resulting DCT coefficients are then converted to local contrast, which is defined as the ratio of the AC amplitude to the temporally low-pass filtered DC amplitude. A temporal recursive discrete second-order IIR filtering operation follows to implement the temporal part of the contrast sensitivity function (CSF). The results are then converted to measures of visibility by dividing each coefficient by its respective visual spatial threshold. The difference between the two sequences is subjected to a contrast masking operation, and finally the masked difference is pooled over various dimensions to give a measure of perceptual error.
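The spatial part of this pipeline can be sketched for a single greyscale frame pair as below. This is a simplified illustration, not the proposed metric itself: it assumes 8x8 blocks and an orthonormal DCT, uses the DC term of the current frame in place of the temporally low-pass filtered DC, omits the temporal IIR filter and the contrast masking step, and pools with a Minkowski sum whose exponent `beta` is hypothetical. The `thresholds` array stands in for the visual spatial thresholds, whose actual values are not given here.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct(frame, n=8):
    """Split a frame into n x n blocks and apply a 2-D DCT to each block."""
    frame = np.asarray(frame, dtype=np.float64)
    h, w = frame.shape
    h, w = h - h % n, w - w % n  # drop partial blocks at the borders
    blocks = frame[:h, :w].reshape(h // n, n, w // n, n).swapaxes(1, 2)
    d = dct_matrix(n)
    return d @ blocks @ d.T  # shape: (h // n, w // n, n, n)

def local_contrast(coeffs, eps=1e-6):
    """Ratio of each AC coefficient to the DC amplitude of its block."""
    dc = np.abs(coeffs[..., :1, :1])
    contrast = coeffs / np.maximum(dc, eps)
    contrast[..., 0, 0] = 0.0  # the DC term itself carries no contrast
    return contrast

def perceptual_error(reference, decoded, thresholds, beta=4.0):
    """Pool the threshold-normalised contrast difference of two frames."""
    vis_ref = local_contrast(block_dct(reference)) / thresholds
    vis_dec = local_contrast(block_dct(decoded)) / thresholds
    diff = np.abs(vis_ref - vis_dec)
    return float(np.mean(diff ** beta) ** (1.0 / beta))
```

Dividing by `thresholds` expresses each coefficient in units of its visibility threshold, so a value near 1 corresponds to a just-visible component under the assumed thresholds.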
Following the same paradigm, another approach, termed Winkler's metric, consists of colour conversion, temporal filters, spatial subband filters, contrast control, and pooling for various channels, all based on the spatio-temporal mechanisms of the human visual system. The difference between the original and decoded video is evaluated to give an estimate of the visual quality of the decoded signal. The metric's parameters are determined by fitting the metric's output to experimental data on human vision.
Prevalent visual coding schemes (e.g., DCT- or wavelet-based) introduce specific types of artefacts such as blockiness, ringing and blurring. Metrics designed for such coding may use a measure of blocking artefacts as the distortion measure. Other metrics measure five types of error (i.e., low-pass filtered error, Weber's law and CSF corrected error, blocking error, correlated error, and high contrast transitional error), and use Principal Component Analysis to determine their compound effect on visual quality.
Switching between a perceptual model and a blockiness detector depending on the video under test has also been suggested.
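As an illustration of what a blocking-artefact measure can look like, the sketch below compares pixel differences across block boundaries with differences inside blocks. It is a generic example and not any of the metrics referred to above; it assumes a greyscale frame larger than one block, an 8x8 coding grid aligned with the top-left corner, and a hypothetical function name `blockiness`.

```python
import numpy as np

def _boundary_ratio(frame, axis, block, eps):
    """Mean step across block edges divided by mean step inside blocks."""
    d = np.abs(np.diff(frame, axis=axis))
    idx = np.arange(d.shape[axis])
    on_edge = (idx % block) == block - 1  # difference straddles a block edge
    across = np.compress(on_edge, d, axis=axis).mean()
    inside = np.compress(~on_edge, d, axis=axis).mean()
    return across / (inside + eps)

def blockiness(frame, block=8, eps=1e-6):
    """Average of the vertical and horizontal boundary ratios."""
    frame = np.asarray(frame, dtype=np.float64)
    vertical = _boundary_ratio(frame, 0, block, eps)
    horizontal = _boundary_ratio(frame, 1, block, eps)
    return float(0.5 * (vertical + horizontal))
```

A value close to 1 indicates that block edges are no more pronounced than the surrounding detail, while larger values indicate visible discontinuities on the assumed grid.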
Another proposed perceptual distortion metric architecture consists of opponent colour conversion, perceptual decomposition and masking, followed by pooling. In this method, the spatial frequency and orientation-selective filtering and the temporal filtering are performed in the frequency (spectral) domain. The behaviour of the human vision system is modelled by cascading a 3-D filter bank and a non-linear transducer that models masking. The filter bank used in one proposed model is separable in the spatial and temporal directions. The model features 17 Gabor spatial filters and 2 temporal filters, and uses a non-linear transducer to model masking. In a simplified version, the perceptual model is applied to blockiness-dominant regions.
A software tool for measuring the perceptual quality of digital still images is commercially available. It provides five proprietary full-reference perceptual metrics, namely blockiness, blurriness, noise, colourfulness and a mean opinion score. However, since these methods are proprietary, no description is available of how the metrics' outputs are calculated.
A full-reference video quality metric has also been proposed. For each frame, corresponding local areas are extracted from both the original and the test video sequences. For each selected local area, statistical features such as mean and variance are calculated and used to classify the local area as a smooth, edge, or texture region. Next, a local correlation quality index value is calculated, and these local measures are averaged to give a quality value for the entire frame. The frame quality value is adjusted by two factors: a blockiness factor and a motion factor. The blockiness measurement is evaluated in the power spectrum of the image signal. This blockiness measure is used to adjust the overall quality value only if the frame has a relatively high quality index value but severe blockiness. The motion measurement is obtained by a simple block-based motion estimation algorithm. This motion adjustment is applied only if a frame simultaneously satisfies the conditions of low quality index value, high blurriness and low blockiness. Finally, all frame quality index values are averaged into a single overall quality value for the test sequence.
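The local-index and averaging steps of such a metric can be sketched as follows. The exact form of the local correlation quality index is not given above, so this example substitutes a plain Pearson correlation per block as an assumption. The region classification and the blockiness and motion adjustments are omitted, and the 8x8 block size, the handling of flat blocks, and all function names are hypothetical.

```python
import numpy as np

def local_correlation(ref_block, test_block, eps=1e-12):
    """Pearson correlation between two co-located blocks (assumed index)."""
    r = ref_block - ref_block.mean()
    t = test_block - test_block.mean()
    r_energy = float((r * r).sum())
    t_energy = float((t * t).sum())
    if r_energy < eps or t_energy < eps:
        # Assumption: two flat blocks match perfectly; one flat block does not.
        return 1.0 if (r_energy < eps and t_energy < eps) else 0.0
    return float((r * t).sum() / np.sqrt(r_energy * t_energy))

def frame_quality(ref, test, block=8):
    """Average the local index over all full blocks of one frame."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    h, w = ref.shape
    scores = [
        local_correlation(ref[y:y + block, x:x + block],
                          test[y:y + block, x:x + block])
        for y in range(0, h - block + 1, block)
        for x in range(0, w - block + 1, block)
    ]
    return float(np.mean(scores))

def sequence_quality(ref_frames, test_frames, block=8):
    """Average the frame quality values over the whole sequence."""
    values = [frame_quality(r, t, block)
              for r, t in zip(ref_frames, test_frames)]
    return float(np.mean(values))
```

In this sketch a value of 1 means the test blocks follow the reference exactly up to gain and offset, and lower values indicate structural differences; the adjustments described above would be applied to each frame value before the final averaging.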