The following relates to a method and system for identifying image impairments and more specifically to a method and system for identifying image impairments caused by interpolation in an image interpolated from two or more adjacent images thereto.
The process of interpolating between adjacent images in a sequence of images to generate intermediate image representations is a well-established technique, which is commonly used in broadcast standards conversion and frame-rate up-conversion in modern television displays.
There are numerous approaches to the exact method of interpolation, but methods generally fall into two broad categories: motion compensated and non-motion compensated. These categories each have advantages and disadvantages, and entail different trade-offs between the perceived visual quality of the output and the complexity of the interpolation algorithm. Although there are a range of performance characteristics, it is widely appreciated that motion compensated conversion usually offers superior interpolation quality to non-motion compensated conversion.
It is also widely appreciated that both categories of interpolation algorithm can introduce interpolation artefacts such as image distortions that exist in the output sequence that are not present in the input sequence, and these impair the perceived quality of the output.
For example, non-motion compensated conversion commonly entails forming an output image from a weighted combination of the nearest or adjacent input frames, and this leads to ‘double imaging’ artefacts that are apparent when anything within the image moves. Motion compensated conversion requires a step to estimate the motion vectors between adjacent input frames and then uses these to interpolate moving objects faithfully at the appropriate intermediate position, and any errors in the identification of the correct vectors manifest in incorrect motion portrayal of objects or portions of objects. This can produce visually distracting rips and tears in the output images that impact the perceived quality significantly.
FIG. 1 illustrates an example of non-motion compensated interpolation between two temporally sequential input images or frames 101 and 102 to form an output image 103 at some point between the two. The input images contain a moving object, identified at 104 in input image 101 which moves to a new image location 105 in input frame 102. The interpolated output image 103 is formed by a normalised mix of the two input images 101 and 102 where the proportion of the mix is linearly dependent on the temporal distance between the two input images 101 and 102. The representation of input object 104 in the output image 103 is labelled 106 and the representation of input object 105 in the output image 103 is labelled 107. Representations 106 and 107 are misaligned due to the motion of the object, thereby producing two partially superimposed views of the same object, or a ‘double image’.
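The normalised mix described above can be sketched in a few lines of code. This is an illustrative sketch only, using one-dimensional frames of luminance values for clarity; the function and variable names are not taken from any disclosed implementation.

```python
# Sketch of non-motion-compensated interpolation: the output frame is a
# normalised mix of the two input frames, weighted linearly by the temporal
# position (phase) of the output between them.

def blend_interpolate(frame_a, frame_b, phase):
    """phase = 0.0 reproduces frame_a, phase = 1.0 reproduces frame_b."""
    return [(1.0 - phase) * a + phase * b for a, b in zip(frame_a, frame_b)]

# A bright object at one position in frame_a moves to another in frame_b.
frame_a = [0, 0, 200, 0, 0, 0]
frame_b = [0, 0, 0, 0, 200, 0]

mid = blend_interpolate(frame_a, frame_b, 0.5)
# The object appears at half strength at BOTH positions: a 'double image'.
```

Running this at phase 0.5 yields the object at half amplitude in both its original and its new location, which is exactly the double-image artefact of FIG. 1.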
FIG. 2 illustrates an example of motion compensated interpolation, between two temporally sequential input images 201 and 202 to form an output image 203 at a point between the two. The aim of motion compensated interpolation is to detect motion between input frames, to form a motion vector field, and then to interpolate objects to an appropriate image position at the desired temporal output point. In the example shown the input images contain a moving object, identified at 204 in input image 201 which moves to a new image location 205 in input frame 202. The desired path of this moving object between input frames 201 and 202 is indicated by the dashed lines 206 and 207. Representation 208 depicts the desired position of the moving object at the intermediate point in time. Motion compensated interpolation is commonly imperfect and errors in the determination of the vectors can lead to errors in the output image produced. Image element 209 indicates an artefact arising when some of the vectors are incorrect, resulting in a portion of the moving object being rendered in the wrong position. Correspondingly, image element 210 indicates an artefact in the object in the output image, due to the missing image element 209.
The appearance of conversion artefacts in images that have been interpolated is clearly undesirable. Although the visual appearance of such impairments is often readily apparent, even to non-experts, it is well known that the identification of such image impairments by automated metrics is a non-trivial problem. Some known quality measurements, for example root mean squared error, can be used if the desired output—a ‘ground truth’—exists with which to compare the interpolated output. However, a ground truth often does not exist. Where no coincident reference output or ground truth is available no direct comparison can be made and it is common practice for those skilled in image interpolation techniques simply to manipulate the interpolation algorithm parameters and use human visual observation of the output to minimise the subjective visibility of such artefacts. Clearly, using human visual observation to detect image interpolation impairments or artefacts is time consuming and requires substantial human resources.
Measurement of image quality is in general a well-developed subject area which has a research base extending back over many decades and extensive discussion in the academic literature. There are two principal categories of measurement: ‘double-ended’ measurements, where an impaired image is compared to a corresponding unimpaired image, and ‘single-ended’ measurements, which attempt to estimate the degree of impairment from the impaired image alone. The double-ended category is further divided into two sub-categories: ‘full-reference’ measurement, where the comparison is done between the full-resolution versions of the impaired and unimpaired images, and ‘reduced-reference’ measurement, where the comparison involves reduced-resolution versions, or measured attributes of the images, rather than direct comparison of the images themselves. Reduced-reference quality metrics are sometimes used in applications where there is insufficient bandwidth to transfer the entire unimpaired image to the point where the impaired image measurement takes place, but there is sufficient auxiliary or side-channel bandwidth to pass some metadata about the unimpaired image.
Quality measurement techniques can pertain to individual images, to the comparison of pairs of individual images, or to pairs of corresponding images within two sequences of images—for example, comparison of a source video sequence with the same material after compression encoding and decoding.
Single-ended quality metrics have the advantage that they do not require access to a reference image or image sequence, but they are often specific to certain types of impairment or certain circumstances—such as the measurement of compression blockiness.
Double-ended full-reference and reduced-reference quality metrics have the constraint that some or all image information about the unimpaired source must be available at the point where the measurement is made, but generally speaking they are broader in scope and more widely adopted.
In the field of image quality assessment, the most commonly encountered metrics are double-ended techniques based on measurements of pixel differences (errors) between impaired and unimpaired pictures. These frequently involve differences in image luminance components, although other components are also used. Examples include: measurement of absolute error, mean squared error (MSE), root mean squared error (RMSE), and the Peak Signal to Noise Ratio (PSNR) defined by
$$\mathrm{PSNR} = 10\log_{10}\left\{\frac{255^{2}}{\dfrac{1}{N\times M}\displaystyle\sum_{x=0}^{N-1}\sum_{y=0}^{M-1}\left(Y_{\mathrm{ref}}-Y_{\mathrm{test}}\right)^{2}}\right\}$$
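The PSNR defined above is straightforward to compute. The following sketch assumes 8-bit luminance images (peak value 255) held as lists of rows; the function name is illustrative.

```python
import math

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equal-sized, aligned images (lists of rows)."""
    n_rows = len(ref)
    n_cols = len(ref[0])
    # Mean squared error over all N x M pixels.
    mse = sum((r - t) ** 2
              for ref_row, test_row in zip(ref, test)
              for r, t in zip(ref_row, test_row)) / (n_rows * n_cols)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Note that this presupposes a reference image: it is a double-ended, full-reference measurement, which is precisely why it cannot be applied when no ground truth exists.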
PSNR measurements are used extensively throughout the image processing industry and the academic literature, and form a starting place in virtually all text books on image quality measurement (for example, “Digital Video Image Quality and Perceptual Coding”, Taylor & Francis, eds. H. R. Wu and K. R Rao, ISBN 0-8247-2777-0, pp 5-6).
A key feature of PSNR and related image distortion measurements based on pixel differences is the requirement for the reference and test images to be aligned, since any misalignment—or any content exhibiting movement between them—results in pixel differences which corrupt the PSNR score and render it meaningless. It becomes impossible to distinguish alignment or motion-related pixel differences from image impairment pixel differences. Thus, these methods are not suitable for detecting interpolation artefacts in an interpolated image where there is motion between the non-interpolated or adjacent images and the interpolated image, and no ground reference exists. In some situations, such as compression coding, the processed image can be offset by a fixed amount from the original, and the distortion measurement must be preceded by an image registration step to realign the processed image with the original. Alignment handling in this way is quite different from the task of dealing with motion differences between test and reference images, and in the general case cannot be applied to compare temporally or spatially interpolated images with the original non-interpolated images when motion is present.
A large number of extensions to the basic pixel-difference idea exist in the prior art which involve spatial or temporal frequency weighting of errors: a pre-processing step divides the test and reference images into frequency sub-bands, the error between the separate corresponding sub-bands is calculated, and the results are then combined using different weights. The frequency sub-band weighting is commonly arranged in a manner intended to reflect the sensitivities of the human visual system to specific spatial or temporal frequency features. Examples of this are the ‘just-noticeable-distortion’ method of Chou and Li (C. H. Chou and Y. C. Li, “A perceptually tuned sub band image coder based on the measure of just-noticeable-distortion profile,” IEEE Trans. Circuits and Systems for Video Tech., vol. 5, pp. 467-476, December 1995), and the multi-resolution method of Juffs, Beggs and Deravi (E. Juffs, E. Beggs and F. Deravi, “A Multiresolution Distance Measure for Images”, IEEE SP Letters, vol. 5, no. 6, June 1998).
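The sub-band weighting scheme just described can be illustrated with a deliberately crude sketch: a two-band split using a three-tap moving average as the low band and the residual as the high band, with hypothetical weights. This is not any of the cited methods, merely the general shape of the technique.

```python
def split_bands(signal):
    """Crude two-band split: 3-tap moving-average low band, residual high band."""
    n = len(signal)
    low = [(signal[max(0, i - 1)] + signal[i] + signal[min(n - 1, i + 1)]) / 3.0
           for i in range(n)]
    high = [s - l for s, l in zip(signal, low)]
    return low, high

def weighted_error(ref, test, w_low=0.3, w_high=1.0):
    """Squared error per band, combined with (illustrative) perceptual weights."""
    ref_lo, ref_hi = split_bands(ref)
    tst_lo, tst_hi = split_bands(test)
    err_lo = sum((a - b) ** 2 for a, b in zip(ref_lo, tst_lo))
    err_hi = sum((a - b) ** 2 for a, b in zip(ref_hi, tst_hi))
    return w_low * err_lo + w_high * err_hi
```

Real implementations use proper filter banks and weights fitted to human visual sensitivity; the point here is only that the error is apportioned and weighted per frequency band before being combined.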
The application of frequency domain techniques in such cases is used as a means to estimate the subjective significance of differences in specific image attributes of test and reference images that are aligned, in the sense that there is no content motion between them. The underlying assumption is that any differences between test and reference images are due to image distortions only, and not to motion between the frames.
It is well known that spatial translation or motion between images manifests as a fixed phase difference between the respective image spectra. This is the basis of the phase correlation motion estimation technique originally applied to television image sequences and disclosed in U.S. Pat. No. 4,890,160. This approach offers an efficient and elegant way to identify motion vector candidates that can be used as part of a frame-rate conversion algorithm. But it does not provide a measure of output picture quality, or indeed indicate where errors have occurred which produce visual impairment in the output.
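The phase-correlation principle—that a spatial shift appears as a linear phase difference between the image spectra, and that normalising the cross-spectrum to unit magnitude yields a correlation surface whose peak locates the shift—can be shown in one dimension with a direct DFT. This sketch illustrates the principle only, not the method of the cited patent.

```python
import cmath

def dft(x):
    """Direct (O(n^2)) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(X):
    """Inverse DFT."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]

def phase_correlate(a, b):
    """Return the dominant cyclic shift from a to b via the phase-correlation
    surface: normalise the cross-spectrum to unit magnitude, invert, find peak."""
    A, B = dft(a), dft(b)
    cross = []
    for af, bf in zip(A, B):
        c = bf * af.conjugate()
        cross.append(c / (abs(c) or 1.0))  # guard against zero magnitude
    surface = [v.real for v in idft(cross)]
    return surface.index(max(surface))
```

An impulse shifted by three samples produces a correlation peak at position three. As the text notes, this locates candidate motion vectors but says nothing about the quality of an image interpolated with them.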
A technique for estimating image quality from the power spectrum of the image is disclosed by Nill and Bouzas in N. B Nill and B. H. Bouzas, “Objective image quality measure derived from digital image power spectra”, Opt. Eng. 31(4), 813-825 (1992). This technique is a single ended technique and based on assumptions of power spectra of “typical” natural video.
Most image compression ‘blockiness’ or ‘blocking-artefact’ quality estimation algorithms rely on measuring spatial pixel differences rather than being frequency-domain based. But an algorithm by Wang, Bovik and Evans (Z. Wang, A. C. Bovik and B. L. Evans, “Blind measurement of blocking artefacts in images,” Proc. IEEE Int. Conf. Image Proc., vol. 3, pp. 981-984, September 2000) uses a power spectrum approach to detect blockiness. Their method is single-ended and involves measuring absolute energy component differences between the power spectrum and a median-filtered version of the same power spectrum at specific harmonic frequencies that are indicative of a periodic distortion across the image—effectively identifying energy peaks at the harmonic frequencies. This relies on the artefact(s) having a periodic structure, which is not the case for commonly encountered interpolation errors.
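The core idea—comparing a power spectrum against a median-filtered version of itself to expose isolated harmonic peaks—can be sketched in one dimension. This is a loose illustration of the spectral-peak principle, not a reproduction of the Wang, Bovik and Evans algorithm; the function names and the window radius are illustrative.

```python
import cmath
from statistics import median

def power_spectrum(signal):
    """Squared-magnitude DFT of a real sequence (direct, O(n^2))."""
    n = len(signal)
    return [abs(sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n)
                    for k in range(n))) ** 2 for f in range(n)]

def peak_energy(ps, radius=2):
    """Sum of positive differences between a power spectrum and its
    median-filtered version: large when isolated harmonic peaks, i.e.
    periodic structure such as blocking, are present."""
    smoothed = [median(ps[max(0, i - radius): i + radius + 1])
                for i in range(len(ps))]
    return sum(max(0.0, p - s) for p, s in zip(ps, smoothed))
```

A signal with period 4 concentrates its energy at every fourth frequency bin; the median filter removes those isolated peaks, so the positive residue is large. Aperiodic interpolation artefacts leave no such harmonic signature, which is the limitation noted above.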
US 2011/0205369 provides a method and system for detecting image impairments or interpolation errors caused by interpolation in an image interpolated from two or more adjacent images thereto. The method involves deriving a measure of image detail for each of the interpolated or output image and the adjacent or input images, for example by summing the magnitudes of pixel-value differences between horizontally adjacent pixels in the respective images. The image detail measure of the interpolated image is then compared to an interpolation of the image detail measures of the adjacent images which, for example, may be a weighted average of the adjacent image detail measures determined by the temporal phase of the interpolated or output image with respect to the adjacent or input images. They conclude that if an excess in image detail in the interpolated or output image is detected, in comparison to the interpolation of the image detail measures in the adjacent or input images, then this may indicate the presence of interpolation artefacts.
By using a block-based sum of absolute pixel-value differences as the image detail measure, as done in the example in US 2011/0205369, the sensitivity to motion is reduced. That is, the effects of motion within each image are integrated out. However, persons skilled in the art will appreciate that differences due to moving image content entering or leaving the block or blocks, interpolation errors spanning block edges, and occluded or revealed areas due to motion, can each have a detrimental effect upon the reliability of the measurement for detecting changes in detail attributable to interpolation error alone. Because of the influence of these effects, the prior art takes the additional step of interpolating the detail measures, prior to comparison, to co-time the input and output detail measures and lessen the influence of motion-related differences in the detail measures corresponding to adjacent frames. A further step is also included whereby the variation in detail measure comparisons is evaluated by temporal filtering to isolate specific temporal frequencies that are more likely to be indicative of interpolation errors than of motion-related differences. The skilled person will appreciate that the difficulty of distinguishing between motion-related detail differences and image interpolation error-related differences in the method of US 2011/0205369 is borne out by these additional steps described in the preferred embodiment of that patent application.
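The detail-measure comparison of US 2011/0205369 can be sketched in one dimension: sum the absolute differences between horizontally adjacent pixels for each frame, then compare the interpolated frame's measure against a temporal interpolation of its neighbours' measures. This is a simplified illustration of the published idea, not the full block-based, temporally filtered method; all names are illustrative.

```python
def detail_measure(frame):
    """Sum of absolute differences between horizontally adjacent pixels."""
    return sum(abs(b - a) for a, b in zip(frame, frame[1:]))

def excess_detail(prev_frame, interp_frame, next_frame, phase):
    """Positive when the interpolated frame carries more detail than a
    temporal interpolation of its neighbours' detail measures predicts,
    which may indicate interpolation artefacts."""
    expected = (1.0 - phase) * detail_measure(prev_frame) \
               + phase * detail_measure(next_frame)
    return detail_measure(interp_frame) - expected
```

A cleanly interpolated moving object leaves the detail measure roughly constant, so the excess is near zero; an artefact such as a duplicated object fragment introduces extra edges and a positive excess. The caveats in the preceding paragraph explain why this simple comparison is unreliable in the presence of general motion.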
The applicant has appreciated that it would be desirable to provide a more accurate method for identifying image impairments caused by interpolation errors.