Various image scaling techniques are known in the prior art. Many of these are of the so-called B-spline interpolator type. The simplest of the B-spline interpolators are of the zeroth and first orders. These are known respectively as pixel replication and bilinear interpolation. In pixel replication, each pixel in the high resolution output image is obtained by taking the value of the closest pixel in the low resolution input image. In bilinear interpolation, each pixel in the high resolution output image is obtained by computing a linear combination of up to four pixels in the low resolution input image. Higher order interpolations use more sophisticated techniques, but are computationally intensive.
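By way of illustration only, the two simplest interpolators described above may be sketched as follows. The function names, the representation of an image as a list of rows, the pixel-centre alignment convention and the clamping at image borders are assumptions made for this sketch and are not taken from any cited document.

```python
def pixel_replicate(img, L):
    """Zeroth-order upscaling: each output pixel copies the closest input pixel."""
    h, w = len(img), len(img[0])
    return [[img[r // L][c // L] for c in range(w * L)] for r in range(h * L)]


def bilinear(img, L):
    """First-order upscaling: each output pixel is a linear combination
    of up to four neighbouring input pixels."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h * L):
        # Map the output pixel centre into input coordinates (assumed convention),
        # clamping to the image so border pixels are reused.
        y = min(max((r + 0.5) / L - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(w * L):
            x = min(max((c + 0.5) / L - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
            bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out
```

Pixel replication needs no arithmetic on pixel values, whereas bilinear interpolation requires a few multiplications per output pixel; higher order interpolators widen the neighbourhood and increase this cost accordingly.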
A different, classification-based approach is disclosed in “Optimal Image Scaling Using Pixel Classification” by C. B. Atkins, C. A. Bouman and J. P. Allebach, in Proceedings of the 2001 International Conference on Image Processing, 7-10 October 2001, volume 3, pages 864-867, and in more detail in the PhD thesis of C. B. Atkins entitled “Classification-based Methods in Optimal Image Interpolation”. FIGS. 1 and 2 show schematically the technique disclosed by Atkins et al.
Referring first to FIG. 1, for every input pixel 1 in the low resolution input image 2, a window 3 of pixels in the low resolution input image 2 (a 5×5 window in this example) is centred on the input pixel 1. This 5×5 window is column-wise vectorised into a 25×1 column vector Z (referred to as an observation vector Z in the Atkins documents). Next, in a feature extractor block 11, a non-linear projection operator f is applied to the observation vector Z to obtain a cluster vector or feature vector Y, i.e. Y=f(Z) for each input pixel 1. Generally speaking, Y is of lower dimension than Z. This mapping from the observation vector Z to the feature vector Y affects the quality of the final, interpolated high resolution image. A number of different projection operators f are possible.
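The windowing and column-wise vectorisation step may be sketched as follows, purely by way of example. The function name observation_vector and the replication of border pixels where the window extends beyond the image are assumptions of this sketch; the treatment of image borders is not specified above. The projection operator f is not shown, since its form is a design choice.

```python
def observation_vector(img, r, c, size=5):
    """Return the size*size window centred on pixel (r, c) as a flat list Z,
    stacked column by column (column-wise vectorisation)."""
    h, w = len(img), len(img[0])
    half = size // 2
    z = []
    for dc in range(-half, half + 1):        # outer loop over window columns
        for dr in range(-half, half + 1):    # inner loop runs down each column
            # Assumed border handling: clamp indices so edge pixels are repeated.
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            z.append(img[rr][cc])
    return z
```

For a 5×5 window, the returned list has 25 entries, with the centre pixel at index 12.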
The feature vector Y for each input pixel 1 is then passed to a classifier block 12 in which the feature vector Y for every input pixel 1 in the low resolution input image 2 is associated with (or “classified” or “clustered” to) a limited number of context classes. These context classes model different types of image regions in the input image 2, such as edges, smooth regions, textured regions, etc. For this purpose, distribution parameters θ are supplied to the classifier block 12, and the degree of association of the feature vector Y with the individual classes is determined inter alia in accordance with the distribution parameters θ. Atkins et al. propose a statistical framework for attributing the input pixel 1 to the context classes. Thus, for every input pixel 1, the classifier block 12 computes and outputs a number between 0 and 1 for each of the context classes, these numbers being weight coefficients and summing to 1. The weight coefficients given to each class by the classifier block 12 are the likelihoods that the input pixel 1 belongs to the respective classes, i.e. w_i is the probability that the input pixel 1 belongs to class i. The distribution parameters θ are obtained during a training process, which will be discussed further below.
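One way in which such weight coefficients might be computed is sketched below. The sketch assumes, for illustration only, that each context class is modelled by an isotropic Gaussian with a class mean, a class prior and a shared standard deviation sigma, so that the distribution parameters θ consist of these quantities; this particular model and the name class_weights are assumptions of the sketch rather than a statement of the statistical framework used in the Atkins documents.

```python
import math


def class_weights(y, means, priors, sigma):
    """Return one weight per class, each in [0, 1] and summing to 1.

    Assumed model: class i has prior priors[i] and an isotropic Gaussian
    density with mean means[i] and standard deviation sigma."""
    log_scores = []
    for mu, prior in zip(means, priors):
        dist2 = sum((a - b) ** 2 for a, b in zip(y, mu))
        log_scores.append(math.log(prior) - dist2 / (2.0 * sigma ** 2))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]
```

A feature vector lying close to one class mean receives a weight near 1 for that class, while a feature vector equidistant from two equally likely classes receives equal weights for both.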
Once the feature vector Y for each input pixel 1 has been associated with the context classes, a linear filter block 13 is used to obtain the output pixels 5 of the high resolution output image 6. In particular, for every input pixel 1 of the low resolution input image 2, a 5×5 window 3 of pixels centred on the input pixel 1 is passed to the linear filter block 13. The linear filter block 13 takes as an input the weight coefficients of the context classes calculated by the classifier block 12 for the particular input pixel 1 and takes as another input interpolation filter coefficients ψ which correspond to the respective context classes, and calculates an L×L block of output pixels 5 for the high resolution output image 6 by a linear combination of the outputs of the interpolation filters in proportions according to the weight coefficients calculated by the classifier block 12 for the particular input pixel 1. (L is the scaling factor. In the example shown in FIG. 1, L=2.) The interpolation process carried out by the linear filter block 13 can be summarised as:
\[ X = \sum_{i=1}^{M} w_i \left( A Z + \beta \right) \]

where X is the L×L block of output pixels 5 in the interpolated high resolution output image 6, A is an interpolation matrix, β is a bias vector, w_i are the weight coefficients for the respective context classes/filters, and M is the number of context classes/filters.
In short, in this prior art technique, the input pixels 1 are “classified” or associated with each of the context classes of the image to different degrees or “weights”. The window X of interpolated output pixels 5 is obtained by first filtering the observation vector Z with the optimal interpolation filter coefficients ψ for the individual classes and then combining the results in a mixture of proportions as determined in the classification phase. The interpolation filter coefficients ψ input to the linear filter block 13 correspond to the interpolation matrix A and the bias vector β and are again obtained during a training process, which will be discussed further below.
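The filtering and mixing step summarised above may be sketched as follows. The sketch assumes that each context class i has its own interpolation matrix and bias vector, consistent with the statement that the interpolation filter coefficients ψ correspond to the respective context classes; the function name interpolate_block and the representation of each filter as a pair (A_i, beta_i) are hypothetical.

```python
def interpolate_block(z, weights, filters):
    """Combine the outputs of the per-class linear filters.

    z       : observation vector (list of window pixels)
    weights : one weight per class, summing to 1
    filters : list of (A_i, beta_i); A_i has L*L rows of len(z) coefficients,
              beta_i has L*L entries (assumed layout)
    Returns the L*L output pixels as a flat list."""
    n_out = len(filters[0][1])
    x = [0.0] * n_out
    for w, (A, beta) in zip(weights, filters):
        for k in range(n_out):
            # Output k of filter i: row k of A_i applied to Z, plus the bias.
            filtered = sum(a * v for a, v in zip(A[k], z)) + beta[k]
            x[k] += w * filtered
    return x
```

With L=2, each filter produces four output values per input pixel, and the cost per input pixel grows linearly with the number of classes M.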
Clearly, of key importance in this technique are the distribution parameters θ, which relate to the distribution of pixels amongst the context classes, and the interpolation filter coefficients ψ, which in essence determine the mixtures of the interpolating filters that are added together to obtain the output pixels 5 of the high resolution output image 6. In the Atkins documents, the distribution parameters θ and the interpolation filter coefficients ψ are obtained in an offline training method, which is carried out typically on a once-only basis. In particular, high resolution images are procured and then low resolution images are obtained from the high resolution images (by one or several different techniques). Referring to FIG. 2, a respective feature vector Y is obtained for each input pixel in the low resolution images. These are passed to a distribution parameter estimation block 14 in which an initial estimate of the distribution parameters θ is obtained. By making various assumptions, the values of the distribution parameters θ are updated to find a final, optimal value. Next, the interpolation filter coefficients ψ are obtained in an interpolation filter design block 15.
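The preparation of training data may be sketched as follows. Block averaging is used here merely as one assumed example of the techniques by which low resolution images can be obtained from high resolution images; the function names and the column-wise ordering of each high resolution block are likewise assumptions of the sketch. The estimation of θ and ψ themselves is not shown.

```python
def downsample_by_averaging(hi, L):
    """Produce a low resolution image by averaging each L x L block
    of the high resolution image (assumed downsampling method)."""
    h, w = len(hi) // L, len(hi[0]) // L
    return [[sum(hi[r * L + i][c * L + j] for i in range(L) for j in range(L)) / (L * L)
             for c in range(w)]
            for r in range(h)]


def training_pairs(hi, L):
    """Pair every low resolution pixel position with the L x L block of
    high resolution pixels it corresponds to (block stacked column-wise)."""
    lo = downsample_by_averaging(hi, L)
    pairs = []
    for r in range(len(lo)):
        for c in range(len(lo[0])):
            block = [hi[r * L + i][c * L + j] for j in range(L) for i in range(L)]
            pairs.append(((r, c), block))
    return lo, pairs
```

Each pair supplies one training example: the feature vector is computed from the low resolution window around position (r, c), and the associated high resolution block is the target that the interpolation filters are designed to reproduce.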
In this process, as noted in the Atkins documents, one of the principal assumptions that is made is that the feature vector Y provides all of the information about an input pixel in the low resolution (training) image and how it relates to the corresponding block of pixels in the high resolution source image. Owing to this assumption, the estimation of the interpolation filter coefficients ψ is effectively decoupled from the estimation of the distribution parameters θ. This decoupling causes the computed interpolation filters to be sub-optimal, which in turn reduces the achievable quality of the interpolated high resolution image.