The present invention relates generally to processing images, and more particularly to generating super-resolution images.
It is desired to enlarge or "zoom" images beyond the resolution at which they were sampled. Such images are said to have "super-resolution."
Polygon images derived from data structures can offer resolution independence over a wide range of scales. Edges remain sharp until one zooms in very close. However, at close ranges, undesired artifacts will appear depending on the size of the polygons. In addition, it is difficult and time consuming to construct and render resolution-independent polygons for complex, real-world objects.
In contrast, pixel images are easy to acquire directly from cameras, and rendering pixel images is trivial. In addition, pixel images are rich in detail. Unfortunately, pixel images do not have the same resolution independence as polygon images. When a super-resolution pixel image is generated, blurring and loss of detail are problematic.
Therefore, there is a need for a method that achieves resolution independence when enlarging or zooming pixel-based images. In addition, many other applications in graphics or image processing can benefit from resolution independent image processing, such as texture mapping, consumer photographs, target identification, and converting small screen, analog video to large screen, HDTV data.
Super-resolution can be characterized as an image interpolation problem. The interpolation generates new pixels from existing data. A number of techniques are known for generating super-resolution pixel images.
Cubic spline interpolation is a common image interpolation method, see R. Keys, "Bicubic interpolation," IEEE Trans. Acoust. Speech, Signal Processing, 29:1153-1160, 1981. However, super-resolution images generated by that method can still have blurred edges and loss of image details. Recent attempts to improve on cubic spline interpolation have met with limited success, see F. Fekri, R. M. Mersereau, and R. W. Schafer, "A generalized interpolative vq method for jointly optimal quantization and interpolation of images," Proc. ICASSP, Vol. 5, pages 2657-2660, 1998, and S. Thurnhofer and S. Mitra, "Edge-enhanced image zooming," Optical Engineering, 35(7):1862-1870, July 1996.
A proprietary method performs well, but highly textured regions and fine lines still suffer from blurring, see Altamira Genuine Fractals 2.0, 2000 for Adobe Photoshop. R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Processing, 3(3):233-242, 1994, used a Bayesian method for super-resolution. However, they hypothesized the prior probability, so the resulting images are blurred. A training-based approach was described by A. Pentland and B. Horowitz, "A practical approach to fractal-based image compression," Digital images and human vision, A. B. Watson, editor, MIT Press, 1993. However, they made no attempt to enforce the spatial consistency constraints necessary for good image quality.
Training-Based Super-Resolution
Recently, an iterative training method was described by W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, "Learning low-level vision," Intl. J. Computer Vision, 40(1):25-47, 2000; see also U.S. patent application Ser. No. 09/236,839, "Estimating Targets using Statistical Properties of Observations of Known Targets," filed by Freeman et al. on Jan. 15, 1999. They "train" a Markov network. Their method synthesizes realistic looking textures and edges by using a large library of local image data. The local image data are learned from training images. High resolution training images are blurred and down-sampled by a factor of two in each dimension to yield corresponding low resolution images. Patches derived from these pairs of high and low resolution images are sampled to form a library of thousands of image patch pairs.
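The construction of low resolution training images and patch pairs described above can be illustrated with a minimal sketch. It assumes grayscale images held as NumPy arrays and uses a 2x2 block average as a stand-in for the unspecified blur; the function names `make_low_res` and `sample_patch_pairs`, the patch size, and the sampling step are hypothetical and not taken from the cited work.

```python
import numpy as np

def make_low_res(high):
    """Blur and down-sample by two in each dimension.

    Assumption: the blur is a 2x2 box average, so blurring and
    down-sampling collapse into one block-mean operation.
    """
    high = np.asarray(high, dtype=float)
    h, w = high.shape
    h, w = h - h % 2, w - w % 2            # drop an odd trailing row/column
    blocks = high[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def sample_patch_pairs(high, low, low_size=2, step=1):
    """Collect (low patch, high patch) pairs covering the same image area."""
    pairs = []
    for r in range(0, low.shape[0] - low_size + 1, step):
        for c in range(0, low.shape[1] - low_size + 1, step):
            low_patch = low[r:r + low_size, c:c + low_size]
            # The same region in the high resolution image is twice as large.
            high_patch = high[2 * r:2 * (r + low_size),
                              2 * c:2 * (c + low_size)]
            pairs.append((low_patch, high_patch))
    return pairs

if __name__ == "__main__":
    high = np.arange(64, dtype=float).reshape(8, 8)
    low = make_low_res(high)
    library = sample_patch_pairs(high, low)
    print(low.shape, len(library))
```

In this sketch the pairs are taken from the raw images; the pre-processing into frequency bands is a separate step.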
In order to reduce the amount of training data needed, the patches are pre-processed. The low resolution images are first scaled up by a factor of two in each dimension by some conventional interpolation means, such as bilinear interpolation or bicubic spline interpolation, to form the interpolated low resolution image.
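As an illustration of the interpolation step, the following sketch scales a grayscale image up by a factor of two in each dimension with bilinear interpolation. It is one possible realization only: the half-pixel sampling convention, the edge replication at the borders, and the name `upsample2_bilinear` are assumptions made for the example.

```python
import numpy as np

def upsample2_bilinear(img):
    """Scale a 2-D grayscale image up by two in each dimension (bilinear)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Map output pixel centers back to input coordinates (half-pixel
    # convention) and clamp so that borders replicate the edge pixels.
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]                # vertical blend weights
    fx = (xs - x0)[None, :]                # horizontal blend weights
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bottom = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy

if __name__ == "__main__":
    print(upsample2_bilinear([[0.0, 4.0]]))
```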
The interpolated low resolution images are high-pass filtered, removing the lowest spatial frequency components, to obtain the mid band images. The interpolated low resolution images are also subtracted from the corresponding high resolution images, to obtain the high band images. The patches of the mid band images are contrast normalized, and the corresponding patches of the high band images are contrast normalized by that same amount. This avoids re-learning the same low to high resolution mapping for all different values of the lowest spatial frequency image components, and for all possible local image contrasts. The training data carry assumptions about the structure of the visual world, and about image degradation when blurring and down-sampling. This information can then be used when estimating the high band image. The high band image is then added to the interpolated low resolution image to form the high resolution estimated image, which is the output of the super-resolution algorithm.
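The band decomposition and contrast normalization can be sketched as follows. The text does not specify the high-pass filter or the contrast measure, so this example assumes a high-pass obtained by subtracting a box blur, and a contrast measure equal to the root-mean-square of the mid band patch plus a small constant `eps`. All function names are hypothetical.

```python
import numpy as np

def box_blur(img, radius=2):
    """Separable box blur with edge replication (stand-in low-pass filter)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Average along each row, then along each column.
    rows = sum(padded[:, i:i + img.shape[1]] for i in range(k)) / k
    return sum(rows[i:i + img.shape[0], :] for i in range(k)) / k

def split_bands(interp_low, high):
    """Return (mid band, high band) images for one training pair."""
    interp_low = np.asarray(interp_low, dtype=float)
    high = np.asarray(high, dtype=float)
    mid_band = interp_low - box_blur(interp_low)  # lowest frequencies removed
    high_band = high - interp_low                 # detail the interpolation lacks
    return mid_band, high_band

def normalize_pair(mid_patch, high_patch, eps=0.01):
    """Contrast normalize a mid band patch and scale the high band patch
    by the same amount (assumed measure: RMS of the mid band patch)."""
    scale = np.sqrt(np.mean(mid_patch ** 2)) + eps
    return mid_patch / scale, high_patch / scale, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    interp_low = rng.random((8, 8))
    high = interp_low + 0.1 * rng.standard_normal((8, 8))
    mid, hb = split_bands(interp_low, high)
    m, h, s = normalize_pair(mid[:4, :4], hb[:4, :4])
    print(m.shape, h.shape, s > 0)
```

Returning the scale lets a caller undo the normalization on a predicted high band patch before adding it back to the interpolated low resolution image.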
In a zooming phase, an input low resolution image is preprocessed the same way as the training patches. However, the local image information is not sufficient to predict the missing higher frequencies for each input patch. Therefore, spatial consistency between high band patch selections at adjacent patch locations is taken into account by a Markov network.
From the training data, their method finds a set of d candidate high band patches for each input mid band image patch. A d×d compatibility matrix is computed for each pair of adjacent high band patches. The compatibility matrix indicates consistency between the adjacent high band patches in a region of overlap. The computational cost of this operation, in terms of multiplies, is 2d²NK, where N is the number of pixels in a high band patch, assuming a two-pixel overlap between high band patches, and K is the number of patches in the image. The optimal high band patch at each location is determined by applying multiple iterations of Bayesian belief propagation. Here, the computational cost per iteration is O(d²K). Needless to say, processing images by this method is very time consuming.
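To give a feel for these costs, the short sketch below evaluates both expressions, reading the first as d squared times 2NK. The example values of d, N, and K are hypothetical and chosen only for illustration; the second function drops the constant hidden by the O-notation.

```python
def compatibility_multiplies(d, n, k):
    """Multiplies needed for the compatibility matrices: d^2 * 2 * N * K."""
    return d * d * 2 * n * k

def propagation_cost_per_iteration(d, k):
    """Order-of-magnitude cost of one belief propagation iteration: d^2 * K."""
    return d * d * k

if __name__ == "__main__":
    d, n, k = 16, 25, 10_000   # hypothetical: 16 candidates, 5x5 patches, 10,000 patches
    print(compatibility_multiplies(d, n, k))       # 128000000
    print(propagation_cost_per_iteration(d, k))    # 2560000
```

Both terms grow with the square of the number of candidates d, and the second is paid again on every iteration.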
Therefore, there is a need for a method that can provide quality super-resolution images in a reasonable amount of time, for example, during a single processing pass without any iteration.
The invention provides a super-resolution image that is generated from a pixel image. The images are first partitioned into respective overlapping interpolated low resolution patches and corresponding high resolution patches. The interpolated low resolution patches are then processed in a raster scan order.
For each interpolated low-resolution patch, a mid band input patch is generated. A search vector is constructed from pixels in the mid band input patch, and pixels in an overlap region of adjacent previously predicted high band patches.
A nearest index vector to the search vector is located in a training database, and the nearest index vector has an associated high band output patch. The high band output patch is then combined with the interpolated low resolution patch to predict pixel values for the corresponding high resolution patch of the super-resolution image.
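The prediction of a single patch can be sketched as below. This is a simplified illustration rather than the claimed implementation: it assumes the training database is a NumPy array of index vectors with a parallel array of high band patches, uses a brute-force squared Euclidean search, and omits contrast normalization and the raster scan loop. All names are hypothetical.

```python
import numpy as np

def build_search_vector(mid_patch, overlap_pixels):
    """Concatenate mid band pixels with the overlap pixels taken from
    adjacent, previously predicted high band patches."""
    return np.concatenate([np.ravel(mid_patch), np.ravel(overlap_pixels)])

def nearest_high_band(search_vector, index_vectors, high_band_patches):
    """Brute-force nearest neighbor lookup in the training database."""
    dists = np.sum((index_vectors - search_vector) ** 2, axis=1)
    best = int(np.argmin(dists))
    return best, high_band_patches[best]

def predict_patch(interp_low_patch, high_band_patch):
    """Add the retrieved high band detail to the interpolated patch."""
    return np.asarray(interp_low_patch, dtype=float) + high_band_patch

if __name__ == "__main__":
    # Toy database: three index vectors, each with a 1x1 high band patch.
    index_vectors = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [5, 5, 5, 5]], dtype=float)
    high_band_patches = np.array([[[0.0]], [[0.5]], [[2.0]]])
    v = build_search_vector([[0.9, 1.1]], [1.0, 1.2])
    best, patch = nearest_high_band(v, index_vectors, high_band_patches)
    print(best, predict_patch([[10.0]], patch))
```

Because each lookup already includes pixels from neighboring predictions, one pass over the patches in raster scan order suffices in this sketch.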