Super-resolution (SR) processing refers to increasing the resolution of regularly sampled multi-dimensional signals. Of special interest is the case where only a single low-resolution signal is available. Whether a single low-resolution signal or a plurality of similar low-resolution signals is available makes a significant difference, since in the latter case richer data can be exploited by combining the contributions of the several available signals. In the image processing literature, methods for the single-signal case are generically referred to as example-based super-resolution or, more precisely, single-image super-resolution. Although the following remarks are general and can be applied to signals of different dimensionality, the focus will be on the case of 2D image super-resolution.
Image super-resolution techniques have been well known for many years, starting with “Super Resolution from Image Sequences” by M. Irani and S. Peleg. Most commonly, these techniques estimate a high-resolution image from a set of noisy, blurred, low-resolution observations, such as consecutive images in a video sequence, using a reconstruction process that reverses the image formation model. Thus, the sub-pixel motion between images, the camera and post-processing blur, and the sub-sampling are reversed in order to fuse the available data and obtain a super-resolved image. Several globally-optimal iterative techniques are available, which differ mainly in the assumed image prior model; this prior is what yields a unique solution to the otherwise ill-posed problem.
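The image formation model that reconstruction-based SR reverses can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not any cited method: images are lists of lists of floats, the motion is an integer shift on the high-resolution (HR) grid (i.e. a fraction of a low-resolution pixel), a 3×3 box filter stands in for the PSF, borders are clamped, and the function names are hypothetical.

```python
import random


def shift(img, dy, dx):
    """Translate img by (dy, dx) HR pixels; out-of-range samples are clamped to the border."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def box_blur(img):
    """3x3 box filter, used here only as a stand-in for the (often Gaussian) PSF."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    acc += img[min(max(y + ky, 0), h - 1)][min(max(x + kx, 0), w - 1)]
            row.append(acc / 9.0)
        out.append(row)
    return out


def decimate(img, s):
    """Sub-sampling: keep every s-th sample in each direction."""
    return [row[::s] for row in img[::s]]


def observe(hr, dy, dx, s, noise_sigma=0.0, rng=None):
    """One simulated LR observation: shift -> blur -> decimate -> additive noise."""
    lr = decimate(box_blur(shift(hr, dy, dx)), s)
    if noise_sigma > 0.0:
        rng = rng or random.Random(0)
        lr = [[v + rng.gauss(0.0, noise_sigma) for v in row] for row in lr]
    return lr
```

With s = 2, a shift of one HR pixel equals half an LR pixel, so `observe(hr, 0, 0, 2)` and `observe(hr, 0, 1, 2)` sample the blurred scene on interleaved grids; a reconstruction method has to estimate these shifts and undo the blur in order to fuse them.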
The main limiting factors of these techniques are the estimation of the Point Spread Function (PSF) for image deblurring (often assumed to be Gaussian) and the registration, i.e. the determination of the sub-pixel motion between images. SR techniques in the literature generally rely on classical Optical Flow (OF) estimation techniques, e.g. Lucas-Kanade or Horn-Schunck, to obtain the registration. These work well in quasi-synthetic examples, but in practice the known OF estimation solutions are unable to register consecutive frames of video sequences robustly and with sufficient accuracy when more general motion appears.
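The registration step can be illustrated with a Python sketch that estimates one global sub-pixel translation between two frames using the Lucas-Kanade least-squares formulation: brightness constancy, I2(x, y) = I1(x − u, y − v), is linearized into Ix·u + Iy·v + It = 0 and solved over all interior pixels. It assumes (beyond the text) a single global translation, shifts well below one pixel, and central-difference gradients without a coarse-to-fine pyramid; the function name is hypothetical. These are exactly the assumptions that break down under more general motion.

```python
def estimate_translation(i1, i2):
    """Least-squares global shift (u, v) such that i2(x, y) ~= i1(x - u, y - v)."""
    h, w = len(i1), len(i1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (i1[y][x + 1] - i1[y][x - 1]) / 2.0  # central differences
            iy = (i1[y + 1][x] - i1[y - 1][x]) / 2.0
            it = i2[y][x] - i1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("aperture problem: gradients do not constrain both directions")
    # Normal equations [[sxx, sxy], [sxy, syy]] [u, v] = -[sxt, syt], solved by Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

For a smooth synthetic scene shifted by (0.3, −0.2) pixels the estimate lands close to the true shift; with large or spatially varying motion the linearization no longer holds.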
In “Fundamental Limits of Reconstruction-Based Superresolution Algorithms under Local Translation”, Z. Lin and H.-Y. Shum show that, under a wide range of natural conditions, reconstruction-based SR algorithms of this type have a fundamental limit on the maximum resolution increase of around 1.6×. However, the article also proves that, in synthetic scenarios, which are the ones commonly explored in most of the available publications, a much looser limit applies, allowing resolution increases of up to 5.7×. This is due to the favorable registration conditions in such scenarios, where the sub-pixel shifts are generally exact fractions of the pixel size.
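Why exact fractional shifts are so favorable can be seen in a minimal Python sketch, assuming an integer factor s, shifts that are known multiples of 1/s of a low-resolution pixel, and no blur or noise. Under these hypothetical conditions, placing each low-resolution sample at its high-resolution position and averaging repeated hits (shift-and-add) recovers the high-resolution grid exactly. The helper names are illustrative only.

```python
def decimate_at(hr, s, dy, dx):
    """LR frame sampling the HR scene at offset (dy, dx), 0 <= dy, dx < s.
    Assumes the HR dimensions are divisible by s so all frames have equal size."""
    return [row[dx::s] for row in hr[dy::s]]


def shift_and_add(frames, s):
    """frames: list of (dy, dx, lr_frame) with known offsets.
    Positions hit several times are averaged; positions never hit stay None."""
    lh, lw = len(frames[0][2]), len(frames[0][2][0])
    acc = [[0.0] * (lw * s) for _ in range(lh * s)]
    cnt = [[0] * (lw * s) for _ in range(lh * s)]
    for dy, dx, lr in frames:
        for y in range(lh):
            for x in range(lw):
                acc[y * s + dy][x * s + dx] += lr[y][x]
                cnt[y * s + dy][x * s + dx] += 1
    return [[a / c if c else None for a, c in zip(ra, rc)]
            for ra, rc in zip(acc, cnt)]
```

For s = 2, the offsets (0, 0), (0, 1), (1, 0) and (1, 1) cover every HR position once; with only three of them, a quarter of the positions remain undetermined. Real sequences offer neither exact fractions nor known shifts, which is where the tighter limit applies.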
An alternative type of SR algorithm attempts to increase the resolution of images by suitably enriching the input visual data (low-resolution images) with a-priori known higher-resolution examples. These techniques are commonly referred to as example-based super-resolution (EBSR). In “Example-based super-resolution”, W. T. Freeman, T. R. Jones and E. G. Pasztor obtain suitable high-resolution examples from a sufficiently generic image-patch data-base; the high-frequency contents of these examples are averaged and fused with the low-frequency contents of the input image. However, the performance of the algorithm worsens as the target scene deviates from the cases included in the example data-base, i.e. when none of the known patches actually resembles the input patch. In practice, enlarging the data-base would incur an excessive computational cost in the search for the best-matching training patches. Hence, this technique is not generally usable, but is suited to super-resolving images of a certain class.
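The data-base lookup can be sketched in Python as follows. This is a simplified, hypothetical illustration, not the method of Freeman et al.: patches are flat lists, the low/high-frequency split is assumed to have been done beforehand, and plain k-nearest-neighbor averaging is an assumption standing in for however the original work selects and fuses its examples.

```python
import heapq


def ssd(a, b):
    """Sum of squared differences between two equally sized flat patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def super_resolve_patch(low_patch, database, k=2):
    """database: list of (low_freq_patch, high_freq_patch) pairs (hypothetical layout).
    Finds the k entries whose low-frequency part is closest to low_patch,
    averages their high-frequency parts and adds the result to low_patch."""
    # Linear scan over the whole data-base: cost grows with its size.
    nearest = heapq.nsmallest(k, database, key=lambda e: ssd(low_patch, e[0]))
    n = len(nearest)
    high = [sum(e[1][i] for e in nearest) / n for i in range(len(low_patch))]
    return [lo + hi for lo, hi in zip(low_patch, high)]
```

The linear scan in `super_resolve_patch` makes the matching cost proportional to the number of stored examples, which is why simply enlarging the data-base does not scale.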
To cope with this problem and adapt to the content to be magnified, other EBSR algorithms extract high-resolution examples from within the single input image, for which purpose a pyramid representation of the image at different resolutions can be obtained using small downscaling factors. Then, for every patch (e.g. 5×5 pixels) in the input image, matching patches are searched across all or part of the image at different resolutions (levels in the pyramid) in order to perform per-patch data fusion, similarly to reconstruction-based super-resolution. This technique is best represented by “Super-Resolution from a Single Image” by D. Glasner, S. Bagon and M. Irani, and by its follow-up for video super-resolution, “Space-Time Super-Resolution from a Single Video” by O. Shahar, A. Faktor and M. Irani. In the video case, the authors obtain a simultaneous increase in image resolution and frame rate, including removal of temporal aliasing, at the cost of increased computational complexity due to the 3D spatio-temporal search across video frames at several spatial and temporal scales. This renders the approach unusable for real-time operation with current computing capabilities. This approach is also used in WO2010/122502 A1.
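The cross-scale search can be sketched in Python, assuming a grayscale image stored as lists of lists, bilinear resampling, a per-level downscaling factor of 1.25 (a hypothetical choice of "small factor") and exhaustive matching by sum of squared differences (SSD). The data-fusion step that consumes the matches is not shown, and the function names are illustrative rather than taken from the cited works.

```python
def resize_bilinear(img, out_h, out_w):
    """Bilinear resampling with pixel-center alignment and clamped borders."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        fy = min(max((oy + 0.5) * h / out_h - 0.5, 0.0), h - 1.0)
        y0 = int(fy)
        y1, wy = min(y0 + 1, h - 1), fy - y0
        row = []
        for ox in range(out_w):
            fx = min(max((ox + 0.5) * w / out_w - 0.5, 0.0), w - 1.0)
            x0 = int(fx)
            x1, wx = min(x0 + 1, w - 1), fx - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out


def build_pyramid(img, levels, factor=1.25):
    """Level n is the input downscaled by factor**n (level 0 is the input itself)."""
    h, w = len(img), len(img[0])
    pyr = [img]
    for n in range(1, levels):
        s = factor ** n
        pyr.append(resize_bilinear(img, int(round(h / s)), int(round(w / s))))
    return pyr


def patch(img, y, x, p):
    """Flattened p x p patch with top-left corner (y, x)."""
    return [img[y + i][x + j] for i in range(p) for j in range(p)]


def best_match(query, pyr, p, skip_levels=()):
    """Exhaustive search over all levels; returns (ssd, level, y, x)."""
    best = None
    for lvl, im in enumerate(pyr):
        if lvl in skip_levels:
            continue
        for y in range(len(im) - p + 1):
            for x in range(len(im[0]) - p + 1):
                d = sum((a - b) ** 2 for a, b in zip(query, patch(im, y, x, p)))
                if best is None or d < best[0]:
                    best = (d, lvl, y, x)
    return best
```

The nested loop over every level and every position is what makes this family of methods expensive; the spatio-temporal variant for video adds a further search dimension on top of it.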
Other known approaches are likewise costly and generally unsuitable for real-time applications, or tend to produce unrealistic-looking edges by imposing excessive contrast, or tend to over-smooth textured areas; in general, these effects produce unnatural-looking images.
In “Image and Video Upscaling from Local Self-Examples” by G. Freedman and R. Fattal, the proposed strategy is to exploit self-similarity within a local neighborhood of each image patch. This is shown to provide results close to those of the full-image searches used in “Super-Resolution from a Single Image”, with the benefit of a reduced computation time. A drawback of this approach is that the highly sophisticated design of the space-variant filters used for separating the high-frequency from the low-frequency content of the images is not done on the fly, which limits the set of selectable up-scaling factors.
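The saving from a local search can be contrasted with the exhaustive search in a short Python sketch. It assumes that the match for a patch at (y, x) of the finer image is sought only within a window of radius r around (y/s, x/s) in a coarser version of the image, with SSD matching; the filter design that separates high and low frequencies is deliberately left out, and all names are hypothetical.

```python
def patch(img, y, x, p):
    """Flattened p x p patch with top-left corner (y, x)."""
    return [img[y + i][x + j] for i in range(p) for j in range(p)]


def map_to_coarse(y, x, s):
    """Position in the coarser image corresponding to (y, x) for scale factor s > 1."""
    return int(round(y / s)), int(round(x / s))


def local_best_match(query, coarse, cy, cx, p, radius):
    """Search only top-left positions within `radius` of (cy, cx) in `coarse`.
    Returns ((ssd, y, x), number_of_candidates_compared)."""
    h, w = len(coarse), len(coarse[0])
    best, n = None, 0
    for y in range(max(cy - radius, 0), min(cy + radius, h - p) + 1):
        for x in range(max(cx - radius, 0), min(cx + radius, w - p) + 1):
            d = sum((a - b) ** 2 for a, b in zip(query, patch(coarse, y, x, p)))
            n += 1
            if best is None or d < best[0]:
                best = (d, y, x)
    return best, n
```

For a 30×30 coarse image and 5×5 patches, a window of radius 2 compares at most 25 candidates per patch, whereas a full scan of the same image compares 26 × 26 = 676.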