Increasing image resolution, or image upscaling, is a challenging and fundamental image-editing operation of high practical and theoretical importance. While digital cameras nowadays produce high-resolution images, there are many existing low-resolution images, as well as low-grade sensors found in mobile devices and surveillance systems, that would benefit from resolution enhancement. At its essence, image upscaling requires the prediction of millions of unknown pixel values based on the input pixels, which constitute a small fraction of that number. Upscaling is also intimately related to a variety of other problems such as image inpainting, deblurring, denoising, and compression.
Perhaps the simplest form of single-image upscaling predicts the new pixels using analytical interpolation formulae, e.g., the bilinear and bicubic schemes. However, natural images contain strong discontinuities, such as object edges, and therefore do not obey the analytical smoothness these methods assume. This results in several noticeable artifacts along the edges, such as ringing, staircasing (also known as ‘jaggies’), and blurring effects. Image upscaling has been studied extensively by the computer graphics, machine vision, and image processing communities. The methods developed over the years differ in their formulation, their underlying prior image model, and the input data they use. Here we briefly describe the main approaches to the problem and the principles behind them. We focus on single-image upscaling methods, which is the setting assumed by our new method.
The classic and simplest approach uses linear interpolation to predict intermediate pixel values. This method is usually implemented using linear filtering, such as the bilinear and bicubic filters, and it is commonly found in commercial software. These interpolation kernels are designed for spatially smooth or band-limited signals, which is often not the case in natural images. Real-world images often contain singularities such as edges and high-frequency textured regions. As a result, these methods suffer from various edge-related visual artifacts such as ringing, aliasing, jaggies, and blurring. Thevenaz et al. [25] provide a more elaborate survey of these methods and their evaluation.
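The blurring that linear interpolation introduces at an edge can be seen in a minimal sketch of bilinear upscaling. The function name `bilinear_upscale`, the pixel-centre alignment, and the border clamping are illustrative assumptions, not the conventions of any particular software package.

```python
def bilinear_upscale(img, factor):
    """Upscale a 2-D grayscale image (a list of rows) by an integer factor.

    Illustrative assumptions: pixel-centre alignment and clamping at borders.
    """
    h, w = len(img), len(img[0])
    out = []
    for i in range(h * factor):
        # Map the output pixel centre back to input coordinates and clamp.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(w * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend the four nearest input pixels with linear weights.
            top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
            bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out


# A vertical step edge: after upscaling, the jump from 0 to 1 is spread
# over intermediate values, which is the blurring described above.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
upscaled = bilinear_upscale(step, 2)
print(upscaled[0])
```

In this example the sharp transition of the input becomes a ramp in the output, because the filter has no way of knowing that the underlying signal is discontinuous.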
More sophisticated methods adapt the interpolation weights based on the image content. For example, Li et al. [10] adapt the interpolation weights according to the local edge orientations and Su et al. [20] choose three out of the four nearest pixels for linear interpolation. This reduces ringing and yields somewhat sharper edges. Non-quadratic smoothness functionals yield a different type of non-linear image regularization which can be used for upscaling. For example, Aly and Dubois [1] enlarge images by minimizing the total variation functional. Shan et al. [19] minimize a similar metric using a sophisticated feedback-control framework that keeps the output image consistent with the input image when downscaling it to the input resolution.
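One way to see why a non-quadratic functional such as total variation is attractive for upscaling is to compare it with a quadratic smoothness penalty on a sharp edge and on a blurred ramp. The sketch below uses a simple 1-D discrete form of both penalties; the function names are illustrative, and this is not the formulation used by Aly and Dubois [1] or Shan et al. [19].

```python
def total_variation(signal):
    """Sum of absolute differences between neighbouring samples."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def quadratic_smoothness(signal):
    """Sum of squared differences between neighbouring samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


sharp_edge = [0.0, 0.0, 0.0, 1.0, 1.0]
blurred_ramp = [0.0, 0.25, 0.5, 0.75, 1.0]

tv_sharp, tv_ramp = total_variation(sharp_edge), total_variation(blurred_ramp)
q_sharp, q_ramp = quadratic_smoothness(sharp_edge), quadratic_smoothness(blurred_ramp)

print("total variation:", tv_sharp, tv_ramp)
print("quadratic:      ", q_sharp, q_ramp)
```

Both signals have the same total variation, so this penalty does not favor the blurred ramp over the sharp edge, whereas the quadratic penalty assigns the ramp a lower cost and therefore pushes solutions toward blur.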
Inspired by recent studies of natural image statistics, several methods use Markov random field models to define a probability density over the space of upscaled images. The output image, in many cases, is computed by maximizing this density. These approaches can be divided into two main classes: ones that define non-parametric example-based models and ones that are based on analytical image modeling.
Example-based image enlargement is explored by Freeman et al. [8] and further developed by Freeman et al. [7]. This image prediction model relies on a database of example patches that are decomposed into a low-frequency band, i.e., a smoothed version, and the residual higher-frequency band. The input image is interpolated to a higher resolution using analytic interpolation and the missing high-frequency band is then predicted from the example patches. The matching is performed according to the low-frequency component of the example patches. This approach is capable of producing plausible fine details across the image, both at object edges and in fine-textured regions. However, a lack of relevant examples in the database results in fairly noisy images that show irregularities along curved edges. The use of larger databases is more time consuming due to the added comparisons in the nearest-neighbor searches. The use of approximate nearest-neighbor searches offers a limited solution, as it introduces its own errors. Tappen et al. [24] also use a patch-based model and require the output to be consistent with the input.
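The band decomposition and low-frequency matching described above can be sketched in 1-D as follows. This is a simplified illustration under stated assumptions (a box filter as the smoothing operator, exhaustive nearest-neighbor search, squared-difference matching, and hypothetical function names); it is not the implementation of Freeman et al. [7, 8].

```python
def smooth(signal, radius=1):
    """Box filter with edge replication; returns the low-frequency band."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(k, 0), n - 1)]
                  for k in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def build_database(example, patch=3):
    """Split an example into low/high bands and store aligned patch pairs."""
    low = smooth(example)
    high = [s - l for s, l in zip(example, low)]  # residual high-frequency band
    return [(low[i:i + patch], high[i:i + patch])
            for i in range(len(example) - patch + 1)]


def predict_high_band(low_patch, database):
    """Return the high band of the example whose low band matches best."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_high = min(database, key=lambda pair: dist(pair[0], low_patch))
    return best_high


example = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
database = build_database(example)

# A smooth query patch, such as one taken from an interpolated input.
query_low = smooth(example)[1:4]
predicted_high = predict_high_band(query_low, database)
restored = [l + h for l, h in zip(query_low, predicted_high)]
print(restored)
```

The exhaustive search makes the cost of each query grow linearly with the database size, which is the reason larger databases are more time consuming.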
Motivated by earlier work by Barnsley [2] that studies the fractal nature of images and its application to image compression, Robert et al. [18] and Vrscay et al. [26] interpolate images using fractal compression schemes which contain extra decoding steps. This approach suffers from strong block artifacts, which can be reduced using overlapping range blocks as disclosed by Reusens [16] and by Polidori et al. [13]. Based on these works, Ebrahimi and Vrscay [4] use the input image at multiple smaller scales as the source for example patches, relying on self-similarity in small patches. While this offers an example database of limited size compared to universal databases, as we show later, this example data is considerably more relevant to the input image being enlarged. Suetake et al. [21] also use the input image to compute an example codebook which is later used to estimate the missing high-frequency band, in a framework similar to Freeman et al. [7].
Recently, several parametric image models have been proposed for upscaling. These methods fit analytical models to describe various image features that show statistical dependency at different scales. Fattal [6] models the relation between edge descriptors, extracted from the input, and gradients at a higher resolution. A fully analytical prior for the reconstructed edge profile is used by Sun et al. [23]. These approaches are considerably faster than their example-based counterparts and are capable of reproducing sharp edges with no apparent noise. Nevertheless, the resulting images tend to appear somewhat unrealistic as they are made of generic edges that often separate color plateaus. Sun et al. [22] describe a Markov random field that combines example-based and parametric modeling together.
Besides single-image upscaling, many works deal with multi-frame super-resolution, where multiple shots of the same scene, taken at translational offsets, are used to generate a single high-resolution image of the scene. It has also been proposed to use robust regularization to deal with the noise that limits this operation [5, 11]. Given high-resolution photographs of a static scene, Bhat et al. [3] enhance videos of that scene by rendering pixels from the photographs. Recently, Glasner et al. [9] unify the multi-frame and example-based super-resolution techniques and derive a single-image method. This method uses the formalism of multi-frame super-resolution yet relies on self-similarities in the image to obtain samples differing by sub-pixel offsets.
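The role of sub-pixel offsets in multi-frame super-resolution can be illustrated with an idealized 1-D sketch. It assumes noise-free point sampling with no blur and exactly known offsets, none of which hold in practice, and the function names are hypothetical; it is not the method of any of the cited works.

```python
def sample_frame(scene, offset, factor=2):
    """Point-sample every `factor`-th value of the scene, starting at `offset`."""
    return scene[offset::factor]


def merge_frames(frames, factor=2):
    """Interleave frames taken at offsets 0..factor-1 back onto the fine grid."""
    length = sum(len(frame) for frame in frames)
    merged = [None] * length
    for offset, frame in enumerate(frames):
        merged[offset::factor] = frame
    return merged


scene = [0, 1, 4, 9, 16, 25, 36]

# Two low-resolution frames; the second is shifted by one fine-grid sample,
# which is half a pixel at the low resolution.
frames = [sample_frame(scene, k) for k in range(2)]
merged = merge_frames(frames)
print(frames, merged)
```

Each frame alone contains only half of the samples, but together the frames cover the fine grid. With real data, noise and blur turn this interleaving into an ill-conditioned inverse problem, which is where regularization becomes necessary.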