Example- or learning-based super-resolution (SR) methods predict the high-frequency (HF) details of a low-resolution (LR) image from a training data set. A relation is determined between patches of the input image and the patches of the training data set (the dictionary), and the corresponding high-resolution (HR) patches from the dictionary are used to synthesize the HR patches of the input image.
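As a minimal sketch of the patch-level lookup described above, the following Python code treats the dictionary as a list of hypothetical (LR feature, HF detail) pairs, picks the single closest LR entry by Euclidean distance, and adds its HF detail to an already interpolated input patch. The dictionary format and the functions `predict_hf_detail` and `synthesize_hr_patch` are illustrative assumptions; practical systems use richer features, overlapping patches, and more than one match.

```python
from typing import List, Sequence, Tuple

# Assumed dictionary format (hypothetical): (LR feature vector, HF detail patch)
# pairs, with every patch flattened row-major into a list of floats.
Pair = Tuple[List[float], List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_hf_detail(lr_feature: Sequence[float],
                      dictionary: Sequence[Pair]) -> List[float]:
    """Return the HF detail stored with the closest LR entry (1-nearest neighbor)."""
    _, hf_detail = min(dictionary,
                       key=lambda pair: squared_distance(lr_feature, pair[0]))
    return list(hf_detail)


def synthesize_hr_patch(upscaled_patch: Sequence[float],
                        lr_feature: Sequence[float],
                        dictionary: Sequence[Pair]) -> List[float]:
    """HR patch = interpolated input patch + predicted high-frequency detail."""
    detail = predict_hf_detail(lr_feature, dictionary)
    return [u + d for u, d in zip(upscaled_patch, detail)]


if __name__ == "__main__":
    toy_dictionary = [
        ([0.0, 0.0], [0.0, 0.0, 0.0, 0.0]),     # flat region: no detail
        ([1.0, -1.0], [0.5, -0.5, 0.5, -0.5]),  # edge-like region
    ]
    print(synthesize_hr_patch([1.0, 1.0, 1.0, 1.0], [0.9, -1.1], toy_dictionary))
    # prints [1.5, 0.5, 1.5, 0.5]
```

Using only the nearest entry keeps the example short; neighbor embedding and sparse coding replace it with a weighted combination of several dictionary entries.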
Neighbor embedding (NE)-based algorithms estimate HR patches by linearly combining the HR counterparts of neighbors found in the LR domain. Such NE-based algorithms are known to be more broadly applicable, so that good results can be achieved even with relatively small training data sets. Sparse super-resolution uses a dictionary that is trained once on natural images and is generic for several or all images. “Sparse” means, inter alia, that only a few, not all, of the dictionary entries are actually used for HR patch construction, and these few entries can be combined to obtain a precise reconstruction.
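The linear combination used by NE methods can be illustrated with locally linear embedding (LLE) weights: the weights that best reconstruct the input LR patch from its K LR neighbors, constrained to sum to one, are reused on the neighbors' HR counterparts. The sketch below assumes the neighbors have already been found; the names `lle_weights` and `ne_reconstruct`, the ridge parameter `reg`, and the small pure-Python solver are hypothetical choices for illustration.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lle_weights(x: Sequence[float], lr_neighbors: Sequence[Sequence[float]],
                reg: float = 1e-6) -> List[float]:
    """Weights w minimizing ||x - sum_k w_k n_k||^2 subject to sum_k w_k = 1."""
    k = len(lr_neighbors)
    diffs = [[xi - ni for xi, ni in zip(x, n)] for n in lr_neighbors]
    # Local Gram matrix G_jk = (x - n_j) . (x - n_k)
    gram = [[sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    # Assumption: a small ridge term keeps G invertible when it is singular or
    # ill-conditioned (e.g. K larger than the patch dimension).
    trace = sum(gram[i][i] for i in range(k))
    for i in range(k):
        gram[i][i] += reg * (trace if trace > 0 else 1.0)
    w = solve(gram, [1.0] * k)  # solve G w = 1, then rescale to sum to one
    total = sum(w)
    return [wi / total for wi in w]


def ne_reconstruct(x: Sequence[float], lr_neighbors: Sequence[Sequence[float]],
                   hr_neighbors: Sequence[Sequence[float]]) -> List[float]:
    """Apply the LR-domain weights to the HR counterparts of the neighbors."""
    w = lle_weights(x, lr_neighbors)
    dim = len(hr_neighbors[0])
    return [sum(wk * h[d] for wk, h in zip(w, hr_neighbors)) for d in range(dim)]
```

For example, an LR patch `[0.25, 0.75]` with LR neighbors `[1, 0]` and `[0, 1]` yields weights close to `0.25` and `0.75`, which are then applied unchanged to the two HR neighbor patches.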
J. Yang et al., in “Image Super-resolution via Sparse Representation” [18], show the effectiveness of sparsity as a prior for regularizing the SR problem, which is otherwise ill-posed. In “Image Super-Resolution as Sparse Representation of Raw Image Patches” [19], J. Yang et al. use a similar approach with uncompressed, and therefore larger, dictionaries.
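To illustrate the sparsity prior, the sketch below codes an LR patch over a hypothetical LR dictionary and reuses the same sparse code with the paired HR dictionary, so that only a few atoms contribute to the HR patch. The code is computed with orthogonal matching pursuit (OMP), a simple greedy solver chosen here for brevity; it is not necessarily the solver used in [18] or [19], and the dictionaries as well as the functions `omp` and `sparse_sr_patch` are assumptions.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp(x: Sequence[float], atoms: Sequence[Sequence[float]], sparsity: int,
        tol: float = 1e-12) -> List[float]:
    """Greedy sparse code of x over `atoms` (assumed unit-norm dictionary columns)."""
    support: List[int] = []
    coeffs: List[float] = []
    residual = list(x)
    for _ in range(min(sparsity, len(atoms))):
        # Select the unused atom most correlated with the current residual.
        best = max((j for j in range(len(atoms)) if j not in support),
                   key=lambda j: abs(dot(atoms[j], residual)))
        support.append(best)
        # Re-fit all selected coefficients by least squares (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        coeffs = solve(gram, [dot(atoms[i], x) for i in support])
        residual = [xi - sum(c * atoms[j][d] for c, j in zip(coeffs, support))
                    for d, xi in enumerate(x)]
        if dot(residual, residual) < tol:
            break
    code = [0.0] * len(atoms)  # most entries stay exactly zero: the code is sparse
    for c, j in zip(coeffs, support):
        code[j] = c
    return code


def sparse_sr_patch(lr_patch: Sequence[float], lr_atoms: Sequence[Sequence[float]],
                    hr_atoms: Sequence[Sequence[float]], sparsity: int = 3) -> List[float]:
    """Code the LR patch over the LR dictionary, then reuse that code with the HR one."""
    code = omp(lr_patch, lr_atoms, sparsity)
    dim = len(hr_atoms[0])
    return [sum(code[j] * hr_atoms[j][d] for j in range(len(hr_atoms)))
            for d in range(dim)]
```

The key design point is that the LR and HR dictionaries are paired column by column, so a code found in the LR domain can be transferred directly to the HR domain.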
X. Gao et al., in “Image Super-Resolution With Sparse Neighbor Embedding” [21], disclose image super-resolution with sparse neighbor embedding (SpNE), in which a large number of neighbors is first predetermined as potential candidates; the actual neighbors and the weights for the linear combination are then estimated simultaneously. Histograms of oriented gradients (HoG), which work well for representing local geometry, are used as the descriptor, and the whole training data set is partitioned into subsets by clustering these HoG descriptors.
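The HoG-based partitioning of the training set can be sketched as follows, under simplifying assumptions: each patch gets a single magnitude-weighted histogram of unsigned gradient orientations (without the cells, blocks, and block normalization of full HoG), and the descriptors are grouped with plain k-means initialized from the first k descriptors. The clustering method, the bin count, and the helpers `orientation_histogram`, `kmeans_labels`, and `partition_training_set` are hypothetical and not taken from [21].

```python
import math
from typing import Dict, List, Sequence


def orientation_histogram(patch: Sequence[Sequence[float]], bins: int = 4) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (simplified HoG)."""
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def kmeans_labels(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; the first k points serve as deterministic initial centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def partition_training_set(patches: Sequence[Sequence[Sequence[float]]], k: int,
                           bins: int = 4) -> Dict[int, List[int]]:
    """Group training-patch indices by the cluster of their HoG-like descriptor."""
    labels = kmeans_labels([orientation_histogram(p, bins) for p in patches], k)
    subsets: Dict[int, List[int]] = {c: [] for c in range(k)}
    for idx, lab in enumerate(labels):
        subsets[lab].append(idx)
    return subsets
```

With this toy descriptor, patches containing vertical edges fall into one subset and patches containing horizontal edges into another, regardless of edge contrast, since the histograms are normalized.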
The fast development and popularization of low-cost capturing devices contrast with the proliferation of high-definition displays. The applications delivering low-resolution images are diverse (e.g., surveillance, satellite imaging), and there is also abundant multimedia content whose resolution does not match the capabilities of current displays. To fill this gap, so-called super-resolution (SR) techniques are used. SR deals with an ill-posed inverse problem: obtaining a high-resolution (HR) image from its low-resolution (LR) version by restoring hidden information that may be available. SR approaches can be broadly divided into three categories: reconstruction-based, interpolation-based, and learning-based methods.
The first techniques proposed were the classical multi-image reconstruction-based SR methods [5,8]. These techniques need as input several images of the same scene with sub-pixel displacements, which are used to build a set of linear constraints on the intensities of the new high-resolution pixels. If enough images are provided, the set of equations is determined and can be solved to obtain the high-resolution image. This approach, however, depends on the accuracy of the required registration process and is limited to small magnification factors [1,11].
First efforts in interpolation methods used well-known interpolation kernels such as bilinear or bicubic [9], which are fast and simple but tend to produce overly smooth edges. Further research in interpolation methods exploits natural image priors [4,16], yielding improved results but still showing limited performance on complex textures.
Some of these limitations were overcome by machine-learning SR methods [3,6,7], which aim to learn the relation between LR and HR from a training dataset, usually at the patch level. In [6], the prediction of HR patches from LR patches is modeled with a Markov random field and solved by belief propagation.
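The classical multi-image reconstruction idea can be illustrated with a 1-D toy example: each LR frame is a blurred, shifted, and decimated copy of an unknown HR signal, every LR sample contributes one linear equation, and stacking enough frames makes the system solvable by least squares. The 3-tap blur `PSF`, the circular boundary, the exactly known integer HR-grid shifts (half an LR pixel for a factor of 2), and the helpers `observe` and `reconstruct` are assumptions; in practice the shifts come from the registration step, whose inaccuracy is one of the weaknesses noted in the text.

```python
from typing import List, Sequence, Tuple

# Assumption: hypothetical 3-tap blur whose frequency response never vanishes,
# so that the stacked system can be full rank.
PSF = (0.2, 0.6, 0.2)
Frame = Tuple[List[List[float]], List[float]]  # (constraint rows, LR sample values)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def observe(hr: Sequence[float], shift: int, factor: int = 2) -> Frame:
    """Blur, shift and decimate a 1-D HR signal; return one constraint per LR sample."""
    n = len(hr)
    rows, values = [], []
    for pos in range(shift, n, factor):
        row = [0.0] * n
        for offset, weight in zip((-1, 0, 1), PSF):
            row[(pos + offset) % n] += weight  # circular boundary for simplicity
        rows.append(row)
        values.append(sum(r * h for r, h in zip(row, hr)))
    return rows, values


def reconstruct(frames: Sequence[Frame], n: int) -> List[float]:
    """Least-squares HR estimate from stacked constraints: solve (A^T A) x = A^T y."""
    rows = [r for frame_rows, _ in frames for r in frame_rows]
    values = [v for _, frame_values in frames for v in frame_values]
    if len(rows) < n:
        raise ValueError("too few LR samples: the system is underdetermined")
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(n)]
    return solve(ata, aty)
```

For a 6-sample HR signal, a single frame provides only three equations for six unknowns, which is why `reconstruct` rejects it; two frames with complementary shifts (0 and 1 HR pixel) make the system determined.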
The approach of [6] was extended by [15] using primal sketch priors (e.g., edges, ridges, and corners). However, these approaches need large training datasets, on the order of millions of patch pairs, and are therefore computationally heavy. The authors of [3] proposed a manifold assumption under which the LR and HR patch manifolds, although lying in two distinct feature spaces, have similar local geometry. Following this assumption, locally linear embedding (LLE) is used to estimate each HR patch by combining the HR counterparts of its LR neighbors found in the training dataset.
Recent SR research has explored the so-called sparsity prior, where LR patches are coded with respect to an over-complete dictionary and their HR counterparts are linearly combined with the same sparse code [19]. The work in [18] introduced a learned dictionary pair with a more compact representation of the patch pairs, reducing its size (and, as a result, its computational load) compared to previous approaches in which dictionaries were larger collections of raw patches. The performance of machine-learning SR methods depends strongly on the content of the training dataset. In [19], the dictionary is built by randomly sampling raw patches from a larger set of images, without any concern for whether the patches are useful for the image to be recovered, whereas in [18] these raw patches are compressed into a smaller number of patches through sparse coding techniques.
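Random raw-patch sampling, as described for the dictionary of [19], can be sketched as below: an LR image is simulated from an HR training image, and co-located LR/HR patch pairs are drawn at random positions without looking at their content. The block-averaging degradation, the patch sizes, the seed, and the helpers `downsample`, `extract`, and `sample_raw_pairs` are hypothetical choices; [19] may use a different degradation model and feature extraction.

```python
import random
from typing import List, Sequence, Tuple


def downsample(img: Sequence[Sequence[float]], factor: int = 2) -> List[List[float]]:
    """Assumed degradation: average non-overlapping factor x factor blocks."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


def extract(img: Sequence[Sequence[float]], y: int, x: int, size: int) -> List[float]:
    """Flatten the size x size patch with top-left corner (y, x), row-major."""
    return [img[y + dy][x + dx] for dy in range(size) for dx in range(size)]


def sample_raw_pairs(hr_img: Sequence[Sequence[float]], num_pairs: int,
                     lr_size: int = 2, factor: int = 2,
                     seed: int = 0) -> List[Tuple[List[float], List[float]]]:
    """Randomly sample co-located (LR, HR) raw patch pairs, ignoring patch content."""
    lr_img = downsample(hr_img, factor)
    rng = random.Random(seed)  # fixed seed for reproducibility
    pairs = []
    for _ in range(num_pairs):
        # randint is inclusive, so the patch always fits inside the LR image.
        y = rng.randint(0, len(lr_img) - lr_size)
        x = rng.randint(0, len(lr_img[0]) - lr_size)
        pairs.append((extract(lr_img, y, x, lr_size),
                      extract(hr_img, y * factor, x * factor, lr_size * factor)))
    return pairs
```

Because positions are drawn uniformly, flat and textured regions are sampled alike, which reflects the lack of concern for patch usefulness noted in the text.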