Super-resolution is the task of producing a high-resolution image from a low-resolution image. For example, face upsampling, also called face super-resolution, is the task of generating a high-resolution face image from a low-resolution input image of the face. Face upsampling has widespread applications in surveillance, authentication, and photography. It is particularly challenging when the input face resolution is very low (e.g., 12×12 pixels), the magnification rate is high (e.g., 8×), and/or the face image is captured in an uncontrolled setting with pose and illumination variations.
Approaches to super-resolution fall into three main categories: interpolation-based methods, reconstruction-based methods, and learning-based methods. Interpolation-based methods, which include nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation, are simple but tend to blur high-frequency details. As a result, interpolation-based super-resolution produces smoothed images in which details are lost or have inadequate quality. To obtain sharper high-resolution images, some methods apply image sharpening filters, such as bilateral filtering, after the interpolation.
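To make the interpolation-based category concrete, the following is a minimal NumPy sketch of nearest neighbor and bilinear upsampling for a single-channel image. The function names, the pixel-center alignment, and the edge clamping are assumptions chosen for illustration and are not taken from any particular method discussed here.

```python
import numpy as np

def upsample_nearest(img, scale):
    """Nearest neighbor upsampling: repeat every pixel `scale` times along both axes."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def upsample_bilinear(img, scale):
    """Bilinear upsampling of a 2-D float array (assumed pixel-center alignment, clamped edges)."""
    h, w = img.shape
    # Centers of the output pixels, expressed in input pixel coordinates.
    ys = np.clip((np.arange(h * scale) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * scale) + 0.5) / scale - 0.5, 0, w - 1)
    # Indices of the two neighboring input rows and columns for each output pixel.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional distances to the upper/left neighbor act as blending weights.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

For a 12×12 input and a magnification of 8×, both functions return a 96×96 array. Because the bilinear output is a weighted average of neighboring input pixels, it cannot contain values outside the input range, which is one way to see why fine detail is smoothed rather than recovered.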
Reconstruction-based methods enforce a reconstruction constraint, which requires that the smoothed and down-sampled version of the high-resolution image be close to the low-resolution image. For example, one method uses a two-step approach for hallucinating faces. In the first step, a global face reconstruction is acquired using an eigenface model, which is a linear projection operation. In the second step, details of the reconstructed global face are enhanced by non-parametric patch transfer from a training set, where consistency across neighboring patches is enforced through a Markov random field. This method produces high-quality face hallucination results when the face images are near frontal and well aligned and the lighting conditions are controlled. However, when these assumptions are violated, the simple linear eigenface model fails to produce a satisfactory global face reconstruction.
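The reconstruction constraint can be sketched as follows. This is only an illustration under stated assumptions: the smoothing and down-sampling operator is modeled as block averaging, and the function names `degrade`, `reconstruction_error`, and `back_project` are hypothetical rather than part of any method described above.

```python
import numpy as np

def degrade(hr, scale):
    """Smooth and down-sample by averaging each scale x scale block (assumed box-blur model)."""
    h, w = hr.shape  # both dimensions are assumed to be divisible by `scale`
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def reconstruction_error(hr, lr, scale):
    """Mean squared difference between the degraded candidate and the observed low-resolution image."""
    return float(np.mean((degrade(hr, scale) - lr) ** 2))

def back_project(hr, lr, scale):
    """One correction step: add the low-resolution residual back to every pixel of its block."""
    residual = lr - degrade(hr, scale)
    return hr + np.kron(residual, np.ones((scale, scale)))
```

Under this block-averaging model, a single call to `back_project` makes any candidate satisfy the constraint up to floating-point error. Many different high-resolution images satisfy it equally well, so the constraint alone does not determine the high-frequency details.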
Learning-based methods “hallucinate” high-frequency details from a training set of high-resolution/low-resolution image pairs. The learning-based approach relies to a significant extent on the similarity between the training set and the test set. However, it is challenging for learning-based methods to reconstruct high-frequency details of the super-resolved image; see, e.g., U.S. Pat. No. 9,836,820. For example, one method uses a bi-channel convolutional neural network (BCCNN) for face upsampling. The method uses a convolutional neural network architecture that includes a convolution followed by fully connected layers, whose output is averaged with the bicubically upsampled image. The last layer of this network is fully connected, so that high-resolution basis images are averaged. Due to the averaging, person-specific face details can be lost.
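The effect of the averaging can be illustrated with a small sketch. It is not the BCCNN implementation: the function names, the array shapes, and the blending weight `alpha` are assumptions, with `alpha = 0.5` corresponding to a plain average of the two images.

```python
import numpy as np

def fully_connected_output(coeffs, basis_images):
    """Weighted sum of K high-resolution basis images, as a final linear layer could produce.

    coeffs has shape (K,) and basis_images has shape (K, H, W); both are hypothetical inputs.
    """
    return np.tensordot(coeffs, basis_images, axes=1)

def blend(network_out, interpolated, alpha=0.5):
    """Average the network output with an interpolated (smooth) upsampled image."""
    return alpha * network_out + (1.0 - alpha) * interpolated
```

Whatever detail the network adds relative to the interpolated image, `network_out - interpolated`, appears in the blended result scaled by `alpha`. With a plain average, half of that detail amplitude is removed, which is consistent with the observation that person-specific details can be lost.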
Accordingly, there is a need for a learning-based super-resolution method suitable for upsampling high-frequency details of an image.