Face upsampling (super-resolution) is the task of generating a high-resolution face image from a low-resolution input image, and it has widespread applications in surveillance, authentication, and photography. Face upsampling is particularly challenging when the input face resolution is very low (e.g., 12×12 pixels), the magnification rate is high (e.g., 8×), and/or the face image is captured in an uncontrolled setting with pose and illumination variations.
Earlier super-resolution methods used image interpolation to obtain high-resolution images. These methods include nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. Interpolation-based image super-resolution produces smoothed images in which details of the image are lost or have inadequate quality. To obtain sharp high-resolution images, some methods applied image sharpening filters, such as bilateral filtering, after the interpolation.
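The interpolation approaches named above can be illustrated with a minimal sketch. It assumes a 2-D grayscale image stored as a NumPy array and an integer magnification factor. The function names and the half-pixel alignment convention are choices made for this illustration only and are not taken from any particular prior method.

```python
import numpy as np

def upsample_nearest(img, factor):
    """Nearest-neighbor upsampling: repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upsample_bilinear(img, factor):
    """Bilinear upsampling of a 2-D grayscale image (half-pixel alignment assumed)."""
    h, w = img.shape
    img = img.astype(float)
    # Map output pixel centers back to input coordinates and clamp to the image.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal weights, one per output column
    # Interpolate horizontally on the two bracketing rows, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

low_res = np.array([[0.0, 10.0], [20.0, 30.0]])
blocky = upsample_nearest(low_res, 2)   # 4x4 image with constant 2x2 blocks
smooth = upsample_bilinear(low_res, 2)  # 4x4 image with gradual transitions
```

Every output pixel in both functions is a copy or a weighted average of input pixels, so neither can introduce detail that is absent from the low-resolution input. This is consistent with the smoothing described above.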
More recent methods used machine learning techniques to learn the parameters of the image super-resolution methods. These image super-resolution (SR) methods were developed for generic images, but can also be used for face upsampling. In these methods, local constraints are enforced as priors based on image statistics and exemplar patches. Global constraints are typically not available for the generic SR problem, which limits the plausible upsampling factor.
There are several super-resolution methods specific to face images. For example, one method uses a two-step approach for hallucinating faces. In the first step, a global face reconstruction is acquired using an eigenface model, which is a linear projection operation. In the second step, details of the reconstructed global face are enhanced by non-parametric patch transfer from a training set, where consistency across neighboring patches is enforced through a Markov random field. This method produces high-quality face hallucination results when the face images are near frontal and well aligned, and the lighting conditions are controlled. However, when these assumptions are violated, the simple linear eigenface model fails to produce a satisfactory global face reconstruction. In addition, the patch transfer does not scale well to large training datasets because of the nearest-neighbor (NN) patch search.
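The linear projection at the core of an eigenface model can be sketched as follows. This is only an illustration of the projection step on flattened image vectors of a single resolution. It does not reproduce the low-resolution to high-resolution mapping or the patch-transfer step of the method described above. The function names `fit_eigenfaces` and `reconstruct`, as well as the synthetic training data, are assumptions made for this example.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """Estimate a mean face and an orthonormal eigenface basis.

    faces: array of shape (n_samples, n_pixels), one flattened face per row.
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions of the training set.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def reconstruct(face, mean, basis):
    """Project a flattened face onto the eigenface subspace and map it back."""
    coeffs = basis @ (face - mean)   # linear projection onto the basis
    return mean + basis.T @ coeffs   # reconstruction from the coefficients

# Synthetic "training faces" that lie in a 3-dimensional affine subspace.
rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 16)) + 5.0
mean, basis = fit_eigenfaces(faces, n_components=3)

in_model = reconstruct(faces[0], mean, basis)       # closely matches faces[0]
outlier = rng.normal(size=16)                       # not drawn from the subspace
out_of_model = reconstruct(outlier, mean, basis)    # cannot match the outlier
```

An input that lies in the span of the training faces is reconstructed almost exactly, whereas an input outside that span is pulled back into it. This suggests why a purely linear model may struggle once pose, alignment, or lighting depart from the training conditions.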
Another method uses a bi-channel convolutional neural network (BCCNN) for face upsampling. The method uses a standard convolutional neural network architecture that includes a convolution followed by fully connected layers, whose output is averaged with the bicubic upsampled image. The last layer of this network is fully connected, and high-resolution basis images are averaged there. Due to the averaging, person-specific face details can be lost.
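The effect of averaging on fine detail can be illustrated with a toy example. The function `fuse` and the fixed scalar weight `alpha` are hypothetical. The description above does not state how the averaging weights are chosen, so this sketch is not the BCCNN fusion rule itself.

```python
import numpy as np

def fuse(network_output, interpolated, alpha=0.5):
    """Weighted average of two same-sized images; `alpha` weights the network output."""
    if network_output.shape != interpolated.shape:
        raise ValueError("images must have the same shape")
    return alpha * network_output + (1.0 - alpha) * interpolated

# Toy data: a flat base image plus a fine checkerboard pattern of amplitude 10,
# standing in for detail present only in the network output.
base = np.full((4, 4), 100.0)
detail = 10.0 * ((np.indices((4, 4)).sum(axis=0) % 2) * 2 - 1)
fused = fuse(base + detail, base)

residual = fused - base  # detail that survives the averaging
```

With equal weights, the checkerboard amplitude in `residual` is half that of `detail`. Any detail carried by only one of the averaged images is attenuated in proportion to its weight.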