Super-resolution (SR) methods aim to generate new high-resolution (HR) information beyond the Nyquist frequency of an existing low-resolution (LR) image. SR methods are attracting great practical interest, especially for HDTV, UHDTV (4KTV and 8KTV), video communication, video surveillance, and medical imaging. For example, an HDTV image of 1080 lines and 1920 pixels per line may be converted to a UHDTV image of 2160 lines and 3840 pixels per line by expanding each HDTV pixel to four UHDTV pixels.
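The size relationship in the HDTV-to-UHDTV example can be illustrated with a minimal sketch. It assumes simple pixel replication, which only enlarges the pixel grid and adds no new HR information, so it is a baseline rather than an SR method. The function name `replicate_upscale` is hypothetical.

```python
import numpy as np

def replicate_upscale(lr, factor=2):
    """Nearest-neighbour upscale: each input pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(lr, factor, axis=0), factor, axis=1)

# A blank 1080-line, 1920-pixel-per-line frame stands in for an HDTV image.
hd = np.zeros((1080, 1920), dtype=np.uint8)
uhd = replicate_upscale(hd)
print(uhd.shape)  # (2160, 3840)
```

Each HDTV pixel maps to a 2x2 block of identical UHDTV pixels here; an SR method would instead estimate four distinct values per block.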
Super-resolution technologies can be classified into classical multi-frame SR and single-frame SR. The multi-frame SR method recovers high-frequency information from multiple frames of a video or a set of images with sub-pixel misalignment. Most approaches use motion estimation to recover these misalignments. Various blending and regularization methods such as IBP (iterative back-projection) and MAP (maximum a posteriori) have been used to make the reconstructed HR image consistent with the input LR image. Weights for blending and regularization may be calculated using various cues such as the degree of similarity in patch matching, motion vector continuity, and the length of motion vectors.
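As a rough illustration of similarity-based blending weights, the sketch below weights candidate patches by how closely they match a target patch. The exponential weighting, the mean-squared-difference measure, the parameter `h`, and the function names are assumptions chosen for illustration; the text does not specify a weighting formula, and motion-vector cues are not modeled.

```python
import numpy as np

def similarity_weights(target, candidates, h=10.0):
    """Weights that decay with the mean squared difference to the target patch."""
    msd = np.array([np.mean((c - target) ** 2) for c in candidates])
    w = np.exp(-msd / (h * h))   # assumed kernel: closer match -> larger weight
    return w / w.sum()           # normalize so the weights sum to 1

def blend(candidates, weights):
    """Weighted average of equally sized candidate patches."""
    return np.tensordot(weights, np.stack(candidates), axes=1)

target = np.zeros((2, 2))
candidates = [np.zeros((2, 2)), np.full((2, 2), 10.0)]
weights = similarity_weights(target, candidates)
blended = blend(candidates, weights)
```

The better-matching candidate dominates the blend, so a poorly registered patch contributes less to the reconstructed pixel values.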
Since multi-frame SR methods need to capture, buffer, and manipulate multiple images or frames, the memory consumption and computational complexity are rather high. Moreover, although such SR schemes provide reasonably stable results up to a magnification factor of about 2, they are limited in the presence of noise and misregistration. These limitations and the undesirability of any resulting visual artifacts have led to the development of single-frame SR methods, which are also named example-based SR, learning-based SR, or “hallucination”.
Typical example-based SR methods recover high-resolution (HR) information using a single input low-resolution (LR) image. Two major modules are used: HR information recovery and restoration. In the first module, the input LR image is first divided into many small LR patches that may overlap. For each LR patch, the first module searches for its corresponding high-resolution examples in a pre-trained database of any other LR images and/or downsampled/upsampled LR images. Then, the resulting HR patches are used to reconstruct an enlarged image, typically using a blending and weighting process. There is also an approach that selects patches instead of searching, to reduce the computational complexity. In the second module, post-processing such as Iterative Back-Projection (IBP) is used to keep the reconstructed HR image consistent with the input LR image, using some assumptions such as an image formation model. There are also other single-frame SR approaches that use other technologies, such as an FFT-based iterative deblurring method.
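The consistency step in the second module can be sketched as follows. This is a minimal illustration, assuming a block-averaging image formation model and pixel replication as the back-projection operator; practical systems typically use a blur kernel and a smoother interpolator. All function names are hypothetical, and image dimensions are assumed to be divisible by the scale factor `s`.

```python
import numpy as np

def downsample(hr, s=2):
    """Assumed image formation model: average each s x s block."""
    h, w = hr.shape
    return hr.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(lr, s=2):
    """Back-projection operator: replicate each LR pixel into an s x s block."""
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

def iterative_back_projection(hr_init, lr, s=2, iters=10, step=1.0):
    """Push the HR estimate toward consistency with the LR input."""
    hr = hr_init.astype(np.float64).copy()
    for _ in range(iters):
        residual = lr - downsample(hr, s)    # error measured in the LR domain
        hr += step * upsample(residual, s)   # project the error back to HR
    return hr

rng = np.random.default_rng(0)
lr = rng.random((4, 4))
hr = iterative_back_projection(rng.random((8, 8)), lr)
```

After the iterations, downsampling `hr` reproduces `lr` under the assumed model, while the detail within each block from the initial estimate is preserved.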
Some approaches use techniques from both the “classic SR” (i.e., multi-frame SR) and example-based SR (i.e., single-frame SR). For example, patch examples may be searched from both a downsampled input LR image and the LR image itself. A hybrid SR approach may extend the search in the current LR image to multiple frames in a video.
Super-resolution has many possible solutions. Many of the existing SR approaches employ optimization methods such as MAP (Maximum a Posteriori), ML (Maximum Likelihood), and IBP (Iterative Back-Projection) to regularize the reconstructed image to be consistent with the input LR image while balancing sharpness against artifacts. These approaches are based on certain objective criteria such as Mean Square Error (MSE).
A Human Visual System (HVS) model attempts to model a human's visual preferences, which may be somewhat subjective. The HVS has different preferences and sensitivity to image details and artifacts in different local regions. For example, noise and artifacts in a random texture region are less visible to the HVS than those in a regular structure region. Humans may immediately notice an artifact or error that seems out of place in an otherwise regular structure, such as a checkerboard, but the same artifact in a random region may not be very noticeable. Thus the same size artifact may be quite irritating to the user when located in a regular structure, but may be invisible when in a random region of the picture.
The HVS model mimics this human preference by permitting more detail information (and a greater chance of artifacts or errors) in a random-texture region than in regular structure regions. Prior art SR methods that ignore the HVS may not create an optimal high-resolution image in terms of a viewer's visual experience.
The HVS model is used to predict the perceptual characteristics of people and has been intensively researched for decades. The HVS model covers such perceptions as visual attention, foveation, color perception, stereo perception, and Just Noticeable Distortion (JND), which has solid support from biological and psychological experiments. Among these models, the JND model is widely used in image processing. The JND model outputs a threshold that represents the limitations of a person's HVS in perceiving small changes in an image. If the noise, artifacts, or detail changes in an image are smaller than the JND threshold, they cannot be perceived by the human visual system. In practice, these image distortions can be ignored in image processing.
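The role of the JND threshold can be illustrated with a short sketch that marks which pixel changes exceed it. The function name `perceptible_mask` and the fixed threshold of 3 gray levels are assumptions for illustration; a real JND model produces a threshold that varies per pixel.

```python
import numpy as np

def perceptible_mask(reference, distorted, jnd_threshold):
    """True where the absolute change exceeds the JND threshold."""
    diff = np.abs(distorted.astype(np.float64) - reference.astype(np.float64))
    return diff > jnd_threshold

ref = np.full((2, 2), 100.0)
dist = ref + np.array([[1.0, 5.0], [0.0, -4.0]])
mask = perceptible_mask(ref, dist, jnd_threshold=3.0)
```

Pixels where `mask` is False carry changes below the threshold, so under this model they could be left uncorrected without a visible difference.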
The JND model is usually formulated based on the luminance adaptation, contrast masking, and color masking characteristics of the HVS in a spatial or transform domain. More recent research also considers the impacts of different textures and temporal variations.
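A spatial-domain luminance adaptation term might be sketched as below. The piecewise formula and its constants follow a form commonly cited in the JND literature for 8-bit luminance and are assumptions here, not values taken from this text. The unweighted 5x5 box mean used for background luminance is a simplification, and contrast masking and color masking are omitted.

```python
import numpy as np

def background_luminance(img, k=5):
    """Mean luminance in a k x k neighbourhood, with edge replication."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)

def luminance_adaptation_jnd(img, k=5):
    """Per-pixel visibility threshold from background luminance (assumed constants)."""
    bg = background_luminance(img, k)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # higher threshold in dark areas
    bright = (3.0 / 128.0) * (bg - 127.0) + 3.0       # slowly rising in bright areas
    return np.where(bg <= 127.0, dark, bright)

jnd_map = luminance_adaptation_jnd(np.full((8, 8), 127.0))
```

With these constants the threshold is lowest near mid-gray and highest in very dark regions, which reflects the reduced sensitivity of the HVS to changes against a dark background.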
Some approaches use the JND model to reduce the computational complexity or to select different processing methods used in image upscaling and SR. A JND model that considers luminance adaptation and contrast masking may be used to terminate MAP iterations, so that the computation of the SR can be reduced. While useful, prior art approaches considered only some of the HVS characteristics in optimizing the SR reconstruction.
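JND-based early termination can be sketched as a loop that stops once every pending pixel update falls below the threshold. In this sketch a simple back-projection update stands in for a MAP gradient step, and the block-averaging formation model, the scalar threshold, and all names are assumptions for illustration.

```python
import numpy as np

def downsample(hr, s=2):
    """Assumed image formation model: average each s x s block."""
    h, w = hr.shape
    return hr.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(lr, s=2):
    """Replicate each LR pixel into an s x s block."""
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)

def refine_until_imperceptible(hr_init, lr, jnd, s=2, step=0.5, max_iters=100):
    """Iterate until the next update would be below the JND everywhere."""
    hr = hr_init.astype(np.float64).copy()
    for i in range(max_iters):
        update = step * upsample(lr - downsample(hr, s), s)
        if np.all(np.abs(update) < jnd):
            return hr, i             # remaining updates would not be visible
        hr += update
    return hr, max_iters

lr = np.full((2, 2), 100.0)
hr, iters = refine_until_imperceptible(np.zeros((4, 4)), lr, jnd=3.0)
```

The stopping test is applied to the size of the pending update rather than to the remaining error, so the loop ends as soon as further iterations would change the image by less than the threshold. The `jnd` argument may also be a per-pixel array with the same shape as the HR image.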
What is desired is an image converter that can generate Super-Resolution images. An image converter that can upscale images to a higher resolution is desired. Super-Resolution images that better fit a human's visual experience are desirable. In particular, using both single-frame and multi-frame information is desirable. It is desired to suppress artifacts in regular structures while allowing artifacts and more detail in random structures within a picture. It is desired to identify immaculate regions that are generated to have less detail and fewer resulting artifacts, and detail-preferred regions that are allowed to have artifacts in the SR image.