Generating a high resolution (HR) image from one or more low resolution (LR) images is called image super resolution (SR). HR images have a higher pixel density than their LR counterparts and provide details that are obscured in the LR versions. Image SR is useful in many applications, including satellite imagery, pattern recognition, and medical diagnosis.
Single-image super-resolution (SISR) is the problem of generating an HR image from just one LR image. There are two basic types of SISR techniques. The first uses interpolation, which cannot recover missing high-frequency information. The second uses learning-based methods, which are trained on example images. While the prior art has applied variants of learning-based methods to scene, animal, and human images, few of these methods have been successfully applied to text images.
The electronic conversion of scanned text images into machine-encoded text (e.g. ASCII) is known as optical character recognition (OCR). OCR is essential for the automatic processing of text documents because it enables electronic searching and efficient storage. OCR is used in machine translation, text-to-speech applications, and text data mining, and is widely used for data entry from printed records. When the text resolution is poor, OCR engines frequently have unacceptably high character error rates (CER). Such catastrophic failures often occur when documents are scanned at low dpi (dots per inch) to conserve memory, even when the LR documents are readable by humans. Furthermore, interpolation methods yield very high CER when OCR is applied to their HR text image estimates. Learning-based methods, on the other hand, are too slow in practice for OCR enhancement unless their training models are quite small. This limitation on model size means that prior art models must be trained for one particular low resolution font style and size, and cannot be applied to a different font style or size.
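Character error rate is commonly computed as the character-level edit (Levenshtein) distance between the OCR output and the ground-truth text, divided by the number of characters in the ground truth. The text above does not state which CER variant it uses, so the Python sketch below simply assumes that common definition; the function names are illustrative only.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, insertions and
    deletions separating ref and hyp (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # ref character dropped
                            curr[j - 1] + 1,     # extra character in hyp
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed CER definition: edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, `character_error_rate("the quick brown fox", "tne qu1ck brown fox")` counts two substitutions over 19 reference characters, a CER of about 0.105.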
The present invention addresses the SISR problem for various types of grayscale images (e.g. scenes, faces, text) and for text documents. The present invention also discloses a method that improves OCR performance by lowering the OCR character error rate (CER), which we refer to as OCR enhancement. The present invention is significantly faster than prior art learning-based SISR, so that a single model suffices for most Latin font styles and sizes, and it also significantly lowers CER when the HR image estimates it produces are input to OCR engines.
The prior art includes several published articles, including “Super Resolution through Neighbor Embedding” by H. Chang, D. Yeung, and Y. Xiong, CVPR, volume 1, pp. 275-282, 2004. Chang et al. disclose a method for creating high resolution images. However, the algorithm of Chang et al. is inadequate for OCR enhancement because it relies on an exact nearest neighbor search, which is not fast enough for OCR. Specifically, an exact nearest neighbor search requires exhaustively comparing against thousands of training vectors at each of many locations in the low resolution input image, which results in long processing times. Because this exhaustive search is so slow, the method of Chang et al. is limited to using a separate training model for each font style and size. Consequently, the method also requires knowledge of the font style and size of the LR input image in order to select the matching training model. In OCR enhancement, however, the font style and size of the input image are generally unknown. Therefore, the method of Chang et al. is inadequate for OCR enhancement. Additionally, Chang et al. use a feature vector consisting of the first and second order partial derivatives of the LR input image at each pixel in a corresponding K×K image patch, resulting in a feature vector four times longer than the feature vector of the present invention. Consequently, the method of Chang et al. uses unnecessarily large training models that require significant amounts of computer memory. This large memory footprint limits the amount of training information that can be represented by the models of Chang et al., which in turn restricts the accuracy of their HR image estimates.
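The prior art feature and search described above can be sketched in Python as follows. This is an illustration of the Chang et al. approach, not of the present invention, and it assumes the derivative filters described in the Chang et al. paper: [-1, 0, 1] for first order and [1, 0, -2, 0, 1] for second order, each applied horizontally and vertically, giving four values per pixel and a feature of length 4K² for a K×K patch. The function names, the edge-replication border handling, and the random training set in the usage are hypothetical.

```python
import numpy as np


def derivative_maps(img: np.ndarray) -> np.ndarray:
    """Return a (4, H, W) stack: d/dx, d/dy, d2/dx2, d2/dy2 at every pixel.

    Assumed filters: [-1, 0, 1] and [1, 0, -2, 0, 1], applied as
    correlations along rows and columns; borders use edge replication.
    """
    h, w = img.shape
    p = np.pad(img.astype(np.float64), 2, mode="edge")

    def shifted(dr: int, dc: int) -> np.ndarray:
        # Value of pixel (r + dr, c + dc) for every pixel (r, c) of the image.
        return p[2 + dr:2 + dr + h, 2 + dc:2 + dc + w]

    center = shifted(0, 0)
    gx = shifted(0, 1) - shifted(0, -1)
    gy = shifted(1, 0) - shifted(-1, 0)
    gxx = shifted(0, 2) - 2 * center + shifted(0, -2)
    gyy = shifted(2, 0) - 2 * center + shifted(-2, 0)
    return np.stack([gx, gy, gxx, gyy])


def patch_feature(maps: np.ndarray, r: int, c: int, k: int) -> np.ndarray:
    """Concatenate the four derivatives of every pixel in the k x k patch
    whose top-left corner is (r, c): a vector of length 4 * k * k."""
    return maps[:, r:r + k, c:c + k].reshape(-1)


def exact_nearest_neighbors(query: np.ndarray, train: np.ndarray, n: int):
    """Brute-force search: the query is compared with all N training
    vectors, so each call costs O(N * d) for feature length d."""
    d2 = np.sum((train - query) ** 2, axis=1)
    idx = np.argsort(d2)[:n]
    return idx, d2[idx]
```

Because the exhaustive search is repeated at every patch location of the LR input, its total cost grows roughly as (number of locations) × N × 4K², which is the slowdown the paragraph above describes.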
The present invention uses a feature vector that is only 25% as long as that of Chang et al., and therefore yields a model that requires only a quarter of the memory of a Chang et al. model. As such, four times as much training information can be represented in the same computer memory with the present invention as with the method of Chang et al., which in turn results in more accurate HR image estimates.
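The memory comparison above can be made concrete with a short worked calculation. It rests on assumptions not stated in the text: only the stored LR feature vectors are counted, every entry uses the same number of bytes b, and the Chang et al. feature holds four derivative values per pixel of a K×K patch (length 4K²), so that the stated 25% ratio gives a length of K² for the present invention. The symbols N, d, b, and the subscripts are introduced here for illustration only.

$$
M = N\,d\,b, \qquad d_{\mathrm{Chang}} = 4K^2, \qquad d_{\mathrm{inv}} = \tfrac{1}{4}\,d_{\mathrm{Chang}} = K^2
$$

$$
\frac{M_{\mathrm{Chang}}}{M_{\mathrm{inv}}} = \frac{N \cdot 4K^2 \cdot b}{N \cdot K^2 \cdot b} = 4,
\qquad
\left.\frac{N_{\mathrm{inv}}}{N_{\mathrm{Chang}}}\right|_{M\ \text{fixed}} = \frac{d_{\mathrm{Chang}}}{d_{\mathrm{inv}}} = 4
$$

For example, with the illustrative values K = 5, N = 100,000, and b = 4 (32-bit floats), the Chang et al. features occupy 100,000 × 100 × 4 bytes = 40 MB, while features a quarter as long occupy 100,000 × 25 × 4 bytes = 10 MB.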
Other notable articles include “Nonlinear Dimensionality Reduction by Locally Linear Embedding” by S. T. Roweis and L. K. Saul (Science, vol. 290, pp. 2323-2326, 2000), “Resolution Enhancement based on Learning the Sparse Association of Patches” by J. Wang, S. Zhu, and Y. Gong (Pattern Recognition Letters, vol. 31, pp. 1-10, 2010), and “Locality Preserving Constraints for Super-resolution with Neighbor Embedding” by B. Li, H. Chang, S. Shan, and X. Chen (IEEE ICIP 2009, pp. 1189-1192). While the aforementioned prior art addresses the SISR problem, none of it offers a solution that is as computationally efficient and accurate as the present invention, or that achieves as low a CER given the same memory resources. The prior art not only requires significantly more memory than the present invention, but also takes more than 1000 times longer to construct an HR image estimate. The above-mentioned articles are hereby incorporated by reference into the specification of the present invention.
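Neighbor embedding, as in the Chang et al. article, builds on the locally linear embedding of Roweis and Saul: an LR feature vector is approximated as a weighted combination of its nearest LR training vectors, with weights constrained to sum to one, and the same weights are then applied to the paired HR training patches. The Python sketch below shows that standard reconstruction-weight step; the ridge regularization and its scale, the function names, and the array layouts are assumptions made for illustration, not details taken from the cited articles.

```python
import numpy as np


def reconstruction_weights(x, neighbors, reg=1e-3):
    """LLE-style weights: minimize ||x - sum_j w_j * neighbors[j]||^2
    subject to sum_j w_j = 1. Solves G w = 1, then normalizes."""
    diff = neighbors - x                  # (n, d); row j is eta_j - x
    gram = diff @ diff.T                  # local Gram matrix, (n, n)
    trace = np.trace(gram)
    # Assumed ridge term: keeps the system solvable when gram is singular.
    gram = gram + np.eye(len(gram)) * reg * (trace if trace > 0 else 1.0)
    w = np.linalg.solve(gram, np.ones(len(gram)))
    return w / w.sum()


def neighbor_embedding_patch(x_lr, train_lr, train_hr, n=5, reg=1e-3):
    """Estimate one flattened HR patch from an LR feature vector x_lr.

    train_lr: (N, d) LR feature vectors; train_hr: (N, m) paired HR patches.
    Steps: exact nearest neighbor search, LLE weights on the LR neighbors,
    then the same weights applied to the corresponding HR patches.
    """
    d2 = np.sum((train_lr - x_lr) ** 2, axis=1)
    idx = np.argsort(d2)[:n]
    w = reconstruction_weights(x_lr, train_lr[idx], reg)
    return w @ train_hr[idx]
```

In a full SISR pipeline this step would run at every patch location of the LR input, with overlapping HR estimates combined afterwards; that assembly step is omitted here.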
The prior art also includes the following patents. U.S. Pat. No. 7,379,611, entitled “Generic Image Hallucination”, discloses a learning-based SISR method. This prior art differs from the present invention in that it uses a training set composed of LR/HR pairs of “primitives”, i.e., patches that focus on edges, ridges, or corners. U.S. Pat. No. 7,379,611 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 8,233,734, entitled “Image Upsampling with Training Images”, discloses a method that creates a high resolution version of a low resolution input image by creating a “coarse” HR image from the LR input image using patch matching, where each patch contains a primitive (i.e., an edge, corner, or ridge), and by using probabilistic models of primitives and contour smoothness constraints. The present invention does not use primitives or probabilistic models of them. U.S. Pat. No. 8,233,734 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 7,218,796, entitled “Patch-based Video Super-resolution”, discloses a method for video super-resolution that uses a training set containing HR/LR patch pairs and employs patch matching. This prior art represents low resolution patches primarily by their raw patch values, or by the patch values divided by the energy in the patch. The present invention does not employ this type of feature vector. U.S. Pat. No. 7,218,796 is hereby incorporated by reference into the specification of the present invention.
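The two prior art patch representations mentioned above can be sketched in Python as follows. The text does not define "energy", so this sketch assumes it is the L2 norm of the patch's pixel values; the function names and the small `eps` guard against all-zero patches are likewise hypothetical.

```python
import numpy as np


def raw_patch_feature(patch):
    """Feature = the patch's pixel values, flattened into a vector."""
    return np.asarray(patch, dtype=np.float64).reshape(-1)


def energy_normalized_feature(patch, eps=1e-8):
    """Feature = pixel values divided by the patch energy.

    'Energy' is ASSUMED here to be the L2 norm of the pixel values;
    eps avoids division by zero on all-zero patches.
    """
    v = raw_patch_feature(patch)
    energy = np.sqrt(np.sum(v * v))
    return v / (energy + eps)
```

Under this assumption, scaling a patch's contrast by a constant factor leaves its normalized feature essentially unchanged, so patches that differ only in contrast match each other during patch matching.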
U.S. Patent Appl. #20080267525, entitled “Soft Edge Smoothness Prior and Application on Channel Super Resolution”, discloses an SISR method based on extracting edge segments from the input LR image and performing super-resolution on the extracted edges. The present invention does not require edge information. U.S. Patent Appl. #20080267525 is hereby incorporated by reference into the specification of the present invention.
U.S. Patent Appl. #20110305404, entitled “Method and System for Example-based Face Hallucination”, discloses a learning-based method that uses a training set of LR/HR face images and, in addition, uses a dimensionality reduction technique to project the LR training images (each treated as a vector of pixel values) into a low dimensional space. An input LR image is then projected in the same manner, and the projected LR input image is matched to the nearest projected LR training images. Thus, the method does not use local patches as in neighbor embedding, but instead uses global image projections for matching.
U.S. Patent Appl. #20110305404 is hereby incorporated by reference into the specification of the present invention.
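The global projection-and-match step described above can be sketched in Python as follows. The text does not name the dimensionality reduction technique, so principal component analysis (PCA, computed via SVD) is assumed here purely for illustration; the function names and the number of components are hypothetical. Unlike a neighbor embedding approach, each whole LR image is one vector, and no local patches are involved.

```python
import numpy as np


def fit_pca(train_lr, k):
    """train_lr: (N, D) LR training images, each flattened to D pixels.
    Returns the mean image and the top-k principal directions (k, D)."""
    mean = train_lr.mean(axis=0)
    _, _, vt = np.linalg.svd(train_lr - mean, full_matrices=False)
    return mean, vt[:k]


def project(images, mean, basis):
    """Project flattened images (M, D) into the k-dimensional space."""
    return (images - mean) @ basis.T


def match_global(x_lr, train_lr, mean, basis, n=1):
    """Indices of the n training images whose projections lie nearest
    to the projection of the whole input image x_lr (D,)."""
    z_train = project(train_lr, mean, basis)
    z = project(x_lr[None, :], mean, basis)[0]
    d2 = np.sum((z_train - z) ** 2, axis=1)
    return np.argsort(d2)[:n]
```

The matched indices would then select the paired HR face images from the training set; how those are combined into the final estimate is not described in the text and is not sketched here.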