1. Technical Field
A “Text Rectifier,” as described herein, processes selected regions of an image containing text or characters by treating those selected image regions as matrices of low-rank textures and using a rank minimization technique that recovers and removes image deformations while rectifying the text or characters in the selected image regions.
2. Background Art
Optical character recognition (OCR) has been one of the most successful applications of pattern recognition. OCR techniques and technologies have matured considerably in the past decade or so, and for many languages the text recognition accuracy rates of commercial OCR software products are generally in the high 90-percent range. Such software products are widely available in a large variety of applications for use with various computing platforms.
One of the main limitations of current OCR technology is that its recognition performance is rather sensitive to deformation in the input characters. That is, such applications tend to perform well when the characters are presented in the standard upright position on which most OCR engines were trained. Most widely used commercial OCR systems can tolerate only very small rotations and skews in the input characters. For example, two of the most popular OCR systems generally perform well only up to about 5 degrees of rotation and up to a skew value of about 0.1. In fact, with merely 20 degrees of rotation or a skew value of 0.3, the recognition rates of such systems have been demonstrated to degrade rapidly from the high 90-percent range to below 10 percent.
Consequently, typical OCR techniques are known to perform increasingly poorly as the distortion of the text increases. Unfortunately, this is a common problem even for the conventional use of OCR in digitizing books or documents, where the scanned text can be significantly warped if the page is not perfectly flat or upright. Many techniques have been developed in the computer vision and pattern recognition literature to preprocess and rectify such distorted text documents. However, most of these techniques rely on a globally regular layout of the text to rectify the distortion. That is, the rectified text is expected to lie on a set of several (or many) horizontal, parallel lines, often within a rectangular region. Hence, many different methods have been developed to estimate the rotation or skew angle by comparing statistics of the distorted text to the standard layout, including methods based on projection profiles, Hough transforms on gradient/edge directions, morphology of the text region, cross-correlation of image blocks, etc. Unfortunately, real-world images of text are not always provided in such neat rectangular regions.
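The projection-profile approach mentioned above can be sketched in a few lines. This is a minimal illustration under assumed conditions, not any particular commercial system: a synthetic "page" of horizontal lines stands in for a binarized document image, and the estimator simply selects the candidate rotation whose de-rotated image has the peakiest horizontal projection profile (highest row-sum variance).

```python
import numpy as np
from scipy.ndimage import rotate

# Hypothetical binarized document: horizontal "text lines" every 12 rows.
page = np.zeros((96, 96))
page[10::12, :] = 1.0

# Simulate a 12-degree skew of the scanned page.
skewed = rotate(page, angle=-12.0, reshape=False, order=1)

def estimate_rotation(img, candidates):
    """Return the candidate angle whose de-rotated image has the
    highest-variance horizontal projection profile (i.e., the angle
    at which the text lines are most nearly horizontal)."""
    best_angle, best_score = None, -np.inf
    for a in candidates:
        undone = rotate(img, angle=a, reshape=False, order=1)
        # Central crop avoids the zero-filled corners introduced by rotation.
        profile = undone[24:-24, 24:-24].sum(axis=1)
        score = profile.var()
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

angle = estimate_rotation(skewed, np.arange(-20.0, 21.0, 1.0))
```

Note that this estimator works only because the synthetic page contains many parallel text lines; as the passage above explains, it is exactly this assumption of a globally regular layout that fails for small text regions.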
For example, as smart mobile phones, media players, handheld computing devices, etc., have become increasingly popular, the digital cameras embedded in such devices are increasingly used to capture images containing text. Such images are generally captured from widely varying viewpoints and angles. Recognizing text in such images (e.g., street signs, restaurant menus, license plates on cars, etc.) therefore poses challenges for OCR applications, since such images often contain very few characters or words (i.e., not enough to estimate the orientation from multiple parallel rows of text) and are often taken from an oblique viewing angle. Existing techniques, which are adapted to rectifying large regions of rich text (e.g., a paragraph or a page), often have difficulty working at the level of an individual character, a short phrase, or a word. This inability to rectify small amounts of text or individual characters degrades subsequent OCR accuracy with respect to that text.
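By contrast with the layout-based methods above, the rank-minimization idea stated in the Technical Field can be illustrated on a toy, one-parameter deformation (pure rotation). This is a hedged sketch, not the claimed Text Rectifier: it assumes a synthetic rank-1 band texture as the "low-rank texture," and an exhaustive search that minimizes the nuclear norm (sum of singular values), a standard convex surrogate for matrix rank. The inverse deformation that restores the low-rank structure is the one that minimizes this surrogate, and no assumption about multiple parallel text lines is needed.

```python
import numpy as np
from scipy.ndimage import rotate

# Hypothetical low-rank texture: alternating horizontal bands (rank 1,
# since it is the outer product of a band pattern with a vector of ones).
bands = ((np.arange(96) // 8) % 2).astype(float)
texture = np.outer(bands, np.ones(96))

# Apply an "unknown" deformation: a 20-degree rotation.
deformed = rotate(texture, angle=-20.0, reshape=False, order=1)

def nuclear_norm(m):
    """Sum of singular values -- a convex surrogate for matrix rank."""
    return np.linalg.svd(m, compute_uv=False).sum()

# Search the one-parameter deformation group for the inverse transform
# whose restored central patch has the lowest rank surrogate.
candidates = np.arange(-30.0, 31.0, 1.0)
scores = []
for a in candidates:
    restored = rotate(deformed, angle=a, reshape=False, order=1)
    # Central crop avoids the zero-filled corners introduced by rotation.
    scores.append(nuclear_norm(restored[24:-24, 24:-24]))
best = candidates[int(np.argmin(scores))]
```

In this toy setting the minimum of the nuclear norm is attained when the bands are restored to horizontal, i.e., at the inverse of the applied rotation; richer deformation families (affine, projective) would extend the search space rather than change the criterion.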