1. Field of the Invention
This invention relates generally to the optical recognition of text and, more particularly, to a method for enhancing the accuracy of character recognition in optical recognition systems.
2. Description of Related Technology
Optical text recognition algorithms segment a bitmap image of text into characters and then recognize the segmented characters using neural networks, pattern recognition, and/or fuzzy logic. In doing so, the algorithms convert the segmented portions of the bitmap image to elements of a known character set, such as ASCII. Optical character recognition (OCR) algorithms, one form of optical text recognition algorithm, typically convert bitmap images of machine-printed text, while intelligent character recognition (ICR) algorithms, another form, typically convert bitmap images of hand-printed text.
One of the major factors affecting the accuracy of text recognition is the quality of the bitmap image. Common bitmap image enhancement methods perform such functions as smoothing, sharpening, edge detection, and noise removal. All of these functions are designed to improve the visual perception of the image by the human eye. A separate class of image enhancement algorithms, referred to as OCR preprocessing algorithms, instead improves the quality of the digitized text for subsequent machine recognition. These algorithms perform functions such as line removal, deskewing, color processing, and rotation, which improve the segmentation of the text; and noise removal, thinning, thickening, and stroke restoration, which modify the individual text characters into a more visually identifiable form. Making characters more visually identifiable typically improves recognition accuracy.
Under certain conditions, however, either the triggers for invoking the preprocessing algorithms or image enhancement methods are not activated, or the goal of those methods is not conducive to good recognition. This can occur when a document is printed with a faulty printer head, which may fail to deposit ink over a portion of a character. When the document is subsequently digitized, the resulting bitmap image shows breaks in the character's strokes.
This can also occur when the head of a scanning device is damaged or subject to interference. In either situation, a recognition algorithm may treat an individual character as two or more separate characters, or a single character stroke as two or more strokes.
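To illustrate how a break in a stroke can cause one character to be treated as two, the following is a minimal sketch, assuming a simplified segmenter that takes each 4-connected group of ink pixels as one candidate character. The function `segment_components` and the bitmaps `INTACT_L` and `BROKEN_L` are hypothetical illustrations; practical OCR and ICR segmenters are considerably more elaborate, and this sketch is not the method of the invention.

```python
from __future__ import annotations

from collections import deque

# Hypothetical 5x5 bitmaps of the letter "L" ('#' = ink, '.' = background).
INTACT_L = [
    "#....",
    "#....",
    "#....",
    "#....",
    "#####",
]
# Same letter with row 2 missing ink, as a faulty printer or scanner head might produce.
BROKEN_L = [
    "#....",
    "#....",
    ".....",
    "#....",
    "#####",
]


def segment_components(bitmap: list[str]) -> list[tuple[int, int, int, int]]:
    """Return one (top, left, bottom, right) bounding box per 4-connected ink region.

    Each region is treated as one candidate character, in row-major discovery order.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    print(segment_components(INTACT_L))  # [(0, 0, 4, 4)]
    print(segment_components(BROKEN_L))  # [(0, 0, 1, 0), (3, 0, 4, 4)]
```

The intact letter yields a single candidate character, while the one-row break splits it into a short vertical fragment and a separate fragment resembling an underscored stroke. Because the gap spans a full row, the split occurs under 8-connectivity as well, so the choice of 4-connectivity here does not affect the outcome.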