Optical character recognition (OCR) technology is a well-known method for converting paper documents into digitized form. A document is scanned by a commercially available scanner to produce a raster image. The raster image is then passed to commercially available software, an OCR engine, whose character recognition algorithm processes the scanned raster image to recognize characters, which include numerical digits and some special characters such as "&", "$" and "#".
One of the main problems of conventional OCR technology is that the accuracy of character recognition is limited. Some OCR engines recognize characters accurately in some character environments but perform poorly in others. For example, a first OCR engine may be able to recognize Helvetica style characters with a ninety percent accuracy rate but Palatino style characters with only a fifty percent accuracy rate. A second OCR engine may provide accurate results for Helvetica characters but not for Courier style characters. A third OCR engine may perform better for 10 point Courier characters than for 18 point Courier characters. Therefore, if one page of a document contains both Courier and Helvetica style characters in 10 point and 18 point sizes, using only one of the OCR engines will produce less than adequate results, because no single OCR engine recognizes characters optimally in every type of character environment.
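The arithmetic behind this limitation can be illustrated with a minimal sketch. Only the ninety percent (Helvetica) and fifty percent (Palatino) figures for the first engine come from the text above; the second engine's rates, the page composition, and all names (ACCURACY_PCT, PAGE, expected_correct, best_per_environment) are hypothetical values chosen for illustration. The "best per environment" figure assumes it is somehow known which engine is strongest for each environment, so it is an idealized upper bound and not a description of any particular method.

```python
# Accuracy rates in whole percent, keyed by engine and character environment.
# engine_1's 90/50 figures come from the text; engine_2's are hypothetical.
ACCURACY_PCT = {
    "engine_1": {"Helvetica": 90, "Palatino": 50},
    "engine_2": {"Helvetica": 60, "Palatino": 85},
}

# Hypothetical page: number of characters in each environment.
PAGE = {"Helvetica": 600, "Palatino": 400}


def expected_correct(engine_pct, page):
    """Expected number of correctly recognized characters for one engine."""
    return sum(count * engine_pct[env] / 100 for env, count in page.items())


def best_per_environment(accuracy_pct, page):
    """Idealized upper bound: use the strongest engine for each environment."""
    return sum(
        count * max(rates[env] for rates in accuracy_pct.values()) / 100
        for env, count in page.items()
    )


if __name__ == "__main__":
    for name, rates in ACCURACY_PCT.items():
        print(name, expected_correct(rates, PAGE))
    print("best per environment", best_per_environment(ACCURACY_PCT, PAGE))
```

Under these assumed numbers, either engine used alone is expected to recognize fewer characters on the mixed page than the idealized per-environment selection, which is the gap the paragraph above describes.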
There is therefore a significant need in optical character recognition for a method that combines the best recognition features of each OCR software engine to identify and resolve erroneous characters across the many different types of character environments.