Optical character recognition (OCR) is a process for automatically identifying handwritten or printed characters in order to provide electronic identification of those characters to communication, computer, display, or data processing systems. OCR techniques are particularly useful where there are large volumes of printed input data, as encountered, for example, by banks, insurance companies, brokerage houses, and mail and postal systems. For a review of character recognition methods, see V. K. Govindan and A. P. Shivaprasad, "Character Recognition--A Review," Pattern Recognition, Vol. 23, No. 7, pp. 671-683, 1990, and S. Mori et al., "Historical Review of OCR Research and Development," Proc. IEEE, Vol. 80, No. 7, pp. 1029-1058, July 1992. One commercially available OCR system is ScanWorX by Xerox Imaging Systems.
Present OCR systems can achieve high levels of accuracy in identifying characters when working on clean, non-degraded text. However, it is well known that the performance of these systems deteriorates rapidly when the text is degraded, for example when the characters are blurred and connected. Such degradation can occur, for example, when successive copies of a document are made on a photocopier. Although the level of degradation often does not look extreme to the human eye, it can cause catastrophic failure of OCR systems. Thus, there is a need for techniques to improve OCR capabilities.