1. Field of the Invention
This invention relates to character recognition systems.
2. Description of Related Art
Character recognition, such as optical character recognition (OCR), involves scanning documents and automatically recognizing machine-printed or handwritten characters. FIG. 1 is a block diagram of a typical prior art OCR system. Documents 1 are passed through a scanner 2, which generates image data 3. The image data 3 is applied to a processor 4 (such as a general purpose computer) suitably programmed with character recognition computer programs. While most current OCR systems are software based, an equivalent system can be implemented completely in hardware.
The processor 4 outputs a set of character codes (typically ASCII) representing some or all of the scanned document. To do so, the character recognition system must locate fields of interest (which may be the whole document, as in the case of typed pages) in the scanned image data 3, extract individual characters from the fields of interest, recognize these characters, and produce a code for each recognized character.
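The four processing steps just described can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration: the scanned image is modeled as a mapping from field names to lists of one-character strings, and the function names `locate_fields`, `extract_characters`, `recognize`, `to_codes`, and `process` do not come from any particular OCR system. A real system would operate on pixel data rather than strings.

```python
from typing import Dict, List

# Hypothetical toy types: a "glyph" is a string standing in for one character
# image, a "field" is a list of glyphs, and an "image" maps field names to fields.
Glyph = str
Field = List[Glyph]
Image = Dict[str, Field]


def locate_fields(image: Image, wanted: List[str]) -> List[Field]:
    """Step 1: pick out the fields of interest from the scanned image data."""
    return [image[name] for name in wanted if name in image]


def extract_characters(field: Field) -> List[Glyph]:
    """Step 2: split a field into individual character images, dropping blanks."""
    return [glyph for glyph in field if glyph.strip()]


def recognize(glyph: Glyph) -> str:
    """Step 3: stand-in classifier; a real one would analyze pixel data."""
    return glyph.strip()[0]


def to_codes(chars: List[str]) -> List[int]:
    """Step 4: produce a code (here ASCII) for each recognized character."""
    return [ord(c) for c in chars]


def process(image: Image, wanted: List[str]) -> List[List[int]]:
    """Run all four steps and return one list of codes per located field."""
    return [
        to_codes([recognize(g) for g in extract_characters(field)])
        for field in locate_fields(image, wanted)
    ]


scanned = {"zip": ["9", " ", "4", "0", "4", "3"], "name": ["A", "b"]}
codes = process(scanned, ["zip"])
print(codes)  # [[57, 52, 48, 52, 51]]
```

Only the "zip" field is requested here, so the "name" field is ignored, which mirrors the case where the fields of interest are a subset of the document.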
Real-world images and characters frequently suffer from a number of degradations, such as worn-out typewriter ribbons, missing pins in dot-matrix printers, skipping ballpoint pens, poor quality handwriting, and deficiencies in the scanning process. Accordingly, a character recognition system must be able to provide a degree of confidence in its results to be of practical use. This degree of confidence can relate to the recognized document or to fields in the document, but it must certainly be present for individual recognized characters. With ambiguous or noisy characters, an OCR system can assign several potential identities to the image data comprising a character. These identities are usually rank-ordered by a confidence factor, so that the most probable identity of the character has the highest confidence, the next most probable identity has the next highest confidence, and so on.
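As an illustration of rank ordering by confidence, the following sketch sorts the candidate identities of a single ambiguous character. The function name `rank_candidates` and the confidence values are assumptions made up for this example; the 0-to-1 scale is likewise only one possible convention.

```python
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (identity, confidence) pairs, highest confidence first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical confidences for one noisy vertical stroke.
ranked = rank_candidates({"l": 0.35, "1": 0.45, "I": 0.20})
print(ranked)  # [('1', 0.45), ('l', 0.35), ('I', 0.2)]
```

The first entry is the most probable identity, and the remaining entries give the alternatives in decreasing order of confidence.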
Traditionally, OCR programs have been designed and utilized as single-pass, single-classifier systems, an example of which is shown in FIG. 2. Image data 3 is applied to a "universal" classifier 5, which outputs machine-readable data, typically in ASCII 6 form. A universal classifier is designed to recognize a large set of characters such as letters, numbers, or alphanumeric characters. A drawback of single-pass, single-classifier systems is that recognition frequently fails when the classifier is confronted with ambiguous characters (e.g., "I", "l", and "1") or "noisy" (i.e., poorly formed) characters.
More recently, OCR systems have utilized multiple universal classifiers in conjunction with a "voting" algorithm to select the output of one of the classifiers. FIG. 3 is a block diagram of a prior art multiple universal classifier system, in which image data 3 is applied to some or all of n universal classifiers 5, the outputs of which are coupled to a voting function 7. The universal classifiers 5 are trained for different characteristics or use different recognition algorithms. The voting function 7 may be any one of several algorithms that compare or combine the outputs of the universal classifiers 5 to arrive at a (presumably) more reliable character recognition. The voting function 7 then outputs a character code 6. While multiple universal classifier systems give improved recognition compared to single-pass, single-classifier systems, they would still benefit from further improvement.
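The text leaves the voting algorithm open, so the following is only one possible sketch of a voting function: a confidence-weighted vote that sums each candidate's confidence across classifiers and selects the highest total. The function name `vote`, the tie-breaking rule, and the sample classifier outputs are all assumptions for illustration and are not taken from any specific prior art system.

```python
from collections import defaultdict
from typing import Dict, List


def vote(outputs: List[Dict[str, float]]) -> int:
    """Combine per-classifier candidate confidences and return a character code.

    Each element of `outputs` maps candidate characters to the confidence
    assigned by one classifier. Ties are broken in favor of the character
    that sorts first, purely to keep the result deterministic.
    """
    totals: Dict[str, float] = defaultdict(float)
    for candidates in outputs:
        for char, confidence in candidates.items():
            totals[char] += confidence
    if not totals:
        raise ValueError("no classifier produced a candidate")
    # max() returns the first maximal item, so sorting first fixes the tie order.
    winner = max(sorted(totals), key=lambda char: totals[char])
    return ord(winner)


# Hypothetical outputs of three classifiers for the same character image.
classifier_outputs = [
    {"1": 0.6, "l": 0.4},
    {"l": 0.7, "I": 0.3},
    {"1": 0.5, "l": 0.5},
]
code = vote(classifier_outputs)
print(code, chr(code))  # 108 l
```

Here no single classifier is decisive, but the summed confidence for "l" (about 1.6) exceeds that for "1" (about 1.1), so the combined result differs from the first classifier's top choice. A simple majority vote over each classifier's top candidate would be an equally valid alternative.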
Accordingly, the inventor has recognized a need for a better character recognition system. The present invention meets this need.