A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyrights whatsoever.
1. Technical Field
The invention is related to high-speed optical character recognition systems and particularly to optical character recognition systems useful for reading magnetic ink character recognition (MICR) symbols on a personal bank check.
2. Background Art
Identifying the image of an unknown pattern by matching it with a set of known reference patterns is a well-known technique, as disclosed in U.S. Pat. No. 3,165,718 (to Fleisher), for example. This type of technique is used in optical character recognition (OCR) systems. In one version, commonly referred to as feature-based optical character recognition, the unknown image is treated as a vector and the known reference patterns are likewise treated as reference vectors. Recognition is performed by associating the unknown vector with the one reference vector having the shortest absolute distance to the unknown vector. This technique is disclosed in U.S. Pat. No. 4,783,830 (to Johnson et al.) and similar techniques are disclosed in U.S. Pat. No. 3,522,586 (to Kiji et al.) and U.S. Pat. No. 3,382,482 (to Greenly). The use of multi-dimensional vector spaces in OCR systems is disclosed in U.S. Pat. No. 4,733,099 (to Bokser).
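The nearest-vector rule described above can be sketched as follows. This is a minimal illustration, not the method of any cited patent: it assumes that "absolute distance" means the sum of absolute component differences, and the names `absolute_distance`, `nearest_reference` and `references` are hypothetical.

```python
def absolute_distance(u, v):
    """Sum of absolute component differences.

    Assumption: this is one reading of "absolute distance"; the source
    does not define the metric.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_reference(unknown, references):
    """Return the label of the reference vector closest to `unknown`.

    `references` maps a symbol label to its reference vector
    (a hypothetical data layout chosen for this sketch).
    """
    return min(
        references,
        key=lambda label: absolute_distance(unknown, references[label]),
    )
```

Because each comparison is a single pass over the vector components, the cost per unknown character grows only with the number of reference vectors and their length.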
Another optical character recognition technique, commonly referred to as template matching, associates the unknown image with one of a set of known reference templates having the greatest similarity. Similarity is determined, for example, by the number of matching "on" and "off" pixels in the unknown image and the reference template. Template matching is disclosed in U.S. Pat. No. 4,288,781 (to Sellner et al.) and similar techniques are disclosed in U.S. Pat. No. 4,545,070 (to Miyagawa et al.), U.S. Pat. No. 4,454,610 (to Sziklai) and U.S. Pat. No. 4,837,842 (to Holt).
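As a rough illustration of the similarity measure mentioned above, the following sketch counts the pixels at which a binary image and a binary template agree, whether "on" (1) or "off" (0). The function names and the dictionary of templates are assumptions made for this example and are not taken from the cited patents.

```python
def matching_pixels(image, template):
    """Count positions where image and template agree ("on" or "off").

    Both arguments are equally sized lists of rows of 0/1 values.
    """
    return sum(
        1
        for image_row, template_row in zip(image, template)
        for p, q in zip(image_row, template_row)
        if p == q
    )


def most_similar_template(image, templates):
    """Return the label of the template with the most matching pixels.

    `templates` maps a symbol label to its reference template
    (a hypothetical layout used only in this sketch).
    """
    return max(
        templates,
        key=lambda label: matching_pixels(image, templates[label]),
    )
```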
The computation of a confidence level to determine the validity of the character identification provided by such techniques is disclosed in U.S. Pat. No. 4,288,781 (to Sellner et al.) discussed above. The confidence level defined in that patent is the ratio of the scores of the two highest scoring symbol identifications. Computation of confidence levels or values is also suggested in U.S. Pat. No. 4,733,099 (to Bokser) referred to above and U.S. Pat. No. 4,523,330 (to Cain). The latter patent suggests substituting an alternative manual character identification method if the confidence level is below a predetermined threshold.
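A confidence level of this kind, together with a threshold test, might be sketched as below. The direction of the ratio (best score divided by runner-up score), the treatment of a zero runner-up score, and the names `confidence_ratio` and `accept_or_defer` are all assumptions for illustration; the cited patents may define these details differently.

```python
def confidence_ratio(scores):
    """Ratio of the highest score to the second-highest score.

    `scores` maps a symbol label to a non-negative match score.
    Assumption: higher scores mean better matches.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate scores")
    best, runner_up = ranked[0], ranked[1]
    if runner_up == 0:
        # No competing candidate scored at all; treat as unbounded confidence.
        return float("inf")
    return best / runner_up


def accept_or_defer(scores, threshold):
    """Return the best label, or None when confidence is below `threshold`.

    None stands in for handing the character to an alternative
    (for example, manual) identification step.
    """
    best_label = max(scores, key=scores.get)
    if confidence_ratio(scores) >= threshold:
        return best_label
    return None
```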
U.S. Pat. No. 4,710,822 (to Matsunawa) discloses an image discrimination method based upon histogramming the pattern of the density of image elements in blocks into which the image has been divided. U.S. Pat. No. 4,833,722 (to Roger Morton et al.) discloses how to detect edges or boundaries in document images.
The feature-based OCR technique is superior to the template matching OCR technique in that it is much faster. However, the feature-based OCR technique can be somewhat less reliable, failing to identify unknown character images more frequently than the template matching technique. The template matching technique is thus more reliable, because it can identify unknown character images which the feature-based technique cannot. Accordingly, it has seemed that an OCR system could not enjoy the benefits of both speed and reliability, because the system designer had to choose between a feature-based OCR system and a template matching OCR system.
One problem with template matching OCR systems is that the character stroke thickness of the reference templates affects the system performance. For example, a reference template with a thin character stroke is easier to match if pixels outside of the template character strokes are disregarded. This can lead to more than one symbol identification of an unknown character, or "aliasing". This problem can be solved by considering the pixels outside of the reference template character strokes during the matching process, and demanding that these pixels be "off" in the unknown image. However, counting the outside pixels reduces the reliability of the template matching process, because it excludes unknown images which fail to match simply because of the presence of noise in the image.
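The trade-off described above can be illustrated with a small sketch. It assumes binary images and an all-or-nothing match rule, which is a simplification chosen for clarity; the names `strokes_present`, `outside_clear` and `candidate_symbols` are hypothetical.

```python
def strokes_present(image, template):
    """True if every "on" pixel of the template is "on" in the image."""
    return all(
        p == 1
        for image_row, template_row in zip(image, template)
        for p, q in zip(image_row, template_row)
        if q == 1
    )


def outside_clear(image, template):
    """True if every pixel outside the template strokes is "off" in the image."""
    return all(
        p == 0
        for image_row, template_row in zip(image, template)
        for p, q in zip(image_row, template_row)
        if q == 0
    )


def candidate_symbols(image, templates, check_outside):
    """Return the sorted labels of all templates the image matches.

    With check_outside=False only the template strokes are examined,
    so a thin-stroke template can match many images (aliasing).
    With check_outside=True the outside pixels must also be "off",
    which removes aliasing but rejects images containing noise.
    """
    matches = []
    for label, template in templates.items():
        if not strokes_present(image, template):
            continue
        if check_outside and not outside_clear(image, template):
            continue
        matches.append(label)
    return sorted(matches)
```

For example, a clean image of a "T" shape matches both a thin vertical-bar template and the "T" template when outside pixels are disregarded, and only the "T" template when they are checked. Adding a single stray "on" pixel to the same image leaves it with no match at all under the stricter rule.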
This problem is particularly severe if such an OCR system is employed to automatically read the machine-readable or MICR characters on a personal bank check. Such systems must perform flawlessly and must work at high speeds to keep pace with daily volume demands typical of most banking systems. In such systems, handwritten strokes through the MICR characters are not uncommon and constitute sufficient noise to render the MICR characters unreadable using most conventional OCR techniques.