The present invention relates generally to character and pattern recognition machines and methods, and more particularly, to feature extraction systems for use with optical readers for reading characters which have been hand printed without any constraints, such as surrounding box limits, red center lines, or similar artificial devices. One novel feature of this invention lies in the method of choosing the features and the highly normalized method of measuring the individual feature parameters. The invention can be said to perform a crude simulation of a little-known psychological phenomenon occurring in primates called the "saccadic flick".
The present invention also relates generally to bank check, draft and like financial document processing machines and methods incorporating character and pattern recognition systems and, more particularly, to systems for reading numeric characters and symbols (e.g., "xx", fraction lines, etc.) and recognizing dollars and cents in the courtesy amount field (CAF) of a bank check, draft or like business document, where the characters have been typed or printed, particularly hand printed without any constraints, such as surrounding box limits, red center lines, or similar artificial devices.
While there are differing views on the definition of the features of patterns, many studies on the recognition of characters and patterns have shown that the so-called quasi-topological features of a character or pattern, such as concavity, loops, and connectivity, are very important for recognition. To date, many different methods have been proposed for extracting such quasi-topological features. Until this invention, these methods all used analysis of the progressive slopes of the black pixels. Mori et al. U.S. Pat. No. 4,468,808 classifies those analyses into three types. The first is the pattern contour tracking system developed by Grenias at IBM, which Mori calls a serial system. The second type is the one Mori prefers, the earliest patented example of which is that of Holt, called the "Watchbird". In this type of analysis, sequential rows and columns are compared. Another example of the sequential row and column type is Holt's Center Referenced Using Red Line. Mori's third type is a parallel analysis system, which Mori dismisses as either taking too long or costing too much. All systems involving the sequential analysis of the slope of black pixel groups suffer severely from smoothing and line thinning errors. Worse yet, they are very likely to produce substitution errors when the lines have voids or when unwanted lines touch. A comprehensive survey of prior art handprint recognition systems is found in an article by C. Y. Suen et al. entitled "Automatic Recognition of Handprinted Characters--The State of the Art", Proceedings of the IEEE, Vol. 68, No. 4, Apr. 1980, which is incorporated herein by reference. The preferred handprint character recognition technique of this invention uses none of the methods mentioned by Suen et al. or Mori et al.
The character recognition system of the present invention, while using quasi-topological features, employs a novel method of measuring and scoring such features, resulting in great improvement in performance of the reading machine.
Briefly, the character recognition system of this invention measures the enclosure characteristics of each white pixel independently of other white pixels. Since the measurements are made in two (or more) dimensions rather than in one dimension (such as slope), the results are insensitive to first order aberrations such as accidental voids, touching lines and small numbers of black pixels that carry only noise. In the preferred embodiment, no noise processing is performed at all, since all forms of noise processing come at the expense of recognition accuracy. As used herein, a pixel is defined as an image information cell having one of the binary states "on" or "off", corresponding to "black" or "white", respectively.
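By way of illustration only, the per-pixel enclosure measurement summarized above may be sketched as follows. The sketch assumes, purely for purposes of example, that "enclosure" is judged by probing outward from each white pixel in four directions and noting which probes meet a black pixel; the four-direction choice and the names DIRECTIONS, enclosure_code and enclosure_histogram are hypothetical and are not taken from the specification.

```python
# Illustrative sketch only: the four-direction probe is an assumed reading of
# "enclosure characteristics", not the measurement defined by the invention.
# Pixels are binary: 1 = "on"/black, 0 = "off"/white.

# Directions probed from each white pixel: (row step, column step).
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}


def enclosure_code(image, row, col):
    """Return the directions in which a black pixel bounds the pixel (row, col)."""
    rows, cols = len(image), len(image[0])
    bounded = set()
    for name, (dr, dc) in DIRECTIONS.items():
        r, c = row + dr, col + dc
        # Walk outward until a black pixel or the image edge is reached.
        while 0 <= r < rows and 0 <= c < cols:
            if image[r][c] == 1:
                bounded.add(name)
                break
            r += dr
            c += dc
    return frozenset(bounded)


def enclosure_histogram(image):
    """Count white pixels by enclosure code; each pixel is measured independently."""
    counts = {}
    for r, line in enumerate(image):
        for c, pixel in enumerate(line):
            if pixel == 0:
                code = enclosure_code(image, r, c)
                counts[code] = counts.get(code, 0) + 1
    return counts


# A small closed loop, and the same loop with an accidental void in its top stroke.
ZERO = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
ZERO_WITH_VOID = [line[:] for line in ZERO]
ZERO_WITH_VOID[0][2] = 0

FULLY_ENCLOSED = frozenset("NSWE")
closed_counts = enclosure_histogram(ZERO)
void_counts = enclosure_histogram(ZERO_WITH_VOID)
```

In this example all nine white pixels of the closed loop are bounded in all four directions. With the void present, six of them remain fully bounded and only the column beneath the void changes, which is consistent with the stated insensitivity of a two-dimensional measurement to accidental voids.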
The financial document processing portion of this invention locates the courtesy amount field (CAF) of a bank check, then locates the division between the dollars portion and the cents portion of the CAF, and then reads the dollar and cents amounts. Overlapping characters, overlapping and touching characters, symbols (e.g., "xx", "100") and characters touching the fraction line in the CAF are treated as a character unit.