Techniques for the machine recognition of printed and handwritten characters (i.e., digits and non-digits alike) have existed for several years. For the most part, however, techniques for recognizing machine printed characters are not suitable for handwritten characters, because the recognition of handwritten characters presents several problems not encountered with machine printed characters. First, handwritten characters, unlike those printed by a machine, often are not of uniform size within a block of text, or even within the same word. Thus, heuristic techniques for recognizing machine printed characters, which rely to varying degrees on the relative sizes of characters within a string, are ill-suited to the recognition of unconstrained handwritten text. Second, the letters and/or numbers constituting a string of handwritten text often overlap with one another, defeating character identification techniques that depend on each character being discrete. Third, various handwriting styles often result in characters having disconnected components, another situation that machine printed text recognition techniques are not equipped to handle. Finally, the abundance of different handwriting styles makes the recognition of handwritten words computationally more demanding than machine printed text recognition.
Various methods for the recognition of handwritten characters have been proposed, for instance, in E. Lecolinet and J. Moreau, "A New System for Automatic Segmentation & Recognition of Unconstrained Handwritten ZIP Codes," The 6th Scandinavian Conference on Image Analysis, June 1989; and in M. Sridhar and A. Badreldin, "Recognition of Isolated and Simply Connected Handwritten Numerals," Pattern Recognition 19:1-12, 1986. These methods are not equipped to deal with overlapping characters within a string of characters having disconnected components, because none of them is able to isolate overlapping characters for recognition when the strings in which they occur are of varying length. To operate effectively on unconstrained handwritten text, i.e., handwritten text not confined to a particular location or size, a recognition method must be capable of discerning the identity of individual characters even when they are connected to other characters or have disconnected components.
Another method addresses the recognition of handwritten strings of numbers. See R. Fenrich and S. Krishnamoorthy, "Segmenting Diverse Quality Handwritten Digit Strings in Near Real Time," United States Postal Service Advanced Technology Conference, 1990. This method was designed particularly for the recognition of handwritten numbers in ZIP Codes on mail pieces. Because the number of digits in any ZIP Code necessarily is five or nine, the method begins with the assumption that the string contains either five or nine characters. Under this assumption, the aspect ratio of the character string (i.e., the ratio of string length, in pixels, to string height, in pixels) provides sufficient information to determine which of the two possible lengths the string has. The method does not, however, handle strings of varying length, such as the street address on a mail piece or the courtesy amount on a check. Aspect ratio is much less useful for determining word length in such situations. Further, the presence of punctuation marks in street address lines can cause inaccurate estimates of word length.
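By way of illustration only, the aspect-ratio test described above for ZIP Codes might be sketched as follows. This is a minimal sketch and not the cited authors' implementation: the function names, the binary-image representation (a list of rows of 0/1 pixel values), and in particular the threshold value of 5.0 are assumptions introduced here for the example.

```python
def string_aspect_ratio(image):
    """Return length/height, in pixels, of the box bounding all ink pixels.

    `image` is assumed to be a list of equal-length rows, where a truthy
    value marks an ink pixel and 0 marks background.
    """
    ink_rows = [r for r, row in enumerate(image) if any(row)]
    ink_cols = [c for row in image for c, px in enumerate(row) if px]
    if not ink_rows:
        raise ValueError("image contains no ink pixels")
    height = max(ink_rows) - min(ink_rows) + 1
    length = max(ink_cols) - min(ink_cols) + 1
    return length / height


def guess_zip_length(image, threshold=5.0):
    """Choose between the two permitted ZIP Code lengths.

    The threshold is a hypothetical value for illustration; a real system
    would have to derive it from sample data.
    """
    return 9 if string_aspect_ratio(image) > threshold else 5
```

The sketch also makes the limitation noted above visible: the decision is a single comparison between two known lengths, so it offers nothing for a street address or courtesy amount whose length is not known in advance.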