This invention relates to the field of character recognition and specifically to systems for recognizing run-on handwritten characters. In recognition systems that take handwritten characters as input, an important consideration is the recognition of individual letters formed from segments that run on into neighboring letters. A major problem in such systems is the segmentation of words into characters suitable for a recognition algorithm that operates on characters. Techniques of whole word recognition are possible but highly impractical.
Run-on handwritten characters are handwritten characters which are run together, that is, they can touch or overlap one another. Although normally only adjacent characters touch or overlap, more distant characters can also do so. For example, a t-crossing can sometimes overlap several adjacent characters. Handprinting often has this run-on characteristic and is normally characterized by both touching and overlapping characters. Cursive script writing can also be considered run-on handwriting. The individual characters can be considered to touch through their connections by ligatures. Furthermore, the characters in cursive writing can overlap. As with handprinting, such overlap usually involves adjacent characters, and again the letter t is an example of where this occurs with relatively high frequency.
The reason run-on handwriting is difficult for automatic recognition procedures is that the characters cannot be easily or accurately separated one from another. In order to segment the characters from each other some recognition is required, and in order to recognize the characters some segmentation is required. Therefore, these two processes of segmentation and recognition are not independent but are heavily interrelated. Nevertheless, prior work in this area was directed toward the development of essentially separate procedures for the segmentation and subsequent recognition of the characters. This is referred to as the segmentation-then-recognition approach.
Reference is made to U.S. Pat. No. 3,111,646 relating to a method and apparatus for reading cursive script. The algorithm and the hardware that implements it presuppose that the writing itself is well proportioned. Consequently, the segmentation algorithm is unduly restrictive. Specifically, the algorithm determines various zones by taking the overall height of an entire line, from the base of the descenders to the peaks of the ascenders, and dividing that height into four parts, as shown in FIGS. 1 and 2 of that patent, for use in the recognition system. The algorithm therefore requires that the input be very well proportioned and that the writing not slope or deviate from the base line. Handwriting, however, is highly variable; in practice, the heights of ascenders and descenders are simply matters of personal style and subject to nearly infinite variations. The absolute length of an ascender or descender is not of great importance to humans in their handwriting styles. Consequently, well-proportioned handwriting, in the sense that algorithm requires, is not found in typical handwriting samples. The segmentation algorithm of U.S. Pat. No. 3,111,646 is also retrospective. In operation, an entire line of script is fed into storage registers, and segmentation points are determined by first determining the average letter width, based principally on the number of zero axis crossings.
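The sensitivity of such a zoning scheme to a single long stroke can be illustrated with a minimal sketch. It is not the circuitry of U.S. Pat. No. 3,111,646. It assumes, for illustration only, that the four parts are of equal height and that the line is available as a list of y-coordinates; the names zone_boundaries and zone_of are hypothetical.

```python
from bisect import bisect_right


def zone_boundaries(ys, parts=4):
    """Split the overall height of a line into `parts` zones.

    Assumption: the zones are of equal height. The cited patent only
    states that the height is divided into four parts.
    """
    low, high = min(ys), max(ys)      # base of descenders, peak of ascenders
    step = (high - low) / parts
    return [low + i * step for i in range(parts + 1)]


def zone_of(y, boundaries):
    """Return the zero-based index of the zone that contains y."""
    idx = bisect_right(boundaries, y) - 1
    # Clamp so that the topmost point falls in the last zone.
    return min(max(idx, 0), len(boundaries) - 2)


# A line whose strokes span heights 0..8.
line = [0, 2, 8]
bounds = zone_boundaries(line)

# The same line with one exaggerated descender reaching down to -8.
bounds_long = zone_boundaries(line + [-8])

# The point at height 2 is assigned to a different zone in each case.
zone_normal = zone_of(2, bounds)
zone_shifted = zone_of(2, bounds_long)
```

Under these assumptions, one unusually long descender moves every zone boundary for the whole line, so the same stroke is classified into a different zone. This is the kind of dependence on well-proportioned input that the passage above criticizes.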
This approach has a number of drawbacks in addition to the restrictions on input script. First, the technique requires that the input be very well proportioned, that is, that the writer maintain substantially the same letter width throughout. Second, the technique is not sequential: the prior art first segments the characters constituting the line and only then attempts recognition. For a practical system, it is important that segmentation and recognition be done in "real time". That is, an operative system should display the results as soon as possible after the character is formed.
Segmentation in itself is an extremely difficult problem when dealing with connected written script. The form of a ligature depends not only on the two characters being joined and the overall context, but also on general factors that may change an individual's handwriting, such as fatigue and the physical conditions under which the writing is done (sitting, standing or the like). The prior art contains examples of segmentation schemes for cursive script.
Reference is made to U.S. Pat. Nos. 3,334,399, 3,305,832 and 4,024,500, all concerned with techniques of character segmentation in cursive script handwriting. Those systems are all predicated on ligatures being defined in essentially a continuum of the characters.
IBM Technical Disclosure Bulletin, Vol. 24, No. 6, pages 2897-2902, describes a system for recognizing discretely written characters based on elastic matching of an unknown character against a set of character prototypes. The input to the system is point data produced by a dynamic trace of a stylus on an electronic tablet. The hardware configuration is shown in FIG. 1 of that TDB. Processing in accordance with this elastic matching system is performed on a character-by-character basis after the writing is separated into characters. The recognition technique disclosed assumes that the characters are written with sufficient space between them to allow separation prior to recognition. Consequently, the algorithm described is a segmentation-then-recognition approach. Decoding with this scheme cannot be accomplished where the characters run together. Other techniques utilizing segmentation followed by recognition are typified by U.S. Pat. Nos. 3,713,100 and 3,784,982.
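For orientation, elastic matching of stylus point data against prototypes can be sketched with a common dynamic-programming formulation (dynamic time warping). This is an illustrative assumption and not necessarily the procedure of the cited bulletin. The point format, the Euclidean point distance, and the names elastic_distance and classify are all hypothetical.

```python
import math


def elastic_distance(unknown, prototype):
    """Dynamic-time-warping cost between two sequences of (x, y) points.

    Assumption: Euclidean distance between points, with each point
    allowed to stretch over one or more points of the other sequence.
    """
    n, m = len(unknown), len(prototype)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(unknown[i - 1], prototype[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch prototype point
                                 cost[i][j - 1],      # stretch unknown point
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(unknown, prototypes):
    """Return the label of the prototype with the lowest matching cost."""
    return min(prototypes,
               key=lambda label: elastic_distance(unknown, prototypes[label]))


# Two toy prototypes: a vertical stroke and a horizontal stroke.
prototypes = {
    "l": [(0, 0), (0, 1), (0, 2)],
    "-": [(0, 0), (1, 0), (2, 0)],
}

# An already-segmented unknown character with one extra sample point.
unknown = [(0, 0), (0, 0.5), (0, 1), (0, 2)]
label = classify(unknown, prototypes)
```

Note that classify receives the points of exactly one character. Nothing in the sketch decides where one character ends and the next begins, which is why an approach of this kind depends on the writing being separated into characters beforehand.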