Early approaches to recognizing handwritten text have concentrated primarily on criteria for matching the input pattern against reference patterns, and on selecting the reference symbol that best matches the input pattern as the recognized pattern. This family of matching criteria is often called "shape matching". A wide variety of shape-based recognition systems are known in the art. Representative examples of such shape-based recognition systems are described in U.S. Pat. Nos. 3,930,229; 4,284,975 and 4,653,107.
Common to all shape-based systems, irrespective of the matching criteria used, is that the information is processed locally, and that such systems are not very accurate in resolving ambiguities among reference patterns of similar shape.
Character context and linguistic rules have been used extensively as additional information for recognizing characters in a variety of fields, such as cryptography and natural language processing. Systems introducing global context and syntax criteria have been offered for improving shape-based recognition, in order to distinguish among members of a "confusion set." A system representative of this approach is described in U.S. Pat. No. 4,754,489. The system of this patent uses conditional probabilities of English characters appearing after a given character sequence, and probabilities of groups of English characters, to suggest syntax rules.
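By way of illustration only, the following minimal sketch shows how conditional character probabilities might be used to choose among the members of a confusion set. It is not the method of the cited patent; the function names, the single-character (bigram) context, the add-one smoothing, and the way the shape score is combined with the context probability are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_bigrams(words):
    """Count how often each character follows the preceding one.

    "^" is a hypothetical start-of-word marker used only in this sketch.
    """
    counts = defaultdict(Counter)
    for word in words:
        padded = "^" + word
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts

def cond_prob(counts, prev, cur, alphabet_size=26, alpha=1.0):
    """Estimate P(cur | prev) with additive smoothing (an assumption here)."""
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * alphabet_size)

def disambiguate(counts, prev, confusion_set):
    """Pick the candidate with the best combined shape and context score.

    confusion_set maps each candidate character to a shape-match score
    in (0, 1]; the log scores are simply added, which is one possible
    combination rule among many.
    """
    return max(
        confusion_set,
        key=lambda c: math.log(confusion_set[c])
        + math.log(cond_prob(counts, prev, c)),
    )

# Toy training text, assumed for illustration.
counts = train_bigrams(["the", "then", "that", "this"])

# The shape matcher cannot separate "h" from "b"; context after "t" can.
choice = disambiguate(counts, "t", {"h": 0.5, "b": 0.5})
```

In this toy example the two candidates have equal shape scores, so the choice is determined entirely by how often each character followed "t" in the training words.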
Another representative system is described in U.S. Pat. No. 4,718,102, which is directed to ideogram-based languages such as Kanji, and in which the confusion set produced by a shape-based algorithm is disambiguated by simulating human experience. The disambiguating routines are based on actual studies of particular characters.
It appears that the approaches taken heretofore have resulted in systems suffering from the following problems: they are not sufficiently general to cover different languages; they require computationally prohibitive time and memory resources; they do not include the shape information in a statistically meaningful fashion; they are not adaptive to texts of varying linguistic and syntactic content; and they operate as post-processes, not contributing to the segmentation of input patterns.