The Chinese and Japanese languages use ideographic scripts containing several thousand characters. This large number of characters makes entering a character into a computer system with a typical computer keyboard cumbersome and slow. A more natural way of entering ideographic characters into a computer system would be to use handwriting recognition, and particularly automatic recognition of cursive style handwriting in an “on-line” manner. However, prior on-line handwriting recognition methods have concentrated on print style handwritten ideographic characters, and the requirement that the handwriting be printed is still too slow for a typical user of a computer system. These prior methods have not been successfully adapted to on-line recognition of cursive style handwritten characters.
The complexity of the ideographic characters and the character distortion due to non-linear shifting and multiple styles of writing also make character recognition difficult, particularly for on-line systems.
One method which has been used extensively to deal with the types of problems arising from ideographic character recognition is hidden Markov modeling (HMM). HMMs can deal with the problems of segmentation, nonlinear shifting and multiple representation of patterns, and have been used extensively in speech recognition and, more recently, in character recognition. See, for example, K. Lee, “Automatic Speech Recognition: The Development of the SPHINX System”, Kluwer, Boston, 1989; Nag, R., et al., “Script Recognition Using Hidden Markov Models”, Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 2071-2074, 1986; and Jeng, B., et al., “On The Use Of Discrete State Markov Process for Chinese Character Recognition”, SPIE, vol. 1360, Visual Communications and Image Processing '90, pp. 1663-1670, (1990). Jeng used HMMs for off-line recognition of printed Chinese characters. In the system described by Jeng, one HMM is used for every Chinese character, and the HMMs are of fixed topology. One limitation of this approach is that the system can only recognize printed Chinese characters and not cursively written characters. This recognition system also requires a large amount of memory to store the thousands of character-level Markov models. Another disadvantage of the system is that a fixed topology is used for every character, so the number of states in a character's hidden Markov model does not depend on the complexity of the character.
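As an illustration of the one-model-per-character scheme described above, the following minimal sketch scores an observation sequence against each character's discrete HMM using the forward algorithm and selects the best-scoring model. It is not Jeng's implementation: the function names, the two toy models labeled "A" and "B", the two-symbol observation alphabet, and all probability values are assumptions chosen only for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    obs   : list of observation symbol indices
    start : start[s]    = probability of starting in state s
    trans : trans[p][s] = probability of moving from state p to state s
    emit  : emit[s][k]  = probability that state s emits symbol k
    """
    n_states = len(start)
    # alpha[s] = probability of the observations seen so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical fixed topology shared by every character model:
# two states, left-to-right (state 0 may stay or advance, state 1 is final).
START = [1.0, 0.0]
TRANS = [[0.5, 0.5],
         [0.0, 1.0]]

# Toy models (made-up values): "A" tends to emit symbol 0 then symbol 1,
# "B" tends to emit symbol 1 then symbol 0.
MODELS = {
    "A": (START, TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "B": (START, TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}

print(classify([0, 0, 1, 1], MODELS))
print(classify([1, 1, 0, 0], MODELS))
```

In a scheme of this kind, every one of the several thousand characters needs its own parameter tables, which is the source of the memory cost noted above, and the shared two-state topology in the sketch mirrors the fixed-topology limitation.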
In ideographic languages, such as Chinese, the thousands of ideographic characters can be broken down into a smaller set of a few hundred subcharacters (also referred to as radicals). There are several well-known dictionaries which define recognized radicals in the various ideographic languages. Thus, the thousands of ideographic characters may be represented by a smaller set of subcharacters or radicals. See Ng, T. M. and Low, H. B., “Semiautomatic Decomposition and Partial Ordering of Chinese Radicals”, Proceedings of the International Conference on Chinese Computing, pp. 250-254 (1988). Ng and Low designed a semiautomatic method for defining Chinese radicals. However, these radicals are not suitable for on-line handwriting character recognition using hidden Markov models for several reasons. First, to perform on-line character recognition using radical HMMs, a character model based on several radical HMMs should be formed from a time sequence of subcharacters, which was not done by Ng and Low. Second, Ng and Low break down the characters into four basic constructs or categories of radicals: vertical division, horizontal division, encapsulation and superimposition. A radical as defined by Ng and Low can appear in more than one of these categories. As a result, a single radical can have up to four different shapes and sizes, which has a detrimental effect on the hidden Markov modeling accuracy because the model has to deal with up to four different basic patterns, one for each category.
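The idea of forming a character model from a time sequence of subcharacter models can be sketched as follows. This is only an illustration under stated assumptions: the small lexicon, the per-radical state counts, and the function names are hypothetical, and the sketch only concatenates state labels in writing order rather than training or decoding any HMM.

```python
# Hypothetical radical inventory: radical -> number of HMM states.
# The state counts are made-up values for illustration.
RADICAL_STATES = {"日": 4, "月": 4, "木": 4, "口": 3, "女": 3, "子": 3}

# Illustrative lexicon: each character is a time-ordered sequence of radicals,
# listed in the order in which they are normally written.
LEXICON = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "好": ["女", "子"],
    "杏": ["木", "口"],
    "呆": ["口", "木"],
    "森": ["木", "木", "木"],
    "晶": ["日", "日", "日"],
    "品": ["口", "口", "口"],
    "朋": ["月", "月"],
    "如": ["女", "口"],
}


def character_state_sequence(char):
    """Concatenate radical models into one left-to-right state sequence.

    Each state is labeled (radical, index within that radical's model).
    """
    return [
        (radical, i)
        for radical in LEXICON[char]
        for i in range(RADICAL_STATES[radical])
    ]


def distinct_radical_models():
    """Return the set of radical models needed to cover the whole lexicon."""
    return {radical for sequence in LEXICON.values() for radical in sequence}


print(len(LEXICON), "characters share", len(distinct_radical_models()), "radical models")
print(character_state_sequence("杏"))
```

Because 杏 and 呆 use the same two radicals in opposite writing order, they yield different state sequences even though no additional radical model is stored.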
While the use of subcharacters or radicals to recognize ideographic characters is in some ways desirable, it does not always accurately recognize characters without also recognizing the geometric layout of the subcharacters relative to each other in a character. In a prior approach by Lyon, the use of a size and placement model for subcharacters in an ideographic script has been suggested. See U.S. patent application Ser. No. 08/315,886, filed Sep. 30, 1994 by Richard F. Lyon, entitled “System and Method for Word Recognition Using Size and Placement Models.” This method uses the relationship between sequential pairs of subcharacters in a character to create a size and placement model. The subcharacter pair models are created by finding the covariance between bounding box features of subcharacter pairs. This method relies on the pen lift which occurs between subcharacters of ideographic characters, and thus is only useful for printed ideographic characters; it cannot be used for cursively written ideographic characters, where there is usually no pen lift between characters.
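The covariance-based pair model described above can be illustrated with a small sketch. The specific features used here (center offsets normalized by the first box's height, and log ratios of widths and heights) are assumptions made for illustration and are not taken from the Lyon application; the function names and sample boxes are likewise hypothetical.

```python
import math


def pair_features(box_a, box_b):
    """Relative size and placement features for two bounding boxes.

    Each box is (x_min, y_min, x_max, y_max); box_a is the earlier subcharacter.
    Returns [dx, dy, log width ratio, log height ratio].
    """
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    # Offset between box centers, normalized by the first box's height
    dx = ((box_b[0] + box_b[2]) - (box_a[0] + box_a[2])) / 2.0 / ha
    dy = ((box_b[1] + box_b[3]) - (box_a[1] + box_a[3])) / 2.0 / ha
    return [dx, dy, math.log(wb / wa), math.log(hb / ha)]


def mean_and_covariance(rows):
    """Sample mean vector and unbiased covariance matrix (needs >= 2 rows)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    cov = [
        [
            sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


# Made-up training samples of one subcharacter pair written side by side.
samples = [
    ((0, 0, 10, 10), (10, 0, 20, 10)),
    ((0, 0, 10, 12), (11, 1, 22, 12)),
    ((0, 0, 9, 10), (10, 0, 21, 11)),
]
features = [pair_features(a, b) for a, b in samples]
mean, cov = mean_and_covariance(features)
print(mean)
print(cov)
```

Note that the sketch assumes each subcharacter's bounding box is already known. With printed input the pen lifts supply those boundaries, which is why an approach of this kind does not carry over directly to cursive input.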
Thus, the prior art, while providing certain benefits for handwriting recognition, does not efficiently recognize cursively written ideographic characters in an on-line manner (for example, in an interactive manner). Moreover, the use of an HMM for a radical that appears in several categories has a detrimental effect upon the accuracy of the HMM procedures. It is therefore desirable to provide improved on-line recognition of cursive handwriting for ideographic scripts.