The present invention relates to an English character recognition system and method, and more particularly to a system and method for recognizing, in an on-line state, received handwritten cursive script English characters using a circular hidden Markov model.
Generally, in character recognition processes, there are on-line recognition methods and off-line recognition methods. It is our observation that many of these methods are, due to their intrinsic characteristics, unsatisfactory in particular aspects.
In the off-line recognition method, previously written characters are read through a scanner to be stored in a predetermined region, and then are recognized by thinning or by a boundary extracting process. A detailed description of this type of method can be found in Korean Patent Application No. 90-18517.
In the on-line recognition method, a tablet is connected to a computer, thereby enabling the use of electronic ink. The electronic ink indicates the result of recognition, by printing or on the monitor, obtained by internally processing the movement of an electronic pen upon the tablet. This is essential in an on-line handwriting recognition method. Accordingly, this method can recognize, in real time, characters, including cursive characters, manually written on the tablet with an electronic pen.
Nowadays, with the development of graphic tablet techniques, of computers having powerful computation capabilities, and of pen computers, there is a growing interest in the on-line character recognition method. On-line English recognition systems have been developed at universities, laboratories and companies, and some have been put on the market. Most of them recognize only precisely handwritten characters. That is, they cannot recognize cursive handwriting. Accordingly, such systems do not provide convenient and unconstrained handwriting recognition for users because of the problem of character segmentation, i.e. segmentation of an input word into characters. Another problem arises from the handling of mixed character styles, because very different shapes are used for some characters according to the varied writing styles of users.
In the on-line character recognition method, hidden Markov models (hereinafter, referred to as "HMMs") are sometimes used. In HMMs, sequential information is modeled according to time. The characteristic of the HMM lies in its outstanding modeling power.
The HMM technique has often been applied to speech recognition fields, as disclosed in U.S. Pat. No. 4,819,271 to Bahl et al. This technique obtains a better result than a matching method using conventional dynamic programming, as disclosed in Proceedings of the IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-285 by L. R. Rabiner. Since the coordinate information X, Y of on-line character data is also sequentially generated according to time, like sound signals, the above technique can be easily used in a character recognition method.
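As an illustration of how an HMM scores sequential data, the following is a minimal sketch of the forward algorithm for a discrete HMM. The function name `forward_likelihood` and the two-state toy parameters `START`, `TRANS` and `EMIT` are assumptions made for this example only; they are not taken from the cited references.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs   : list of observation symbol indices
    start : start[i]    = probability of starting in state i
    trans : trans[i][j] = probability of moving from state i to state j
    emit  : emit[i][k]  = probability of state i emitting symbol k
    """
    n_states = len(start)
    # alpha[i] = P(o_1 .. o_t, state at time t is i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, left-to-right model over two symbols.
START = [1.0, 0.0]
TRANS = [[0.5, 0.5],
         [0.0, 1.0]]
EMIT = [[0.9, 0.1],
        [0.2, 0.8]]

print(forward_likelihood([0, 1], START, TRANS, EMIT))
```

In a recognizer built along these lines, one such model would typically be trained per character or word, and the input would be assigned to the model giving the highest likelihood.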
Among the several examples using the HMM technique for character recognition is a method disclosed in Script Recognition Using Hidden Markov Models by R. Nag, K. H. Wong and F. Fallside, published in IEEE ASSP, pp. 2071-2074, 1986. In this recognition method, cursive English input data is converted into short vectors having the same length, and an HMM is trained on quantized angle changes and several other characteristics extracted from those vectors. From this information, recognition of word units is performed. In this method, the recognition rate for the handwriting of one specific person is high, but the rate for untrained handwriting by another person is low. Accordingly, there is a problem in that this method cannot be commonly used for recognition of the handwriting of many persons.
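The preprocessing described above, namely converting pen data into vectors of equal length and quantizing the angle changes between them, can be sketched as follows. This is only an illustrative sketch: the function names `resample` and `quantized_turns`, the segment length and the choice of eight quantization levels are assumptions, and the additional characteristics used in the cited paper are not reproduced.

```python
import math


def resample(points, step):
    """Resample a pen trajectory so that successive output points lie
    `step` apart, measured along the original path."""
    out = [points[0]]
    prev = points[0]
    need = step  # path length still to travel before the next output point
    for cur in points[1:]:
        seg = math.dist(prev, cur)
        while seg > 0 and seg >= need:
            t = need / seg
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            seg -= need
            need = step
        need -= seg
        prev = cur
    return out


def quantized_turns(points, levels=8):
    """Quantize the change of direction between successive vectors
    into `levels` equally wide angular bins."""
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(points, points[1:])]
    width = 2 * math.pi / levels
    codes = []
    for h0, h1 in zip(headings, headings[1:]):
        turn = (h1 - h0) % (2 * math.pi)  # angle change in [0, 2*pi)
        codes.append(int(round(turn / width)) % levels)
    return codes


# An L-shaped stroke: two units to the right, then two units up.
stroke = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
print(quantized_turns(resample(stroke, 1.0)))  # [0, 2, 0]
```

The resulting symbol sequence is the kind of discrete observation sequence on which an HMM could then be trained.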
Another character recognition method is disclosed in A Hierarchical System For Character Recognition With Stochastic Knowledge Representation issued in 1988 by J. A. Vlantzos and S. Y. Kung, pp. 1-601 to 1-608. This is an example of using HMM in off-line English language recognition. The characteristic of this recognition method is a hierarchical structure of three levels: sentence level, word level and letter level. In character recognition, recognition in the character unit mode is performed by scoring a character level model. Words or sentences are recognized by finding optimal state sequences in an upper level.
A. Kundu and P. Bahl disclose off-line cursive English recognition in the IEEE Proceedings of International Conference on Acoustics, Speech, and Signal Processing, 1988, pages 928-931. For this type of off-line English language recognition, one HMM composed of twenty-six states is first constructed, and state transition probabilities are obtained using statistical values obtained from English text. After optimal code-book symbols of input data are found using a vector quantization algorithm (i.e., "VQ"), output probabilities are obtained by searching distributions shown in the respective states. Then, inputs in word units are divided into character units, and code-book symbols nearest to the respective characters are found. Next, the symbol sequence is fed into an HMM, and the best state sequence is found using a Viterbi algorithm, thereby enabling recognition of a word. This method, however, does not show a method of dividing a word into characters, and therefore has a substantially lower recognition efficiency.
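The final step of such a scheme, finding the best state sequence with a Viterbi algorithm where each state stands for a letter, can be sketched as follows. For brevity the sketch uses three letter states instead of twenty-six; all probabilities, the code-book symbol indices 0 to 2 and the function name `viterbi` are hypothetical values chosen for illustration, not data from the cited paper.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state (letter) sequence for the
    code-book symbol sequence `obs`, computed in the log domain."""
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    score = {s: lg(start_p[s]) + lg(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + lg(trans_p[r][s]))
            new_score[s] = score[best] + lg(trans_p[best][s]) + lg(emit_p[s][o])
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Hypothetical letter model: transitions would come from text statistics,
# emissions from the distribution of code-book symbols per letter.
STATES = ["a", "c", "t"]
START_P = {"a": 0.3, "c": 0.4, "t": 0.3}
TRANS_P = {"a": {"a": 0.1, "c": 0.2, "t": 0.7},
           "c": {"a": 0.7, "c": 0.1, "t": 0.2},
           "t": {"a": 0.5, "c": 0.3, "t": 0.2}}
EMIT_P = {"a": {0: 0.8, 1: 0.1, 2: 0.1},
          "c": {0: 0.1, 1: 0.8, 2: 0.1},
          "t": {0: 0.1, 1: 0.1, 2: 0.8}}

print(viterbi([1, 0, 2], STATES, START_P, TRANS_P, EMIT_P))  # cat
```

Note that the sketch assumes the word has already been divided into one symbol per character, which is exactly the step the cited method leaves unspecified.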
Off-line And On-line Methods For Cursive Handwriting Recognition, by J. Camillerapp, et al., was published in Proceedings of International Workshop on Frontiers in Handwriting Recognition, Chateau de Bonas, France, 1991. This paper discloses that a MM (Markov Model) is applied to on-line cursive English character recognition. Input data is converted into chain code vectors of predetermined magnitude, from which word unit models are trained. Then, a recognition experiment is performed. This experiment is done with first-order MMs, second-order MMs and third-order MMs, among which third-order MMs show the best results. A recognition rate of 66% can be obtained for fifty-five words.
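The general idea of chain coding followed by an order-k Markov model can be sketched as follows. The eight-direction code, the add-one smoothing and the names `chain_codes` and `MarkovModel` are assumptions made for this illustration; the cited paper's exact coding and training procedure are not reproduced here.

```python
import math
from collections import defaultdict


def chain_codes(points, directions=8):
    """Map each step between successive points to one of `directions`
    direction codes (0 = +x axis, increasing toward +y)."""
    width = 2 * math.pi / directions
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        codes.append(int(round(angle / width)) % directions)
    return codes


class MarkovModel:
    """Order-k Markov model over chain codes, trained by counting."""

    def __init__(self, order, alphabet=8):
        self.order = order
        self.alphabet = alphabet
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = tuple(seq[i - self.order:i])
                self.counts[context][seq[i]] += 1

    def log_likelihood(self, seq):
        total = 0.0
        for i in range(self.order, len(seq)):
            ctx = self.counts.get(tuple(seq[i - self.order:i]), {})
            # Add-one smoothing (an assumption) avoids log(0) for unseen codes.
            num = ctx.get(seq[i], 0) + 1
            den = sum(ctx.values()) + self.alphabet
            total += math.log(num / den)
        return total


# Train a first-order model on one hypothetical word shape and score two inputs.
model = MarkovModel(order=1)
model.train([[0, 0, 2, 2], [0, 0, 2, 2]])
print(model.log_likelihood([0, 0, 2, 2]) > model.log_likelihood([2, 2, 0, 0]))
```

With one such model per word, an input would be assigned to the word whose model yields the highest log-likelihood; raising `order` to 2 or 3 lengthens the context on which each code is conditioned.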
In most of the conventional character recognition methods, however, inaccurate character separation adversely affects character recognition, and the connections between adjacent characters in extended cursive style are not handled, thereby reducing the recognition rate.