Automatic systems purporting to recognize cursive script writing, or even handwritten characters, have so far met with only limited success. The reason for this limited success can be traced largely to the lack of robustness exhibited by the templates used in the modeling of handwriting. For example, reference is made to U.S. Pat. No. 4,731,857 to Tappert which describes an elastic matching approach for the recognition of run-on handwritten characters.
Tappert teaches three steps. First, potential segmentation points are derived. Second, all combinations of the segments that could reasonably be a character are sent to a character recognizer to obtain ranked choices and corresponding scores. Third, the character sequences are combined so that the best candidate word wins.
Tappert's recognition algorithm itself is a template matching algorithm based on dynamic programming. Each template is a fully formed character presumably representative of the writer's average way of forming this character, and the elastic matching scores of the current character are computed for each template. This strategy is vulnerable to the extensive variability that can be observed both across writers and across time.
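By way of illustration only, template matching by dynamic programming can be sketched as follows. This is a generic dynamic-time-warping style match and is not asserted to be the specific algorithm of U.S. Pat. No. 4,731,857; the function names `elastic_match` and `recognize`, the representation of a character as a list of (x, y) points, and the sample data are assumptions made for the example.

```python
import math


def elastic_match(sample, template):
    """Cumulative distance of the best monotonic alignment of two point sequences.

    Assumption: each sequence is a list of (x, y) pen coordinates.
    """
    n, m = len(sample), len(template)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning sample[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(sample[i - 1], template[j - 1])
            # A point may be stretched, compressed, or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sample, templates):
    """Return the label of the template with the smallest elastic distance."""
    return min(templates, key=lambda label: elastic_match(sample, templates[label]))


# Hypothetical templates: a vertical stroke and a horizontal stroke.
templates = {
    "l": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
    "-": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
sample = [(0.0, 0.0), (0.1, 0.9), (0.0, 1.5), (0.0, 2.1)]
best_label = recognize(sample, templates)
```

Because each character is scored against a single fully formed template, a writer whose strokes drift away from that template is penalized directly in the distance, which is the vulnerability noted above.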
In an article entitled "Design of a Neural Network Character Recognizer for a Touch Terminal" by Guyon et al., Pattern Recognition, a neural network is employed to classify and thereby recognize input characters. This results in a relatively robust algorithm, but one that requires a large amount of data and is expensive to train.
A prior patent application entitled "A Statistical Mixture Approach To Automatic Handwriting Recognition," filed by Bellegarda et al., on Oct. 31, 1991, (Ser. No. 07/785,642) now U.S. Pat. No. 5,343,537, issued Aug. 30, 1994, is directed to a fast algorithm for handwriting recognition having an acceptable degree of robustness. Bellegarda's prior application entails at least three considerations: (i) the feature elements should be chosen so as to characterize handwriting produced in a discrete, run-on, cursive, or unconstrained mode equally well; (ii) these feature elements should be suitably processed so as to minimize redundancy and thereby maximize the information represented on a per-parameter basis; and (iii) the resulting feature parameters should be further analyzed to detect broad trends in handwriting and enable appropriate modeling of these trends. These considerations are not met by the elastic matching approach taught by Tappert, since (i) it is character-based, and (ii) it simply averages several instances of a character to obtain a character template.
According to U.S. Pat. No. 5,343,537, the signal processing front-end is a great deal more sophisticated than that of elastic matching. Rather than merely chopping the input data into segments, the signal is transformed onto a higher dimensional feature space (chirographic space), whose points represent all raw observations after non-redundant feature extraction. The prototypes in this space are formed using a Gaussian (as opposed to a Euclidean) measure, which gives a more refined clustering and thereby improves robustness. Hence, each prototype represents a small building block which may appear in many characters. Instead of character sequences, building block sequences are combined, each of which is assigned a true likelihood defined on a bona fide probability space (as opposed to just a distance score). Finally, the recognition algorithm itself is a maximum a posteriori (i.e. empirical) decoder operating on this probability space. The formulation described in Bellegarda's prior application may be alternatively cast in terms of multi-arc, single state, hidden Markov models (HMMs). This formulation, while being robust, may not adequately model the intra-character variation of the alphabet.
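The difference between a Euclidean and a Gaussian measure, referred to above, can be illustrated with a small sketch. The two prototypes, their means and variances, the diagonal-covariance form, and the function names are all assumptions chosen for the example; they are not taken from U.S. Pat. No. 5,343,537.

```python
import math


def euclidean_sq(x, mean):
    """Squared Euclidean distance from a feature vector to a prototype mean."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def gaussian_log_likelihood(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )


# Hypothetical prototypes: one tightly clustered, one widely spread.
prototypes = {
    "tight": {"mean": [0.0, 0.0], "var": [0.01, 0.01]},
    "broad": {"mean": [1.0, 0.0], "var": [1.0, 1.0]},
}
x = [0.4, 0.0]

# Euclidean assignment looks only at the distance to each mean.
nearest = min(prototypes, key=lambda k: euclidean_sq(x, prototypes[k]["mean"]))

# Gaussian assignment also accounts for the spread of each prototype.
likeliest = max(
    prototypes,
    key=lambda k: gaussian_log_likelihood(
        x, prototypes[k]["mean"], prototypes[k]["var"]
    ),
)
```

In this example the vector lies closer to the mean of the tight prototype, yet it falls many standard deviations outside that prototype's spread, so the Gaussian measure assigns it to the broad prototype instead. The Gaussian score is also a log-likelihood rather than a bare distance.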
A second patent application entitled "A Continuous Parameter Hidden Markov Model Approach to Automatic Handwriting Recognition", filed by J. Bellegarda et al., on Jan. 8, 1992, (Ser. No. 07/818,193) is directed to a computer-implemented system and method for recognizing handwriting. This second application of Bellegarda et al. also entails at least three considerations: (i) for each character, identifying the different ways of writing the character, also referred to as "allographs"; (ii) performing a training phase in order to generate a hidden Markov model (HMM) for each of the allographs; and (iii) performing a decoding phase to recognize handwritten text.
Bellegarda's second application discloses performing the training phase as follows. The system receives sample characters, wherein the sample characters are represented by training observation sequences. The system sorts the sample characters according to the allographs by mapping the sample characters onto a representational space, referred to as a lexographic space, to find high-level variations in the sample characters. It should be noted that the lexographic space is only marginally related to chirographic space. Specifically, the chirographic space is populated by frame-level feature vectors (i.e. the handwriting is chopped into small sub-character sections or "frames" and then vectors which mathematically represent the "frames" are created), while the lexographic space contains only character level feature vectors (i.e. the handwriting is chopped into whole characters and then vectors which mathematically represent the characters are created). As a result the lexographic space is more appropriate for finding the high level variations for characterizing allographs. This characterizing of allographs during training allows the system to create HMMs that mathematically represent each of the different ways an individual may write the same character (e.g. the letter "a" may look totally different depending on who is doing the writing and other variable factors). Once these models are generated they may be used for recognizing handwriting mapped into chirographic space. Accordingly, for each of the allographs, the system generates sequences of feature vectors for the sample characters associated with respective allographs by mapping in chirographic space. Next, the system generates a HMM for each of the allographs. The HMMs are generated by initializing model parameters and then updating the model parameters.
The system initializes the model parameters as follows. The system sets a length for each of the HMMs based on the average length of the sequences of feature vectors obtained for each allograph. Then, the system initializes state transition probabilities of the HMMs to be uniform. Next, the system assigns an output probability distribution (for example, a mixture of Gaussian density distributions) to each of the states.
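The initialization steps above may be sketched as follows. Several details are assumptions made for the example and are not stated in the referenced application: the number of states is taken as the rounded average sequence length, the topology is left-to-right with one "stay" and one "next" transition per state, and each state's output distribution is a uniformly weighted mixture over a shared set of `n_prototypes` Gaussian prototypes. The function name `init_hmm` is hypothetical.

```python
def init_hmm(sequences, n_prototypes):
    """Build initial HMM parameters for one allograph.

    sequences: list of feature-vector sequences observed for the allograph.
    n_prototypes: number of shared Gaussian prototypes (assumed given).
    """
    if not sequences:
        raise ValueError("at least one training sequence is required")
    # Model length based on the average observation length (assumed mapping).
    avg_len = sum(len(seq) for seq in sequences) / len(sequences)
    n_states = max(1, round(avg_len))
    # Uniform transition probabilities over the assumed two outgoing arcs.
    transitions = [{"stay": 0.5, "next": 0.5} for _ in range(n_states)]
    # Uniform mixture coefficients over the shared prototypes, one list per state.
    mixture_weights = [[1.0 / n_prototypes] * n_prototypes for _ in range(n_states)]
    return {
        "n_states": n_states,
        "transitions": transitions,
        "mixture_weights": mixture_weights,
    }


# Three hypothetical training sequences of 3, 4, and 5 two-dimensional frames.
training = [[(0.0, 0.0)] * 3, [(0.0, 0.0)] * 4, [(0.0, 0.0)] * 5]
hmm = init_hmm(training, n_prototypes=8)
```

Starting from uniform values means that all structure in the final model comes from the subsequent parameter updates rather than from the initial guess.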
Bellegarda's second application discloses updating the model parameters by performing Viterbi alignment of the observation sequences in order to update the model parameters (that is, the output distributions, mixture coefficients, and state transition probabilities). The Viterbi algorithm is generally described in F. Jelinek, "The Development of an Experimental Discrete Dictation Recognizer", Proc. IEEE, Vol. 73, No. 11, Pages 1616-1623 (November 1985).
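A minimal sketch of Viterbi alignment for a left-to-right model is given below. It assumes that the per-frame log-likelihoods under each state have already been computed, that each state has only a "stay" and a "next" transition, and that the path must start in the first state and end in the last; these assumptions, and the name `viterbi_align`, are illustrative and are not drawn from the referenced application.

```python
import math


def viterbi_align(log_emit, log_stay, log_next):
    """Return the most likely state for each frame and the path log-score.

    log_emit[t][s]: log-likelihood of frame t under state s (assumed precomputed).
    log_stay[s], log_next[s]: log transition probabilities out of state s.
    """
    n_frames, n_states = len(log_emit), len(log_emit[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_emit[0][0]  # the path is forced to start in state 0
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay[s]
            move = score[t - 1][s - 1] + log_next[s - 1] if s > 0 else neg_inf
            if move > stay:
                score[t][s], back[t][s] = move + log_emit[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + log_emit[t][s], s
    # Trace back from the last state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[n_frames - 1][n_states - 1]


# Hypothetical two-state model: early frames fit state 0, late frames fit state 1.
hi, lo = math.log(0.9), math.log(0.1)
log_emit = [[hi, lo], [hi, lo], [lo, hi], [lo, hi]]
half = math.log(0.5)
path, total = viterbi_align(log_emit, [half, half], [half, half])
```

Once each frame has been assigned to a state in this way, the frames aligned to a given state can be pooled to re-estimate that state's output distribution, mixture coefficients, and transition probabilities.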
Finally, Bellegarda's second application discloses performing the decoding phase as follows. The system receives test characters to be decoded (that is, to be recognized). The system generates sequences of feature vectors for the test characters by mapping in chirographic space. For each of the test characters, the system computes probabilities that the test character can be generated by the HMMs. The system decodes the test character as the character associated with the HMM having the greatest probability.
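The decoding rule described above amounts to scoring the test character against every allograph model and returning the character of the best-scoring model. The sketch below shows only that selection step. The scoring function `toy_log_likelihood` is a stand-in that treats each "model" as a single mean value; it is an assumption made so that the example is self-contained, not an HMM likelihood computation.

```python
def decode(features, allograph_models, log_likelihood):
    """Return the character whose allograph model best explains the features.

    allograph_models: list of (character, model) pairs; a character may
    appear several times, once per allograph.
    log_likelihood: function (features, model) -> log score (assumed supplied).
    """
    best_char, best_score = None, float("-inf")
    for character, model in allograph_models:
        score = log_likelihood(features, model)
        if score > best_score:
            best_char, best_score = character, score
    return best_char, best_score


def toy_log_likelihood(features, mean):
    """Stand-in scorer: negative squared deviation from a scalar mean."""
    return -sum((x - mean) ** 2 for x in features)


# Two hypothetical allographs of "a" and one of "b".
models = [("a", 0.0), ("a", 1.0), ("b", 5.0)]
first_char, _ = decode([0.9, 1.1], models, toy_log_likelihood)
second_char, _ = decode([4.8, 5.3], models, toy_log_likelihood)
```

Since every allograph model is evaluated for every test character, the cost of this step grows with the number of allographs.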
The above approach provides excellent recognition performance for writer-dependent tasks. However, several considerations should be kept in mind when addressing writer-independent tasks. First, the chirographic prototypes used in the derivation of the HMM parameters typically vary substantially from writer to writer. Second, parameter tying across different HMMs typically will vary significantly from writer to writer. This means that there is no single label alphabet from which to draw all potential elementary units for all writers, which in turn makes it difficult (a) to compare across writers the HMMs generated for a given character, and (b) to generate a good set of writer-independent HMMs.
A related consideration is that, if insufficient data has been observed for a particular allograph, there is no way to make the parameter estimates more reliable by considering supplemental data from additional writers.
Another consideration is that no supervision is enforced while searching for a partition of chirographic space. Supervision indicates whether or not there is any monitoring of the process of prototype building in the training phase. This is important because, if there is no supervision, then even for a single writer there may be no explicit relationship between a character or allograph model and its manifestation in chirographic space.
A related consideration is that, without supervision, the risk of performing the training phase in an inefficient manner (i.e. not cost-effectively) is high because there is no mechanism to monitor, and therefore adjust, the training activity.
A general problem of prior handwriting recognition systems in the area of cost effectiveness is related to inefficient methods of decoding the handwriting. This is especially critical when the handwriting to be recognized may belong to any one of a number of writers.