The reference to any prior art in this specification is not, and should not be taken as, an acknowledgment or any form of suggestion that the prior art forms part of the common general knowledge.
One of the major issues faced in the development of highly accurate handwriting recognition systems is the inherent ambiguity of handwriting. Humans depend on contextual knowledge to correctly decode handwritten text. As a result, a large amount of research has been directed at applying syntactic and linguistic constraints to handwritten text recognition. Similar work has been performed in the fields of speech recognition, natural language processing, and machine translation.
In a handwriting recognition system, the fundamental language primitive is a character. While some recognition systems bypass character recognition altogether (known as holistic word recognition), most recognition systems make some attempt to identify individual characters in the input signal. Systems that do not do this are overly dependent on dictionaries during recognition, and usually cannot recognise out-of-vocabulary words (i.e. words not in the dictionary).
In systems that do utilise character recognition, the raw output of a character classifier inevitably contains recognition errors due to the inherent ambiguity of handwriting. As a result, some kind of language-based post-processing is generally required to resolve the real meaning of the input.
Many systems include simple heuristics that define a set of language rules for handwritten text. For example, such rules may specify that capital letters are most often found at the start of words (a counter-example being “MacDonald”), that most strings consist entirely of letters or entirely of numbers (a counter-example being “2nd”), and where punctuation characters are likely to be positioned within a word. However, these heuristics are time-consuming and difficult to define, fragile to change, and usually incomplete.
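By way of illustration only, two of the heuristics described above can be expressed as simple rules. The function name `heuristic_flags` and the rule names are hypothetical; the sketch merely shows how such rules behave on the counter-examples “MacDonald” and “2nd”.

```python
from typing import List


def heuristic_flags(word: str) -> List[str]:
    """Return the names of the simple language rules that `word` violates.

    Both rules are hypothetical illustrations of the kind of heuristics
    described above, not rules taken from any particular system.
    """
    flags = []
    # Rule 1: capital letters should only appear at the start of a word.
    if any(ch.isupper() for ch in word[1:]):
        flags.append("capital-not-initial")
    # Rule 2: a string should be all letters or all digits, not a mixture.
    has_alpha = any(ch.isalpha() for ch in word)
    has_digit = any(ch.isdigit() for ch in word)
    if has_alpha and has_digit:
        flags.append("mixed-letters-digits")
    return flags


for w in ["Hello", "MacDonald", "2nd"]:
    print(w, heuristic_flags(w))
```

Both “MacDonald” and “2nd” are valid strings yet each is flagged, which illustrates why hand-written rules of this kind tend to be fragile and incomplete.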
In addition to the above heuristics, some recognition systems include a character N-gram model. An example of this is described in H. Beigi and T. Fujisaki, “A Character Level Predictive Language Model and Its Application to Handwriting Recognition”, Proceedings of the Canadian Conference on Electrical and Computer Engineering, Toronto, Canada, Sep. 13-16, 1992, Vol. I, pp. WA1.27.1-4.
In particular, these systems utilise language models defining the probability of observing a certain character given a sequence of previous characters. For example, the letter ‘e’ is much more likely to follow ‘th’ than the letter ‘q’. That is, P(e|th) is much greater than P(q|th). Character N-grams can be easily derived from a text corpus and are a powerful technique for improving character recognition without constraining the writer to a specific list of words.
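The estimation of character N-gram probabilities from a corpus can be sketched as follows. This is a minimal illustration assuming a trigram model (N=3), an alphabet of a-z plus space, and add-one smoothing so that unseen transitions keep a small non-zero probability; the function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed alphabet: lower-case letters a-z plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def train_char_ngram(corpus: str, n: int = 3):
    """Count how often each character follows each (n-1)-character context."""
    # Lower-case and drop characters outside ALPHABET (a simplification).
    text = "".join(ch for ch in corpus.lower() if ch in ALPHABET)
    counts = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        counts[text[i - n + 1:i]][text[i]] += 1
    return counts


def char_prob(counts, context: str, ch: str) -> float:
    """Estimate P(ch | context) with add-one (Laplace) smoothing over ALPHABET."""
    ctx_counts = counts.get(context, Counter())
    total = sum(ctx_counts.values())
    return (ctx_counts[ch] + 1) / (total + len(ALPHABET))


counts = train_char_ngram("the cat sat on the mat then they went there")
print(char_prob(counts, "th", "e") > char_prob(counts, "th", "q"))  # True
```

Even on this tiny corpus, P(e|th) exceeds P(q|th); a practical model would be trained on a large text corpus.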
However, because a given language contains a large number of possible letter combinations, such systems require very data-intensive processing, which limits the range of applications of the technique.
Furthermore, in some situations, the recognition system expects the input to follow a certain format (for example, U.S. Zip codes, phone numbers, street addresses, etc.). In these cases, regular expressions, simple language templates, and constrained character sets can be used to increase recognition accuracy. However, these techniques are limited to circumstances in which the input strictly adheres to a limited set of formats. Thus, for example, the technique will only apply to the post codes, or the like, for which the system is trained and will not apply to general handwritten text.
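A format template of this kind can be sketched with a regular expression that filters a recogniser's ranked candidate strings. The template, the function `best_matching`, and the candidate scores are hypothetical, assuming the classifier returns an n-best list of (string, score) pairs.

```python
import re
from typing import List, Optional, Tuple

# Template for a U.S. Zip code: five digits, optionally followed by "-" and four digits.
ZIP_TEMPLATE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def best_matching(candidates: List[Tuple[str, float]]) -> Optional[str]:
    """Return the highest-scoring recognition candidate that fits the template.

    `candidates` is a hypothetical n-best list of (string, classifier score) pairs.
    """
    valid = [(text, score) for text, score in candidates if ZIP_TEMPLATE.fullmatch(text)]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]


# "O" and "l" stand in for plausible misreadings of "0" and "1".
print(best_matching([("9O2l0", 0.9), ("90210", 0.7), ("9021", 0.6)]))  # 90210
```

The template discards the top-scoring candidate outright, so input that does not follow the expected format, such as general handwritten text, cannot be handled this way.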
Handwritten text exhibits ambiguity not only at the character level but also at the word level, particularly in cursive writing. Recognition systems address this issue by including word-based language models, the most common of which is a pre-defined dictionary.
Word N-grams, which are similar to character N-grams but define transition probabilities between sequences of words rather than characters, can be used for the post-processing of written text. To avoid the combinatorial memory and processing requirements for large-vocabulary word N-grams, some systems use word-class N-grams, where the transition probabilities are defined for the part-of-speech tag of the word (e.g. noun or verb) rather than for individual words.
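The word-class approach can be sketched as a class-based bigram model, in which the probability of a word sequence is approximated by the product of tag-to-tag transition probabilities and the probability of each word given its tag. This is a minimal illustration under stated assumptions: the toy tagged corpus is invented, each word is assumed to carry a single tag, and no smoothing is applied.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (word, part-of-speech tag) sentences.
TAGGED = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sat", "VERB")],
]


def train_class_bigram(tagged):
    """Count tag-to-tag transitions and word emissions per tag."""
    trans, ctx, tag_counts, emit, lexicon = Counter(), Counter(), Counter(), Counter(), {}
    for sentence in tagged:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[(prev, tag)] += 1
            ctx[prev] += 1
            emit[(tag, word)] += 1
            tag_counts[tag] += 1
            lexicon[word] = tag  # assumes each word has a single tag
            prev = tag
    return trans, ctx, tag_counts, emit, lexicon


def log_score(words, model) -> float:
    """Sum of log P(tag_i | tag_{i-1}) + log P(word_i | tag_i) over the sequence."""
    trans, ctx, tag_counts, emit, lexicon = model
    prev, total = "<s>", 0.0
    for word in words:
        tag = lexicon.get(word)
        if tag is None or ctx[prev] == 0 or trans[(prev, tag)] == 0:
            return -math.inf
        total += math.log(trans[(prev, tag)] / ctx[prev])
        total += math.log(emit[(tag, word)] / tag_counts[tag])
        prev = tag
    return total


model = train_class_bigram(TAGGED)
print(log_score(["a", "cat", "ran"], model) > -math.inf)  # True
print(log_score(["cat", "the", "ran"], model))  # -inf
```

Because only transitions between a handful of tags are stored, “a cat ran” receives a non-zero score even though neither “a cat” nor “cat ran” occurs in the corpus, whereas a pure word bigram model would assign that sequence zero probability.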
Other systems use Markov models of syntax for word disambiguation. An example of this is described in D. Tugwell, “A Markov Model of Syntax”, Paper presented at the 1st CLUK Colloquium, University of Sunderland, UK 1998.
Another approach to word modelling is the identification of word collocations, sequences of two or more words that have the characteristics of a syntactic or semantic unit, as described for example in C. Manning and H. Schutze, “Foundations of Statistical Natural Language Processing”, The MIT Press, Cambridge, Mass., US 1999.
However, such language post-processing is again data intensive, which limits the applications in which these techniques may be used.
Examples of some of the techniques outlined above will now be described in more detail.
H. Beigi and T. Fujisaki describe in “A Flexible Template Language Model and its Application to Handwriting Recognition”, Proceedings of the Canadian Conference on Electrical and Computer Engineering, Toronto, Canada, Sep. 13-16, 1992, Vol. I, pp. WA1.28.1-4, a generic template language model for use in situations that “are very limited in format or their vocabulary”. In this case, templates are applied by integrating an elastic-matching character-classification score with a model probability using a search heuristic. The use of an N-gram character model to estimate the probability of a character based on the previous N−1 characters is also described.
In this system, “the set of characters which are supported in the N-gram character predictor is a-z plus space”, as described in more detail in H. Beigi and T. Fujisaki, “A Character Level Predictive Language Model and Its Application to Handwriting Recognition”, Proceedings of the Canadian Conference on Electrical and Computer Engineering, Toronto, Canada, Sep. 13-16, 1992, Vol. I, pp. WA1.27.1-4.
Furthermore, in “Character Prediction for On-Line Handwriting Recognition”, Canadian Conference on Electrical and Computer Engineering, IEEE, Toronto, Canada, September 1992, H. Beigi reports that “N=4 is shown to be optimal for practical on-line handwriting recognition”.
Similarly, J. Pitrelli and E. Ratzlaff describe in “Quantifying the Contribution of Language Modeling to Writer-Independent On-line Handwriting Recognition”, Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, Sep. 11-13, 2000, Amsterdam, the use of character N-grams and word N-grams in a Hidden Markov Model (HMM) cursive handwriting recognition system.
U. Marti and H. Bunke, “Handwritten Sentence Recognition”, Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, 2000, Volume 3, pages 467-470, describe a word unigram and bigram language model, derived from a corpus, that is used to perform holistic word recognition of handwritten text. In this case, the Viterbi algorithm uses classifier scores and word probabilities to decode input text sentences.
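The decoding step can be illustrated with a generic Viterbi search over a lattice of candidate words, in which each path is scored by the product of classifier probabilities and word bigram probabilities. This is a sketch of the general technique rather than of the cited system; the lattice, the bigram table, and the floor probability assigned to unseen bigrams are assumptions.

```python
import math
from typing import Dict, List, Tuple


def viterbi(lattice: List[Dict[str, float]],
            bigram: Dict[Tuple[str, str], float],
            floor: float = 1e-6) -> List[str]:
    """Find the word sequence maximising classifier score x bigram probability.

    lattice[t] maps each candidate word at position t to its classifier probability.
    bigram[(prev, word)] is P(word | prev); "<s>" marks the sentence start.
    Unseen bigrams receive the probability `floor` (a crude assumption).
    """
    def lm(prev: str, word: str) -> float:
        return math.log(bigram.get((prev, word), floor))

    # best[t][word] = (log score of the best path ending in word at t, previous word)
    best = [{w: (math.log(p) + lm("<s>", w), None) for w, p in lattice[0].items()}]
    for t in range(1, len(lattice)):
        column = {}
        for w, p in lattice[t].items():
            prev, score = max(((v, s + lm(v, w)) for v, (s, _) in best[t - 1].items()),
                              key=lambda pair: pair[1])
            column[w] = (score + math.log(p), prev)
        best.append(column)

    # Trace the back-pointers from the best final word.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    path = [word]
    for t in range(len(lattice) - 1, 0, -1):
        word = best[t][word][1]
        path.append(word)
    return path[::-1]


lattice = [{"dear": 0.5, "clear": 0.5}, {"sin": 0.6, "sir": 0.4}]
bigram = {("<s>", "dear"): 0.2, ("<s>", "clear"): 0.1, ("dear", "sir"): 0.5}
print(viterbi(lattice, bigram))  # ['dear', 'sir']
```

Here the classifier alone prefers “sin” at the second position, but the bigram probability of “dear sir” outweighs it.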
Bouchaffra et al. describe the use of non-stationary Markov models as a post-processing step in the recognition of U.S. Zip codes in “Post processing of Recognized Strings Using Non-stationary Markovian Models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), October 1999, pp. 990-999. In this case, the domain-specific knowledge that Zip codes have a fixed length, and that each digit in the code has a specific physical meaning, is used to aid recognition. In particular, using a training set of Zip codes provided by the United States Postal Service, transition probabilities were computed for each digit at each position in the digit string, and this knowledge was applied to improve recognition performance.
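The position-dependent (non-stationary) transition model can be sketched by keeping a separate table of digit-to-digit transition counts for each position in the string. This is a minimal illustration, not the cited method: the training codes are invented and add-one smoothing is assumed.

```python
import math
from collections import Counter, defaultdict

DIGITS = "0123456789"


def train_positional(strings):
    """Count digit transitions separately for each position in the string."""
    trans = defaultdict(Counter)  # trans[i][(prev, cur)]
    ctx = defaultdict(Counter)    # ctx[i][prev]
    for s in strings:
        prev = "^"  # start-of-string symbol
        for i, ch in enumerate(s):
            trans[i][(prev, ch)] += 1
            ctx[i][prev] += 1
            prev = ch
    return trans, ctx


def log_prob(s, trans, ctx):
    """log P(s) under the position-dependent model, with add-one smoothing."""
    total, prev = 0.0, "^"
    for i, ch in enumerate(s):
        p = (trans[i][(prev, ch)] + 1) / (ctx[i][prev] + len(DIGITS))
        total += math.log(p)
        prev = ch
    return total


# Hypothetical training codes; a real system would use a large postal data set.
trans, ctx = train_positional(["10001", "10002", "10003", "90210"])
print(log_prob("10004", trans, ctx) > log_prob("01004", trans, ctx))  # True
```

Because each position has its own table, the same digit pair can be likely at one position and unlikely at another, which a single stationary transition table cannot express.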
L. Yaeger, B. Webb, and R. Lyon, “Combining Neural Networks and Context-Driven Search for On-Line, Printed Handwriting Recognition in the Newton”, AI Magazine, Volume 19, No. 1, pp. 73-89, AAAI 1998 describes implementing various weakly applied language modelling techniques to define a lexical context for a commercial hand-printed character recognition system. This scheme allows the definition and combination of “word lists, prefix and suffix lists, and punctuation models”, including some that are “derived from a regular expression grammar”. The dictionaries and lexical templates can be searched in parallel, and include a prior probability for each expression. The syntactic templates are hand-coded and their probabilities are derived from empirical analysis.
R. Srihari, “Use of Lexical and Syntactic Techniques in Recognizing Handwritten Text”, ARPA Workshop on Human Language Technology, Princeton, N.J., March 1994 describes using a combination of lexical and syntactic techniques to disambiguate the results of a handwriting recognition system. Specifically, the technique applies word collocation probabilities to promote or propose words based on context, and uses a Markov model of word syntax based on part-of-speech tagging.
U.S. Pat. No. 6,137,908 describes using a trigram language model in combination with other heuristics to improve the accuracy of character segmentation and recognition.
In U.S. Pat. No. 6,111,985, a character grammar is used during recognition, together with a traditional maximum likelihood sequence estimation algorithm (i.e. Viterbi decoding) based on an N-gram character model, to disambiguate words from numeric strings.
Similarly, the handwritten word recognition system described in U.S. Pat. No. 5,392,363 uses character- and word-grammar models for disambiguation in a frame-based probabilistic classifier.
U.S. Pat. No. 5,787,197 uses a dictionary-based post-processing technique for online handwriting recognition. The dictionary search strips all punctuation from the input word, which is then matched against a dictionary. If the search fails, “a stroke match function and spell-aid dictionary is used to construct a list of possible words”.
Similarly, U.S. Pat. No. 5,151,950 describes using a tree-structured dictionary as a deterministic finite automaton to merge classifier results with contextual information. This system selects “from the example strings the best-matching recognition string through Hidden Markov processing”.
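The use of a tree-structured dictionary to constrain classifier output can be illustrated with a prefix tree (trie) that is walked depth-first, abandoning any prefix that is not in the dictionary, so that the trie acts as a deterministic automaton. This sketch omits the Hidden Markov processing of the cited patent; the word list, the per-position classifier scores, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

END = "$"  # end-of-word marker; assumes "$" never occurs inside a word


def build_trie(words: List[str]) -> dict:
    """Build a nested-dict trie (prefix tree) from a word list."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def best_word(trie: dict, columns: List[Dict[str, float]]) -> Optional[Tuple[str, float]]:
    """Return the dictionary word with the highest product of classifier scores.

    columns[i] maps each candidate character at position i to its classifier score.
    Any prefix that is not in the dictionary is abandoned immediately.
    """
    best: Optional[Tuple[str, float]] = None

    def walk(node: dict, i: int, prefix: str, score: float) -> None:
        nonlocal best
        if i == len(columns):
            if END in node and (best is None or score > best[1]):
                best = (prefix, score)
            return
        for ch, p in columns[i].items():
            if ch in node:
                walk(node[ch], i + 1, prefix + ch, score * p)

    walk(trie, 0, "", 1.0)
    return best


trie = build_trie(["cat", "car", "cart", "oat"])
columns = [{"c": 0.6, "o": 0.4}, {"a": 0.9, "o": 0.1}, {"t": 0.3, "r": 0.2, "f": 0.5}]
print(best_word(trie, columns)[0])  # cat
```

The classifier's highest-scoring string, “caf”, is rejected because it is not a path through the trie, and “cat” is returned instead.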
U.S. Pat. No. 5,680,511 uses a word-based language model “to recognize an unrecognized or ambiguous word that occurs within a passage of words.” The method is described in the context of spoken or handwritten text recognition.
U.S. Pat. No. 5,377,281 employs a knowledge-based approach to post-processing character recognition strings. The knowledge sources used, which include word probabilities, word di-gram probabilities, statistics that relate the likelihood of words with particular character prefixes, and rewrite suggestions and their costs, are derived from a text corpus.
U.S. Pat. No. 5,987,170 uses a combination of word and grammatical dictionaries for the recognition of oriental script. U.S. Pat. No. 6,005,973 derives both dictionary strings and a most-likely digit string during recognition, which are presented to the writer for selection.
U.S. Pat. No. 6,084,985 describes a method for on-line handwriting recognition based on a hidden Markov model. The method uses real-time sensing of at least an instantaneous write position of the handwriting and derives from the handwriting a time-conforming string of segments, each associated with a handwriting feature vector. The method then matches the time-conforming string to various example strings from a database pertaining to the handwriting, and selects from the example strings a best-matching recognition string through hidden-Markov processing.
Accordingly, it can be seen that each of the above methods suffers from a variety of disadvantages. In particular, the majority of the techniques tend to require large amounts of data processing. This can limit the circumstances in which the techniques can be implemented, in particular because powerful processors are required to perform the recognition.