1. Technical Field
The present application relates generally to handwritten text recognition and, more particularly, to a handwriting text recognition system and method whereby character sequences are modelled as single characters (“compound character models”) in order to improve recognition accuracy of converting handwritten text to machine printed text.
2. Description of the Related Art
Currently, the need for accurate machine recognition of handwritten text has increased due to the popularity and widespread use of handheld, pen-based computers. However, achieving high recognition accuracy with conventional machine recognition devices has proven difficult due to the wide variety of individual handwriting styles, many of which have ambiguous and/or conflicting character representations. This difficulty is further compounded by the fact that, even for a particular writer, the manner in which a given letter is written can vary dramatically depending on the location of the letter in the word.
In particular, letters at the end of a word are frequently written less carefully than letters at the beginning of the word due to the tendency of writers to “slur” together the ending characters of a written word. For instance, due to “slurred” handwriting, character sequences such as “ing”, “ous” and “ion”, which commonly appear at the ends of words, typically bear little resemblance to the same letters (or combination of letters) that appear in other locations of the same word. And yet, these “slurred” character sequences contain enough information for a human reader to recognize them correctly.
Conventional methods for machine recognition of handwritten text typically recognize a word by recognizing constituent characters of the word using statistical models (i.e., character models) that are previously generated for characters comprising a given vocabulary. Conventional handwriting recognition systems are not trained to recognize slurred handwritten character sequences. Consequently, decreased recognition accuracy is realized when decoding slurred character sequences.
The present application is directed to a handwriting recognition system and method whereby various character sequences are each modelled as a single character (“compound character model”) so as to provide improved recognition accuracy when decoding “slurred” character sequences.
In one aspect of the present invention, a method for generating a handwriting recognition system having compound character models, comprises the steps of:
providing an initial handwriting recognition system having individual character models;
collecting and labelling a set of handwriting data;
aligning the labelled set of handwriting data;
generating compound character data using the aligned handwriting data; and
retraining the initial recognition system with the compound character data to generate a new recognition system having compound character models.
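By way of illustration only, the step of generating compound character data from aligned handwriting data may be sketched as follows. The sketch assumes that alignment yields, for each written word, a list of (character label, stroke indices) pairs with one character per label, and that the compound units of interest are the word-final sequences “ing”, “ous” and “ion” mentioned above. The names COMPOUNDS, Segment and merge_compounds are hypothetical and are not part of the described system.

```python
from typing import List, Tuple

# Hypothetical list of compound units; the text names these three as examples.
COMPOUNDS = ("ing", "ous", "ion")

# Assumed alignment output: (single-character label, indices of its strokes).
Segment = Tuple[str, List[int]]


def merge_compounds(aligned: List[Segment], compounds=COMPOUNDS) -> List[Segment]:
    """Relabel a word-final run of characters as one compound training sample."""
    word = "".join(label for label, _ in aligned)
    for comp in compounds:
        if word.endswith(comp):
            n = len(comp)
            # Pool the strokes of the constituent characters under one label.
            strokes = [s for _, seg in aligned[-n:] for s in seg]
            return aligned[:-n] + [(comp, strokes)]
    # No listed compound at the end of the word: keep the individual characters.
    return aligned
```

The merged samples produced in this manner could then be added to the training set used in the retraining step, so that each compound label is trained as if it were a single character.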
In another aspect of the present invention, a system for recognizing handwritten text, comprises:
means for inputting handwritten text;
means for storing a plurality of character models, the character models including individual character models and compound character models;
means for decoding the input handwritten text using the individual character models and the compound character models such that when the decoding means detects a compound character, the compound character is expanded into its corresponding constituent individual characters; and
means for outputting the decoding results.
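By way of illustration only, the expansion of a detected compound character into its constituent individual characters may be sketched as follows. The sketch assumes that the decoder emits a sequence of symbols in which compound characters are marked with angle brackets; this marking convention, the table COMPOUND_EXPANSIONS and the function expand_compounds are hypothetical and are not part of the described system.

```python
from typing import Dict, List

# Hypothetical mapping from compound symbols to their constituent characters.
COMPOUND_EXPANSIONS: Dict[str, str] = {
    "<ing>": "ing",
    "<ous>": "ous",
    "<ion>": "ion",
}


def expand_compounds(decoded: List[str],
                     expansions: Dict[str, str] = COMPOUND_EXPANSIONS) -> str:
    """Replace each compound symbol with its individual characters."""
    # Symbols not found in the table are individual characters and pass through.
    return "".join(expansions.get(symbol, symbol) for symbol in decoded)
```

For example, under these assumptions a decoding result of ["w", "r", "i", "t", "<ing>"] would be output as the machine printed text "writing".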