Computers accept human user input in various ways. One of the most common input devices is the keyboard. Additional types of input mechanisms include mice and other pointing devices. Although useful for many purposes, keyboards and mice (as well as other pointing devices) sometimes lack flexibility. For example, many persons find it easier to write, take notes, etc. with a pen and paper instead of a keyboard. Mice and other types of pointing devices do not generally provide a true substitute for pen and paper. Traditional input device limitations are even more acute with regard to East Asian languages. As used herein, "East Asian" includes, but is not limited to, written languages such as Japanese, Chinese and Korean. Written forms of these languages contain thousands of characters, and specialized keyboards for these languages can be cumbersome and require specialized training to use properly.
Electronic tablets or other types of electronic writing devices offer an attractive alternative to keyboards and mice. These devices typically include a stylus with which a user can write upon a display screen in a manner similar to using a pen and paper. A digitizer nested within the display converts movement of the stylus across the display into an “electronic ink” representation of the user's writing. The electronic ink is stored as coordinate values for a collection of points along the line(s) drawn by the user. Software may then be used to analyze the electronic ink to recognize characters, and then convert the electronic ink to Unicode, ASCII or other code values for what the user has written.
There are many handwriting recognition systems in use employing various algorithms to map handwritten data to characters. One such system is described in commonly-owned U.S. Pat. No. 5,729,629 ('629 patent), titled "Handwritten Symbol Recognizer," which patent is incorporated by reference herein. The described recognizer is useful for, e.g., recognition of East Asian language characters. The recognizer implements template matching for characters written in multiple strokes so as to map the features for all strokes of an input character to a Unicode or other value for the ink character. Each input stroke of a character is described by a five-dimensional feature vector representing the x and y coordinates of the stroke start and end points, together with a feature code corresponding to the overall shape of the stroke (e.g., vertical line, horizontal line, counterclockwise arc, etc.). The recognizer measures a Euclidean Vector Distance between each input stroke and a stroke of a stored reference character (or "prototype"). The database of prototypes is divided into multiple groupings (or "spaces") based on the number of strokes in the prototype. For example, a 5-space contains prototypes having five strokes.
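The stroke representation and the division of prototypes into spaces can be illustrated with a minimal Python sketch. It assumes that each stroke is stored as a feature code plus start and end coordinates, that prototypes are grouped by stroke count, and that an input character is matched only against prototypes in its own space by summing a per-stroke Euclidean distance over the four coordinates. The names `Stroke`, `Prototype`, `build_spaces`, `stroke_distance` and `best_match` are hypothetical, and the way the '629 recognizer actually weighs the feature code is not reproduced here.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    """Five-dimensional stroke feature: a shape code plus start and end coordinates."""
    feature: int  # shape code (e.g., vertical line, horizontal line, counterclockwise arc)
    x1: float     # start point
    y1: float
    x2: float     # end point
    y2: float


@dataclass(frozen=True)
class Prototype:
    code_point: int              # Unicode (or other) value the prototype maps to
    strokes: tuple[Stroke, ...]  # stored reference strokes, in writing order


def build_spaces(prototypes):
    """Group prototypes into 'spaces' by stroke count, e.g. key 5 holds the 5-space."""
    spaces = defaultdict(list)
    for proto in prototypes:
        spaces[len(proto.strokes)].append(proto)
    return spaces


def stroke_distance(a: Stroke, b: Stroke) -> float:
    """Euclidean distance between the start/end coordinates of two strokes.

    The feature code is carried on each Stroke but deliberately left out here
    (assumption); the '629 recognizer's treatment of it is not reproduced.
    """
    return math.dist((a.x1, a.y1, a.x2, a.y2), (b.x1, b.y1, b.x2, b.y2))


def best_match(ink, spaces):
    """Return the prototype in the input's s-space with the smallest summed stroke distance."""
    candidates = spaces.get(len(ink), [])  # .get does not insert into the defaultdict
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: sum(stroke_distance(a, b) for a, b in zip(ink, p.strokes)),
    )
```

Restricting the search to the input's own space keeps the comparison stroke-for-stroke, which is the property that later becomes a problem for cursive ink whose stroke count differs from the print form.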
Another recognizer, which is similar in many respects to the recognizer described in the '629 patent, is described in commonly-owned U.S. Pat. No. 6,094,506 ('506 patent), titled "Automatic Generation of Probability Tables for Handwriting Recognition Systems," which patent is also incorporated by reference herein. In that recognizer, each stroke of an input character is also described by a five-dimensional vector representing a feature code for the stroke and the x and y coordinates of stroke start and end points. The input character is then compared against every prototype in a database having the same number of strokes as the input character. To perform this comparison, a Shape Feature Probability Matrix (SFPM) is created in which each possible shape feature corresponds to a row and to a column. Each entry in the SFPM represents a probability that, for any two characters having s strokes and having shape features $f_i$ and $f_j$ at position $p$ (where $f_i$ is the feature code for the input stroke, $f_j$ is the feature code for the prototype stroke and $p = 1, 2, \ldots, s$), the characters are the same. A Position Feature Probability Table (PFPT) is also generated. The PFPT is a one-dimensional array containing one entry for each possible feature distance and is indexed by feature distance. The feature distance $D$ is calculated as $D = (x_{jp1} - x_{ip1})^2 + (x_{jp2} - x_{ip2})^2 + (y_{jp1} - y_{ip1})^2 + (y_{jp2} - y_{ip2})^2$, where $(x_{jp1}, y_{jp1})$ and $(x_{jp2}, y_{jp2})$ are the start and end points for stroke $p$ of the model, and where $(x_{ip1}, y_{ip1})$ and $(x_{ip2}, y_{ip2})$ are the start and end points for stroke $p$ of the input ink. Each entry in the PFPT represents a probability that, for any two characters having s strokes and a feature distance $D$ between strokes at the same position $p$, the characters are the same.
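The feature distance and the two table lookups described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the '506 patent's implementation: coordinates are assumed to be integers on a quantized grid so that D is a non-negative integer index; the SFPM is assumed to be indexed with the input feature f_i as the row and the prototype feature f_j as the column; and distances larger than the PFPT's last index are assumed to reuse the last entry. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]             # (x, y) on an integer grid (assumption)
StrokeEnds = Tuple[Point, Point]    # (start point, end point) of one stroke


def feature_distance(model: StrokeEnds, ink: StrokeEnds) -> int:
    """Feature distance D between stroke p of the model (j) and stroke p of the input ink (i)."""
    (xj1, yj1), (xj2, yj2) = model
    (xi1, yi1), (xi2, yi2) = ink
    return (xj1 - xi1) ** 2 + (xj2 - xi2) ** 2 + (yj1 - yi1) ** 2 + (yj2 - yi2) ** 2


def sfpm_entry(sfpm: Sequence[Sequence[float]], f_i: int, f_j: int) -> float:
    """One row and one column per shape feature; row = input feature, column = prototype feature (assumption)."""
    return sfpm[f_i][f_j]


def pfpt_entry(pfpt: Sequence[float], d: int) -> float:
    """One-dimensional table indexed by D; out-of-range distances use the last entry (assumption)."""
    return pfpt[min(d, len(pfpt) - 1)]
```

For example, a model stroke from (0, 0) to (10, 10) and an input stroke from (1, 2) to (8, 10) give D = 1 + 4 + 4 + 0 = 9.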
During recognition, each input character is compared to each prototype by comparing the strokes of the input character and of the prototype; the first stroke of the input character is compared to the first stroke of the prototype character, the second stroke of the input character is compared to the second stroke of the prototype character, etc. Using the SFPM, a first number is computed by summing the values obtained by indexing the SFPM with the feature codes of the first input and prototype strokes, with those of the second input and prototype strokes, etc. Using the PFPT, a second number is computed by summing the values indexed by the feature distances between the first input and prototype strokes, between the second input and prototype strokes, etc. A Match Probability value equals the sum of these first and second numbers. The prototype for which a comparison against the input character results in the highest probability of a match is considered the best match. As described in the '506 patent, the SFPM and PFPT values are based on a negative logarithmic function of the probability. Thus, the lowest Match Probability value corresponds to the highest probability of a match.
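The recognition pass described above can be sketched as a short Python function: pair the strokes position by position, sum the SFPM entries and the PFPT entries, and pick the prototype with the lowest total, since the tables hold negative-log values. The sketch assumes integer stroke coordinates, a dictionary of prototypes keyed by label, SFPM rows indexed by the input feature code, and clamping of out-of-range distances to the last PFPT entry; all names and these conventions are hypothetical rather than taken from the '506 patent.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class Stroke(NamedTuple):
    feature: int            # shape feature code, used to index the SFPM
    start: Tuple[int, int]  # (x, y) start point on an integer grid (assumption)
    end: Tuple[int, int]    # (x, y) end point


def feature_distance(model: Stroke, ink: Stroke) -> int:
    """D for one stroke position: sum of squared start/end coordinate differences."""
    (xj1, yj1), (xj2, yj2) = model.start, model.end
    (xi1, yi1), (xi2, yi2) = ink.start, ink.end
    return (xj1 - xi1) ** 2 + (xj2 - xi2) ** 2 + (yj1 - yi1) ** 2 + (yj2 - yi2) ** 2


def match_probability(ink: Sequence[Stroke], model: Sequence[Stroke],
                      sfpm: Sequence[Sequence[float]], pfpt: Sequence[float]) -> float:
    """First number (SFPM sum) plus second number (PFPT sum); lower means a better match."""
    pairs = list(zip(ink, model))  # stroke 1 with stroke 1, stroke 2 with stroke 2, ...
    shape_sum = sum(sfpm[i.feature][m.feature] for i, m in pairs)
    last = len(pfpt) - 1
    position_sum = sum(pfpt[min(feature_distance(m, i), last)] for i, m in pairs)
    return shape_sum + position_sum


def best_match(ink: Sequence[Stroke], prototypes: Dict[str, Sequence[Stroke]],
               sfpm: Sequence[Sequence[float]], pfpt: Sequence[float]) -> Optional[str]:
    """Compare only against prototypes with the same stroke count; return the lowest-scoring label."""
    same_space = {k: v for k, v in prototypes.items() if len(v) == len(ink)}
    if not same_space:
        return None
    return min(same_space, key=lambda k: match_probability(ink, same_space[k], sfpm, pfpt))
```

Because both tables store negative logarithms, adding entries corresponds to multiplying the underlying per-stroke probabilities, which is why the minimum sum identifies the most probable prototype.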
For characters written in print form, the above-described recognizers present few problems when an input ink character having s strokes is compared to prototypes having the same number of strokes, i.e., in the s-space. As used herein, "print" refers to a writing style in which a user attempts to create a character so as to mimic a standardized format, and is distinguished from machine-printed characters (e.g., typed, computer-generated font, etc.). Although there are variations in the relative position and shape of strokes for a given handwritten printed character, different users generally print the character using the same number of strokes.
Challenges arise in connection with recognizing cursive handwriting. Often, a cursive representation of a particular character will connect two or more strokes into a single stroke. Strokes may also be skipped and/or rounded in cursive handwriting. In theory, a character written in s strokes in print form can be written in 1 to s strokes in cursive form. This is illustrated in FIG. 1, which shows the Simplified Chinese character having Unicode code point U+9752 (phonetically "qing," meaning "green"). The character is shown in standard form on the left side of the figure, and in progressively more cursive variations toward the right side of the figure. Although various techniques may increase the accuracy of cursive character recognition, these techniques typically require significantly more processing time than the techniques that work well for print recognition.
In many cases, users create ink with a combination of cursive and printing (or cursive that is very similar to printing) styles of handwriting. If a recognizer could distinguish between these two styles and apply the more time-consuming techniques only where needed, substantial performance improvements could result. Accordingly, there remains a need for improved handwriting recognizers able to distinguish between cursive and print (or near-print) styles and to then apply different recognition techniques to the different styles.