Computer recognition of handwritten characters has been attempted for many years, but a reliable system for recognizing English-language handwritten characters as they are written is not currently available.
The recognition of English-language handwritten characters as they are written, sometimes called "on-line recognition," presents considerable difficulties. The characters must be recognized at approximately the same rate at which they are written. Average writing rates are about 1.5-2.5 characters per second for printed English characters, and 5-10 characters or more per second for cursive writing. A reliable on-line recognition system must therefore be able to recognize up to 8-10 characters per second, that is, spend no more than roughly 100-125 milliseconds on each character, to keep up with a typical writer.
One of the major problems in recognizing handwritten text is that the handwriting includes both features which identify a character and features which are peculiar to each writer. Features which uniquely identify the characters must be analyzed for recognition; features that are peculiar to the individual's handwriting may be discarded. Separating the character identification data from the particular writer's style is a problem that the prior art has yet to solve efficiently. Compounding this problem is the large amount of data available in a handwriting recognition system. A typical electronic tablet has a resolution of 200 points per inch and a sampling rate ranging from sixty to several hundred points per second. The data available for analysis can be increased further by extracting information and calculating selected parameters from the input data. For example, the velocity of the writing instrument, the slope of the characters, the baseline of each word, the mid-zone of each word, the segmentation of the words and other features may be calculated, as described in U.S. Pat. No. 4,024,500, to Herbst et al., incorporated herein by reference. If all of the collected data is used, the quantity is so great that the system is overwhelmed and on-line recognition cannot occur.
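By way of illustration only, the calculation of derived parameters such as pen velocity and stroke direction from raw tablet samples may be sketched as follows. This is a minimal sketch and is not the method of the Herbst et al. patent. The function names `pen_speeds` and `stroke_directions`, the assumed sampling rate of 100 points per second, and the representation of a stroke as a list of (x, y) tablet coordinates are all assumptions made for the example; only the resolution of 200 points per inch is taken from the text above.

```python
import math

def pen_speeds(points, sample_rate_hz=100.0, points_per_inch=200.0):
    """Speed of the pen, in inches per second, between consecutive samples.

    `points` is a list of (x, y) tablet coordinates taken at a fixed
    sampling rate. Both default parameter values are assumptions.
    """
    speeds = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Distance travelled between two samples, converted to inches.
        distance_in = math.hypot(x1 - x0, y1 - y0) / points_per_inch
        # One sample interval lasts 1 / sample_rate_hz seconds.
        speeds.append(distance_in * sample_rate_hz)
    return speeds

def stroke_directions(points):
    """Direction of travel, in degrees, between consecutive samples."""
    return [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]

# Hypothetical stroke: three samples, the last two at the same position.
stroke = [(0, 0), (3, 4), (3, 4)]
print(pen_speeds(stroke))
print(stroke_directions([(0, 0), (1, 1)]))
```

Each derived parameter adds one value per sample interval to the data already supplied by the tablet, which is why calculating many such parameters increases, rather than reduces, the quantity of data to be analyzed.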
Various approaches have been attempted to extract features that identify the character and to discard useless features. For example, some recognition systems require that the characters be boxed, discrete characters. Other systems permit spaced, printed characters. U.S. Pat. No. 4,731,857, to Tappert, incorporated herein by reference, describes a system to aid in recognizing run-on, discretely written characters. Some systems attempt to recognize Chinese characters and rely on the inherent distinctiveness of the stroke order; see U.S. Pat. No. 4,365,235, to Greanias et al., incorporated herein by reference. Dynamic programming has been used to improve the speed and likelihood of obtaining the correct match for character recognition. See, for example, U.S. Pat. No. 3,979,722, to Sakoe et al., incorporated herein by reference.
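By way of illustration only, the general idea of using dynamic programming to match an input stroke against stored templates may be sketched as follows. The sketch uses a generic elastic alignment, commonly called dynamic time warping, and is not asserted to be the method of the Sakoe et al. patent. The function names `dtw_distance` and `classify`, the two templates, and the sample stroke are hypothetical and chosen only for the example.

```python
import math

def dtw_distance(a, b):
    """Cost of the best elastic alignment between point sequences a and b.

    cost[i][j] holds the cheapest alignment of the first i points of `a`
    with the first j points of `b`; each cell is built from its three
    neighbors, which is the dynamic-programming step.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a point
                cost[i][j - 1],      # b advances, a repeats a point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

def classify(stroke, templates):
    """Return the label of the template with the lowest alignment cost."""
    return min(templates, key=lambda label: dtw_distance(stroke, templates[label]))

# Hypothetical templates: a vertical stroke and a horizontal stroke.
templates = {
    "l": [(0, 0), (0, 1), (0, 2)],
    "-": [(0, 0), (1, 0), (2, 0)],
}
# A vertical stroke sampled with one extra point, as a slower writer might produce.
stroke = [(0, 0), (0, 0.5), (0, 1), (0, 2)]
print(classify(stroke, templates))  # prints "l"
```

Because the alignment allows one sequence to repeat a point while the other advances, strokes written at different speeds can still be compared, at a cost proportional to the product of the two sequence lengths for each template.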
Some of the most difficult problems in on-line handwritten character recognition are presented by pure cursive script, or by writing that mixes cursive, discrete, and run-on discrete characters. With these writing types, the character features useful for recognition vary from writer to writer, and separating character features from handwriting style features is difficult. As early as 1964, attempts were made to extract data from handwritten cursive text for use in recognizing the characters, as seen, for example, in the system described in U.S. Pat. No. 3,133,266. A recent article, which itself is not prior art to the present invention but cites articles which are prior art, describes various approaches to performing on-line handwritten character recognition; see "The State of the Art in On-Line Handwriting Recognition," by Charles Tappert et al., IEEE Transactions on Pattern Analysis and Machine Intelligence 12(8):787-808, August 1990.
A method and system for organizing the recorded data, retaining the important data and discarding the nonessential data, would help current recognition technology output a correct match. Despite the extensive research and experimentation in the field of character recognition for cursive handwriting, a reliable system for extracting the data that identify the character and discarding the data peculiar to an individual does not exist today.