Asian languages such as Chinese and Japanese have several thousand characters. In China, where the character set used is Simplified Chinese, the Guo Biao (GB) coding standard specifies nearly 6800 characters. The GB coding standard specifies two-byte codes for all Simplified Chinese characters supported in today's computer systems. In Taiwan and Hong Kong, where the character set used is Traditional Chinese, the Big 5 coding standard specifies about 13000 characters. The Big 5 coding standard specifies two-byte codes for all Traditional Chinese characters supported in today's computer systems. The Japanese Industrial Standard 208 (JIS208) coding standard specifies about 6300 Japanese Kanji characters. When the number of characters is this large, keyboard entry of characters becomes awkward. In such cases, an input method based on handwriting recognition is an attractive alternative.
Handwriting recognition systems can be implemented by comparing the handwriting input with a collection of models for all the characters of interest in the vocabulary. A distance measure is computed between the input and each of the stored models, and the one or more models that have the least distance to the handwritten input are selected as candidates. Several research results in handwriting recognition can be applied to the recognition of Chinese and Japanese Kanji characters. In a first method, a hierarchical representation is used to model characters. In this approach, characters are represented as a collection of smaller units called radicals, written in a specific order. The radicals themselves are modeled as a sequence of strokes. Recognition typically involves segmenting the handwritten input into component radicals and then comparing those with stored models for radicals. While this method stores character models efficiently, errors made during segmentation carry over into recognition, so segmenting before recognizing limits the accuracy of the recognizer.
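The model-comparison scheme described at the start of the preceding paragraph can be illustrated with a minimal sketch. It assumes, purely for illustration, that each character has already been reduced to a fixed-length feature vector and that a Euclidean distance is used; the names recognize, euclidean and MODELS, the labels and the toy vectors are hypothetical and are not taken from any particular recognizer.

```python
import heapq
import math


def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_features, models, k=3):
    """Return the k model labels closest to the input, nearest first.

    `models` maps a character label to its stored feature vector.
    """
    scored = ((euclidean(input_features, vec), label)
              for label, vec in models.items())
    return [label for _, label in heapq.nsmallest(k, scored)]


# Hypothetical 4-dimensional feature vectors; not real character data.
MODELS = {
    "horizontal": [0.0, 0.5, 1.0, 0.5],
    "vertical": [0.5, 0.0, 0.5, 1.0],
    "cross": [0.5, 0.5, 0.5, 0.5],
}

candidates = recognize([0.1, 0.5, 0.9, 0.5], MODELS, k=2)
print(candidates)  # ['horizontal', 'cross']
```

Every stored model is visited once per input, so in a sketch of this kind both the memory footprint and the matching time grow with the size of the vocabulary.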
In a second method, models are stored for entire characters. Since this method does not rely on segmentation of the input into radicals before recognition, it generally gives a more accurate recognizer. The amount of memory required to store the character models and the computational requirements, however, are relatively large and can be a severe limitation in some applications.
In addition, handwriting recognition systems for Chinese and Japanese Kanji characters should address variations in the order of strokes and variations in the number of strokes that are used to write the same character. One method that is commonly used to deal with variations in stroke order is to re-order the strokes based on heuristics. By doing so, the actual temporal order in which the strokes were written is discarded. The drawback of such an approach is that re-ordering strokes reliably is a difficult problem and is often a source of errors in the recognition system.
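As an illustration of heuristic re-ordering, the following sketch sorts strokes by their starting point, topmost first and then leftmost. This particular rule is an assumption chosen for simplicity, not a heuristic named in the text, and reorder_strokes is a hypothetical helper. Each stroke is assumed to be a list of (x, y) points with y increasing downward.

```python
def reorder_strokes(strokes):
    """Sort strokes by starting point: topmost first, then leftmost.

    Each stroke is a list of (x, y) points with y increasing downward.
    """
    return sorted(strokes, key=lambda s: (s[0][1], s[0][0]))


horizontal = [(0, 5), (10, 5)]
vertical = [(5, 0), (5, 10)]

# Both writing orders collapse to the same canonical order, so the
# original time order of the strokes can no longer be recovered.
order_a = reorder_strokes([horizontal, vertical])
order_b = reorder_strokes([vertical, horizontal])
print(order_a == order_b)  # True

# Fragility: a small change in where one stroke starts flips the result.
upper_bar = [(0, 10), (10, 10)]
lower_bar = [(0, 11), (10, 11)]
sloppy_lower_bar = [(0, 9), (10, 11)]  # same stroke, started slightly high
print(reorder_strokes([upper_bar, lower_bar])[0] is upper_bar)         # True
print(reorder_strokes([upper_bar, sloppy_lower_bar])[0] is upper_bar)  # False
```

The last two lines show how a small variation in the writing can change the heuristic order, which is one way such re-ordering introduces recognition errors.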
Another method is to maintain a single model for each character and to generate several stroke order variations at run time. This approach, however, requires additional computing resources to generate the stroke order variations from the base character model.
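The run-time cost of this approach can be seen in a minimal sketch that enumerates every possible ordering of a base model's strokes. Exhaustive enumeration is an assumption made here for illustration; a practical system would presumably generate only a limited set of plausible variations. The function name stroke_order_variations and the stroke labels are hypothetical.

```python
from itertools import permutations
from math import factorial


def stroke_order_variations(base_model):
    """Yield every ordering of the strokes in a base character model."""
    for order in permutations(base_model):
        yield list(order)


base = ["s1", "s2", "s3"]  # placeholder stroke identifiers
variations = list(stroke_order_variations(base))
print(len(variations))  # 6

# The number of orderings grows factorially with the stroke count.
print(factorial(8))  # 40320 orderings for an 8-stroke character
```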
Other methods include maintaining a single model for each character and generating new models at run time by connecting one or more strokes, or alternatively, using more powerful matching algorithms based on dynamic programming to compare the handwritten input with stored character models even when they differ in the number of strokes. Both of these techniques, however, require additional computational power at run time.
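Dynamic-programming matching of sequences that differ in length can be sketched with dynamic time warping, used here as one well-known example of such an algorithm rather than as the specific method any given recognizer employs. The sketch assumes each character is represented as a sequence of coarse (dx, dy) segment directions; that representation, and the names dtw_distance and direction_cost, are hypothetical.

```python
def direction_cost(u, v):
    """Squared difference between two (dx, dy) segment directions."""
    return (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2


def dtw_distance(a, b, cost=direction_cost):
    """Dynamic time warping distance between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] holds the best cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


RIGHT, DOWN = (1, 0), (0, 1)

# The input has four segments because the first stroke was written in two parts.
handwritten = [RIGHT, RIGHT, DOWN, RIGHT]
model_same = [RIGHT, DOWN, RIGHT]   # three-segment model of the same shape
model_other = [DOWN, DOWN, DOWN]    # unrelated three-segment model

print(dtw_distance(handwritten, model_same))   # 0.0
print(dtw_distance(handwritten, model_other))  # 6.0
```

The table d has (n + 1) by (m + 1) entries and every entry is filled for every model compared, which is the additional run-time computation noted above.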
There is a proliferation of small hand-held devices in today's marketplace. These devices perform functions ranging from maintaining address and contact information to more advanced functions like paging and cellular telephony. In many of these hand-held devices, the available processing power is much smaller than that of a traditional personal computer. Consequently, the algorithms used for handwriting recognition on a personal computer would not be of much use on most of these hand-held devices.
Thus, a need exists for a handwriting recognizer that offers very high processing speed, modest memory requirements and high accuracy when characters are written stroke-by-stroke or with limited stroke connectivity, and that also accommodates variations in the order of strokes and in the number of strokes used to write the same character.