One of the biggest problems in handwriting recognition technology is reducing the error rate. One frequent type of error occurs when a user electronically enters a handwritten character, known as a chirograph, that closely matches two or more characters in the set against which the computer is trying to match the chirograph, i.e., the set of possible code points. The characters that cause the most errors are typically those that are identical to one another except for a single difference that humans can discern but contemporary recognizers cannot. For example, certain Japanese symbols are substantially identical to one another but for a single, subtle difference.
The problem arises because most handwriting recognition systems use recognizers based on neural nets, Hidden Markov Models (HMMs), or a K-Nearest-Neighbor (KNN) approach. These systems perform reasonably well at classifying characters based on their overall appearance, but they often fail where two characters are identical except for a single difference. While attempts have been made to manually code recognizers to discern between particularly troublesome pairs, there are many sets of characters that are easily confused for one another, which makes the coding process very labor-intensive and tedious. Moreover, the result of the coding depends on one or more persons' best guesses as to what to test for to distinguish the characters. Such guesses are not necessarily optimal, as there are many possibilities for what best differentiates two (or more) similar characters. Indeed, even the best of such systems do not substantially reduce the error rate. Lastly, each time the recognizer is changed, the set of characters it confuses also changes, requiring that much of the labor-intensive coding process be repeated.
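To illustrate why a classifier based on overall appearance can miss a single distinguishing difference, the following is a minimal 1-nearest-neighbor sketch. The five-element feature vectors, the labels "A" and "B", and the "tick" feature are hypothetical stand-ins for two confusable characters, not features of any real recognizer. Because variation in writing style contributes to every term of the distance, it can outweigh the one feature that actually separates the two characters. The hand-coded pairwise test `classify_by_tick` (also hypothetical) looks only at that feature and gets the sample right.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(sample, training, k=1):
    """Majority vote among the k training prototypes nearest to the sample."""
    nearest = sorted(training, key=lambda item: euclidean(sample, item[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def classify_by_tick(sample, threshold=0.5):
    """Hypothetical hand-coded test that inspects only the distinguishing feature."""
    return "B" if sample[4] >= threshold else "A"


# Hypothetical descriptors: features 0-3 capture overall shape and writing style,
# feature 4 is the single "tick" stroke that separates character B from character A.
TRAINING = [
    ("A", [0.0, 0.0, 0.0, 0.0, 0.0]),  # writer 1's "A": no tick
    ("B", [0.6, 0.6, 0.6, 0.6, 1.0]),  # writer 2's "B": has the tick
]

# Writer 2 writes an "A" (no tick) in their own style.
sample = [0.6, 0.6, 0.6, 0.6, 0.0]

print(knn_classify(sample, TRAINING))  # prints B (misclassified)
print(classify_by_tick(sample))        # prints A (correct)
```

Here the sample lies 1.0 from writer 2's "B" but 1.2 from writer 1's "A", so the nearest-neighbor vote picks "B" even though the tick is absent.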
Another type of recognition system, based on decision trees, especially Classification and Regression Trees (CART), has also been attempted for handwriting recognition. Such systems have been rejected because they are unable to make reliable decisions among large numbers of characters. By way of example, systems using Japanese character sets support 6650 different characters. As can be appreciated, developing a binary tree that can receive any one of 6650 characters and test that character repeatedly and properly down the appropriate branches until a single correct result is found would be an extremely difficult and massive undertaking.
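The scale of such a tree can be bounded with simple arithmetic. The sketch below assumes one leaf per character and one binary test per internal node; it gives only lower bounds, since a trained CART tree is usually unbalanced and may need several leaves for the same character. A binary tree of depth d has at most 2^d leaves, so n characters require a depth of at least ceil(log2 n), and a full binary tree with n leaves has exactly n - 1 internal decision nodes, each a test that must be designed or trained.

```python
def min_binary_tree_depth(num_classes):
    """Smallest depth d with 2**d >= num_classes, i.e. ceil(log2(num_classes)).

    Uses integer bit_length to avoid floating-point rounding in log2.
    """
    if num_classes < 1:
        raise ValueError("need at least one class")
    return (num_classes - 1).bit_length()


def internal_nodes_full_binary_tree(num_leaves):
    """A full binary tree with n leaves has exactly n - 1 internal nodes."""
    if num_leaves < 1:
        raise ValueError("need at least one leaf")
    return num_leaves - 1


n = 6650  # character count cited above for Japanese character sets
print(min_binary_tree_depth(n))            # 13 tests on the shortest possible worst-case path
print(internal_nodes_full_binary_tree(n))  # 6649 decision nodes to construct
```

Even in this idealized, perfectly balanced case, every input passes through about 13 binary tests, and any single wrong branch sends it to a wrong character.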