In the field of pattern recognition, it is generally desirable to employ an adaptive type of statistical classifier, rather than a programmed classifier, in situations where input samples belonging to the same class of patterns can vary from one another in unknown ways. Handwriting recognition is one example of such a situation. While alphanumeric characters have well-defined shapes, the manner in which people write those characters varies widely from person to person, and even the same person can write a character differently at different times. A classifier that is programmed to recognize particular patterns, such as the letters of an alphabet, may not be able to accommodate the nuances introduced by the handwriting of different individuals. In contrast, a classifier with trainable properties, for example one that employs a neural network, can be taught to recognize that different variations of the same character belong in the same class. For this reason, adaptive statistical classifiers are used for applications such as speech recognition, handwriting recognition and optical character recognition.
In general, an adaptive statistical classifier, such as a neural network, produces a number of output values, one for each of the possible output classes on which the classifier has been trained. For example, in character recognition and handwriting recognition, each character of an alphabet might constitute an output class. Thus, a classifier trained on the English alphabet might have 52 output nodes, one for each of the 26 upper-case and 26 lower-case letters. For a given input sample, the network produces 52 output values, which respectively indicate the probability that the input sample belongs to each of the 52 classes. These probabilities are then processed in other modules of a recognition system, for example with reference to a dictionary, to provide an estimate of the class to which the input sample belongs.
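As a minimal sketch of how such an output vector might be turned into ranked class candidates for a downstream module, the following assumes that the network's raw output scores are normalized into probabilities with a softmax. The softmax step and the names `CLASSES`, `softmax` and `top_candidates` are illustrative assumptions, not taken from the text.

```python
import math
import string

# Hypothetical class list: one output class per English letter, lower case then upper case.
CLASSES = list(string.ascii_lowercase + string.ascii_uppercase)

def softmax(scores):
    """Normalize raw output scores into probabilities that sum to 1 (assumed output step)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_candidates(scores, k=3):
    """Return the k most probable (label, probability) pairs for a downstream module."""
    probs = softmax(scores)
    ranked = sorted(zip(CLASSES, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Example: raw scores favour "a", with "o" as the runner-up.
scores = [0.0] * len(CLASSES)
scores[CLASSES.index("a")] = 4.0
scores[CLASSES.index("o")] = 2.5
for label, p in top_candidates(scores, k=2):
    print(label, round(p, 3))
```

Returning several ranked candidates rather than a single label leaves the final decision to later stages, such as a dictionary lookup.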
To train a classifier, a number of training samples are individually provided to it as input data, and a target output vector is designated for each sample. Thus, if a training sample is a pattern labelled as the letter "a", the output class corresponding to that letter is given a target value of "1", and all other output classes are given a target value of "0". For each training sample provided to the classifier, an output vector is produced. This output vector is compared with the target output vector, and the difference between the two represents an error, which is then used to train the classifier. In a neural network, for example, the error can be back-propagated through each layer of the network and used to adjust the weights assigned to the paths along which data propagates when the classifier operates. By repeating this process many times, the weights are adjusted so that, for a given input sample, the output vector of the classifier converges toward its target output vector.
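The training loop described above can be sketched for the simplest case of a single-layer network, assuming a softmax output and a cross-entropy loss. With that pairing, the gradient of the loss with respect to each output node's pre-activation is exactly the output value minus its target, so the error really is the difference between the two vectors. This is a simplified stand-in for full multi-layer back-propagation; the function names, learning rate and toy data are hypothetical.

```python
import math
import random

def one_hot(index, n_classes):
    """Target vector: 1 for the labelled class, 0 for every other class."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]

def forward(weights, x):
    """Single linear layer followed by softmax; returns one probability per class."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, n_classes, n_inputs, epochs=200, lr=0.5, seed=0):
    """Repeatedly present each (input, label) pair and adjust weights to reduce the error."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in samples:
            output = forward(weights, x)
            target = one_hot(label, n_classes)
            # Error: output vector minus target vector.
            error = [o - t for o, t in zip(output, target)]
            # For softmax with cross-entropy this error is the gradient w.r.t. the logits,
            # so each weight moves against error_i * x_j.
            for i in range(n_classes):
                for j in range(n_inputs):
                    weights[i][j] -= lr * error[i] * x[j]
    return weights

# Toy data: three input patterns, each labelled with a different class.
samples = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
w = train(samples, n_classes=3, n_inputs=3)
for x, label in samples:
    print(label, [round(p, 3) for p in forward(w, x)])
```

After training, the output vector for each toy sample is concentrated on its labelled class, which is the convergence toward the target vector that the paragraph describes.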
As the number of output classes increases, the number of output nodes also increases, with the result that a larger number of training samples is required to properly train the classifier. For example, the alphabets of some European languages contain more than the 26 letters of the English alphabet. Many of these additional characters are a combination of a base Roman letter and a diacritical mark, such as an acute, grave, dieresis, tilde, circumflex, ring, hacek or breve. In a conventional classifier, a separate output class is provided for each of these additional characters. If a single classifier is designed to be multilingual, so that it recognizes characters in each of a number of languages such as the various European languages, a significant number of additional output classes is required, with a concomitant increase in the number of training samples, and hence in the time, needed to train the classifier.
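The base-letter-plus-mark structure of these characters is visible in Unicode, where canonical decomposition (NFD) splits a precomposed letter such as "é" into its base letter and a combining mark. The sketch below uses Python's standard `unicodedata` module to show this, and counts the output classes a conventional classifier would need with one class per distinct upper- or lower-case character. The per-language character sets are a small, deliberately incomplete illustration, and `EXTRA`, `decompose` and `conventional_class_count` are hypothetical names.

```python
import string
import unicodedata

# Illustrative, incomplete sets of accented lower-case letters per language.
EXTRA = {
    "French": "àâçéèêëîïôùûÿ",
    "German": "äöü",
    "Spanish": "áéíóúñü",
}

def decompose(ch):
    """Split a precomposed character into its base letter and the names of its combining marks."""
    base, *marks = unicodedata.normalize("NFD", ch)
    return base, [unicodedata.name(m) for m in marks]

def conventional_class_count(languages):
    """Output classes needed with one class per distinct character, upper and lower case."""
    classes = set(string.ascii_letters)  # the 52 English letters
    for lang in languages:
        # NFC ensures each accented letter is a single precomposed code point.
        for ch in unicodedata.normalize("NFC", EXTRA[lang]):
            classes.add(ch)
            classes.add(ch.upper())
    return len(classes)

print(decompose("é"))  # ('e', ['COMBINING ACUTE ACCENT'])
print(conventional_class_count([]), conventional_class_count(["French", "German", "Spanish"]))  # 52 94
```

Even with these partial sets, covering three languages in one conventional classifier raises the class count from 52 to 94, while a French-only classifier would need 78.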
Not every compound character is used in every language. To reduce the number of output classes, therefore, a classifier can be designed for a specific language. However, such a classifier has severely limited applicability, since it cannot recognize characters that appear only in languages other than the one for which it was designed.
It is desirable, therefore, to provide a statistical classifier that is capable of recognizing characters that are employed in a variety of different languages, while minimizing the amount of effort that is required to train such a classifier.