1. Field of the Invention
The present invention relates generally to pattern recognition systems such as those used to recognize hand printed and/or machine printed letters and digits (e.g., alphanumeric characters appearing on fill-in-the-blank forms), face or fingerprint identification systems, sonar systems, etc.
More particularly, the invention relates to methods and apparatus for constructing a classification weights matrix for a pattern recognition system which enables a relatively large given feature set for the system (e.g., a 1,500 element set) to be reduced (for example, to a 300 element set) while yielding at least the same level of performance as achieved by the system when using the given feature set.
According to a further aspect of the invention, methods and apparatus are described for determining (evaluating) the classification efficiency of selected subsets of a set of features in a given pattern recognition system.
Still further, the invention is directed to methods and apparatus for constructing the aforementioned classification weights matrix utilizing a genetic search process to find a subset having maximum classification efficiency.
Further yet, the invention is directed to pattern recognition systems (including, in particular, character identification systems), which utilize classifiers constructed in accordance with the aforementioned aspects of the invention to actually perform pattern recognition.
2. Description of the Related Art
As indicated hereinabove, pattern recognition systems may be used for a variety of purposes and may take many different forms. Without intending to limit the scope or spirit of the present invention, but rather for the sake of illustrating the principles thereof, the focus of the description that follows will be on optical character recognition ("OCR") systems used to recognize hand printed and/or machine printed letters and digits. The same principles will be recognized by those skilled in the art as equally applicable to other types of pattern recognition systems.
Conventional methods of character pattern recognition, whether of machine printed characters or hand printed characters, fall into many classes including neural network based recognizers and statistical classifiers as well as template matching and stroke based methods.
Neural network based systems are characterized by plural nonlinear transfer functions which vary in accordance with some learning method, such as back propagation. The neural networks typically evolve discrimination criteria through error feedback and self organization. Because plural transfer functions are used in the educated recognition system, neural networks are not very well suited for implementation on general purpose computers and generally need dedicated special purpose processors or dedicated node hardware in which each of the transfer functions is implemented.
On the other hand, statistical based classifiers are more suited for implementation on general purpose computers. Statistical classifiers can be implemented using a number of different statistical algorithms. These algorithms generally deal with selected features of the characters and analytically determine whether the features belong to or are members of clusters of features which clusters define characteristics of the characters being recognized. In other words, if the features of an unlabeled character fall within the boundaries of a cluster of features which characterize a particular text character, then the probability is high that the character to be labeled corresponds to the character of the cluster.
One approach, which is pixel-based, to identifying whether an unlabeled character falls within a cluster boundary is to compute the Hamming distance between an unlabeled character pixel array and the arrays of possible matching text characters. Another approach, which is feature-based, is to use a polynomial least mean square classifier with a quadratic discriminant function, such as described in Uma Shrinivasan, "Polynomial Discriminant Method For Hand Written Digit Recognition", State University of New York at Buffalo, Technical Report, Dec. 14, 1989, incorporated by reference herein.
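The pixel-based approach mentioned above can be illustrated with a minimal sketch. It assumes each character is a flattened binary pixel array of equal length, and the names `hamming_distance`, `nearest_template` and the 3 x 3 toy templates are hypothetical, chosen only for illustration; they are not taken from any of the cited references.

```python
def hamming_distance(a, b):
    """Count the pixel positions at which two equal-length binary arrays differ."""
    if len(a) != len(b):
        raise ValueError("pixel arrays must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_template(unknown, templates):
    """Return the label of the template closest to `unknown` in Hamming distance.

    `templates` maps a character label to its reference pixel array
    (an illustrative assumption about how the references are stored).
    """
    return min(templates, key=lambda label: hamming_distance(unknown, templates[label]))


# Toy 3 x 3 templates, flattened row by row (illustrative data only).
TEMPLATES = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
}

# An "I" with one noisy pixel in the bottom-right corner.
unknown = [0, 1, 0,
           0, 1, 0,
           0, 1, 1]

label = nearest_template(unknown, TEMPLATES)
```

Here the unlabeled array differs from the "I" template in one pixel and from the "L" template in five, so the nearest template is "I".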
The Shrinivasan classifier works as follows. A database of labeled, hand written alphanumeric characters (digits, upper case alphabetics, or the combination of the two) is converted to feature vectors, v, which are associated with target vectors. The components of the feature vectors are F quadratic polynomials (features) formed from the character's pixel array to provide evidence of lines through the image. The target vector for each character is a standard unit vector $e_{k(v)}$ with the $k(v)$-th component equal to 1 and all other components equal to zero, where k(v) is the externally provided classification for the character, for example 0,1,2, . . . ,9 or A,B, . . . ,Z or a combination. Standard numerical techniques are used to determine an $F \times K$ floating point weight matrix A to minimize the squared error sum, $\sum_v (Av - e_{k(v)})^2$, where the sum runs over all feature vectors in a training set, and K is the number of classes, for example, K=10 for digits or K=26 for alphabetics.
The weights matrix, A, is then used to classify unlabeled characters by determining the largest component in the product Aw, where w is the unknown character's feature vector. Additional details of this method can be found in the above-identified paper which includes source code implementing the method.
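The training and classification steps just described can be sketched as follows. This is only an illustration under stated assumptions, not the source code of the cited paper: feature vectors are stored as rows of a matrix `V` (so the weight matrix has shape F x K and the product Aw becomes `w @ W`), NumPy's `numpy.linalg.lstsq` stands in for the unspecified "standard numerical techniques", and the function names and toy data are hypothetical.

```python
import numpy as np


def train_weights(V, labels, K):
    """Least squares fit of an F x K weight matrix.

    V      : N x F array, one feature vector per row (assumed layout).
    labels : length-N sequence of class indices k(v) in 0..K-1.
    Minimizes the sum over v of ||v W - e_k(v)||^2.
    """
    V = np.asarray(V, dtype=float)
    E = np.eye(K)[np.asarray(labels)]          # N x K one-hot target vectors
    W, *_ = np.linalg.lstsq(V, E, rcond=None)  # least squares solution
    return W


def classify(W, w):
    """Return the index of the largest component of the product for feature vector w."""
    return int(np.argmax(np.asarray(w, dtype=float) @ W))


# Toy training set: F = 3 features, K = 2 classes (illustrative data only).
V = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
labels = [0, 1, 1]

W = train_weights(V, labels, K=2)
```

With this toy data, `classify(W, [1, 0, 0])` selects class 0 and `classify(W, [0, 1, 1])` selects class 1, since the second vector accumulates evidence from two features that both point to class 1.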
The above described system along with other statistically based systems, such as described in U.S. Pat. No. 5,060,279, are one shot learning systems, that is, the weight matrix or equivalent database is created in a single pass over the set of labeled characters used to produce the matrix or database. Such statistically based classifiers provide a reasonably good classification system but generally do not have the accuracy of neural network systems. However, the more accurate neural network based systems are slower to learn, slower to identify characters and require more memory and computing hardware than the statistical classifiers.
A system which combines the advantageous accuracy of the neural network based systems with the speed and efficiency of the statistically based systems, and which may be based on simple integer or bit arithmetic, is described in copending U.S. patent application Ser. No. 07/931,741, filed Aug. 18, 1992, assigned to the same assignee as the present invention. The aforementioned copending patent application, entitled "Supervised Training Augmented Polynomial Method And Apparatus For Character Recognition", invented by Peter G. Anderson, is hereby incorporated by reference.
The incorporated copending patent application describes a system that creates a classification matrix, which classifies or identifies hand printed or machine printed alphanumeric characters, using an iterated least squares polynomial discriminant method.
During iteration the classification weight matrix, to be subsequently used for identification, is modified by determining which characters are incorrectly classified, or classified with too small a confidence, and replicating those characters during training to strengthen the correct classification. The correct classification is also strengthened by using negative feedback, or subtracting out of the misclassified target vectors, to inhibit an incorrect classification.
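The replication idea can be illustrated with a deliberately simplified sketch. It is not the method of the incorporated application: it omits the confidence test and the negative feedback step, it refits the matrix from scratch with `numpy.linalg.lstsq` on each pass (an assumption), and the name `iterate_training` and the toy data are hypothetical.

```python
import numpy as np


def iterate_training(V, labels, K, rounds=5):
    """Refit a least squares weight matrix, replicating misclassified exemplars.

    V      : N x F array, one feature vector per row (assumed layout).
    labels : length-N sequence of class indices in 0..K-1.
    rounds : maximum number of refitting passes (must be at least 1).
    """
    V = np.asarray(V, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(V))                # indices of the current training multiset
    for _ in range(rounds):
        E = np.eye(K)[labels[idx]]         # one-hot targets for the multiset
        W, *_ = np.linalg.lstsq(V[idx], E, rcond=None)
        predicted = np.argmax(V @ W, axis=1)
        wrong = np.flatnonzero(predicted != labels)
        if wrong.size == 0:                # every original exemplar is classified correctly
            break
        idx = np.concatenate([idx, wrong]) # replicate the misclassified exemplars
    return W


# Toy data: F = 3 features, K = 2 classes (illustrative only).
V = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
labels = np.array([0, 1, 1])

W = iterate_training(V, labels, K=2)
```

Replicating an exemplar increases its contribution to the squared error sum, so the next fit is pulled toward classifying it correctly.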
The speed of the learning process is enhanced by subsampling the training data during feature vector extraction, supersampling (that is, artificially enlarging) the training set and stepwise increasing the amount of the training set used, maintaining intermediate matrices and step wise increasing the amount of each feature vector used during training.
Classification accuracy is enhanced by using features of at least two types, both based on quadratic monomials of the pixels, called King and Knight features (so called because they resemble the squares on a chess board that the respective piece moves to and from). Memory utilization efficiency is enhanced by modifying the existing weight matrix and compressing the sparse binary feature vectors.
Although describing an alternative to the one shot learning systems referred to hereinabove to improve classification accuracy (through training) and suggesting the maintenance and use of intermediate matrices to develop improved classifiers, etc., the classifier development technique taught in the aforementioned incorporated copending patent application uses a (binary) vector of 1,500 features, based on an equidistributed collection of products of pixel pairs, to form the linear discriminator used for character recognition.
As a result of using such large vectors, the processes taught in the aforementioned incorporated copending patent application (and the incorporated reference as well) require time consuming and computer resource consuming matrix manipulation steps (e.g., computing the inverse of a 1,500 × 1,500 matrix) each time a new classifier is built and evaluated.
Furthermore, although the classifier taught in the incorporated copending patent application is qualitatively competitive with, and is faster to train and to run than, many classification alternatives known in the prior art, the 1,500-member feature set clearly contains many redundant (overlapping or useless) members.
Accordingly, it would be desirable to provide methods and apparatus for constructing a classification weights matrix for a pattern recognition system based on a significantly smaller set of features than is presently required by competitive prior art pattern recognition techniques as exemplified by the techniques taught in the incorporated references.
A significantly smaller feature set (for example, 300 versus 1,500 features) would be very desirable for faster training purposes, to allow faster and smaller application programs to be developed, and to facilitate hardware implementation of such smaller systems if desired. Furthermore, a system using a "small" set of features would be less likely than a system requiring a "large" feature set to overfit the training data by memorizing noise in that data, and is desirable for this reason as well.
Additionally, it would be desirable to provide methods and apparatus for not only creating, but for also judiciously selecting and evaluating reduced feature sets used to construct small classifiers, i.e., methods and apparatus that identify the subsets that work particularly well in building classifiers using only the small set of extracted features from a given feature set.
In particular, it would be desirable to provide methods and apparatus for constructing classification weights matrices for pattern recognition systems that, utilizing a reduced feature set, achieve correct classification rates comparable to (or better than) those rates attainable using the aforementioned prior art techniques.
Furthermore, it would be desirable to be able to determine which subset of a given feature set to use in order to construct one of the aforementioned desirable classification weights matrices, where the determination of which subset to use (i.e., which reduced feature set) is based on the objective criteria used in searching the space of possible feature subsets.
Further yet, it would be desirable to provide methods and apparatus for allowing the space of f-element subsets of a set of F features, where f<F, to be efficiently searched (deterministically or heuristically), to find an f-element subset having a maximum "classification efficiency", where classification efficiency is defined as the percent of correct classifications made on a predetermined set of exemplars. The "maximum" discovered could be the maximum in fact, or the maximum as determined after searching for a predetermined period of time, the maximum exceeding a predetermined threshold, etc.
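The definition of classification efficiency and the notion of searching f-element subsets can be sketched as follows. This is a minimal illustration under stated assumptions: the search shown is a plain exhaustive (deterministic) enumeration with an optional stopping threshold, which is practical only for very small F, since the number of 300-element subsets of 1,500 features is astronomically large. The names `classification_efficiency`, `search_subsets`, and `toy_evaluate`, and the toy scoring rule, are hypothetical.

```python
from itertools import combinations


def classification_efficiency(predict, exemplars):
    """Percent of exemplars for which `predict` returns the correct label.

    `exemplars` is a sequence of (features, label) pairs; `predict` is any
    classifier function (both are illustrative assumptions).
    """
    correct = sum(1 for features, label in exemplars if predict(features) == label)
    return 100.0 * correct / len(exemplars)


def search_subsets(F, f, evaluate, threshold=None):
    """Enumerate the f-element subsets of range(F) and keep the best scoring one.

    `evaluate(subset)` returns the classification efficiency obtained with
    that subset. If `threshold` is given, the search stops as soon as the
    best score found exceeds it.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(F), f):
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
        if threshold is not None and best_score > threshold:
            break
    return best_subset, best_score


# Toy scoring rule: only features 1 and 3 are useful (illustrative assumption).
def toy_evaluate(subset):
    return 100.0 * len(set(subset) & {1, 3}) / 2


exact_best = search_subsets(5, 2, toy_evaluate)                    # maximum in fact
early_stop = search_subsets(5, 2, toy_evaluate, threshold=40.0)    # first subset above threshold
```

In the toy example the full enumeration returns the subset (1, 3) with an efficiency of 100.0, while the thresholded search stops at the first subset scoring above 40.0, namely (0, 1) with 50.0. In practice `evaluate` would build a classifier from the selected features and score it with `classification_efficiency` on the predetermined exemplar set.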
It would also be desirable to be able to provide methods and apparatus for constructing classification weights matrices which are defined in terms of other matrices: (a) which may be readily constructed utilizing prior art techniques, such as those described in the incorporated references; and (b) which are relatively easy to manipulate by virtue of their being constructed utilizing the aforementioned, reduced size, feature subset having a maximum classification efficiency.
Further still, it would be desirable to provide pattern recognition systems (including, in particular, character identification systems), which utilize classifiers constructed in accordance with the aforementioned aspects of the invention to actually perform pattern (e.g., character) recognition.