In pattern recognition machines, such as that described in patent application Ser. No. 07/770,267, filed Oct. 3, 1991 and assigned to Applicants' assignee, it is usual to store a large number of prototype patterns and compare them to a given example or unknown input symbol for identification. In several pattern comparison processes using common algorithms, such as K-Nearest Neighbor (KNN), Parzen windows, and radial basis functions (RBF), the comparison is performed by generating a distance measure, for example the Euclidean distance, between patterns and prototypes. The KNN and Parzen windows algorithms are described in the book "Pattern Classification and Scene Analysis" by Duda and Hart, 1973; and the RBF algorithm is described in the article "Multivariate Functional Interpolation and Adaptive Networks" by Broomhead, D. S. and Lowe, D., appearing in "Complex Systems," which are hereby incorporated by reference.
The several prototypes which have the smallest such "distances" from the example each cast a vote. The voting determines the class of the example. Variations on the voting scheme and on how the distance measure is used account for the differences among the KNN, Parzen windows, RBF and other distance-based classification algorithms.
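The distance-and-vote procedure described above can be sketched as follows. This is only a minimal illustration of a plain KNN vote, assuming patterns are already represented as equal-length numeric vectors; the function names, the toy prototypes, and the choice of k = 3 are hypothetical and are not taken from the referenced application.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(example, prototypes, k=3):
    """Classify `example` by majority vote among its k nearest prototypes.

    `prototypes` is a list of (vector, label) pairs (hypothetical layout).
    """
    # Rank every stored prototype by its distance to the example.
    ranked = sorted(prototypes, key=lambda p: euclidean_distance(example, p[0]))
    # The k closest prototypes each cast one vote for their own class.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy prototype set: three patterns of class "A", two of class "B".
prototypes = [
    ((0.0, 0.0), "A"),
    ((0.1, 0.2), "A"),
    ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"),
    ((0.9, 1.1), "B"),
]

label_near_a = knn_classify((0.15, 0.1), prototypes, k=3)
label_near_b = knn_classify((0.95, 1.0), prototypes, k=3)
```

Parzen windows and RBF classifiers would differ mainly in how each prototype's vote is weighted by its distance, rather than in the overall compare-then-vote structure.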
These classification methods find important applications in optical character recognition.
One key to successfully utilizing these algorithms in pattern recognition is the choice of process for comparing prototypes to examples, because the process chosen determines the accuracy of the classification.
It is desirable for the comparison scheme to be "invariant" to small transformations of both the prototypes and the examples. The term "invariance" as used herein refers to the invariance of the nature of a pattern in the perception of a human observer, with respect to some transformation of that pattern. The term is further described in the above-noted patent application Ser. No. 07/770,267, which is hereby incorporated by reference.
In the case of alphanumeric patterns, the possible transformations of the image include: translation, rotation, scaling, hyperbolic deformations, line thickness changes, grey-level changes, and others. An example of a pattern's invariance to small transformations is given by considering the nature of the image of a "3" pattern: it is invariant under translation, which is a linear displacement of the image. That is, translating the image does not change the meaning of the image to a human observer. On the other hand, the nature of the image of a "6" is not invariant under a rotation of 180 degrees: to a human observer it becomes a "9". To the same observer, however, a small rotation of the upright "6" image does not change the meaning of the image.
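A plain pixel-wise Euclidean distance does not share this invariance, which the following sketch illustrates. It assumes tiny 5x5 binary images written as strings; the images and helper names are hypothetical and were chosen only to make the effect visible. A vertical bar shifted by one pixel ends up farther from the original bar than a differently shaped pattern does.

```python
import math

def to_vector(rows):
    """Flatten rows of '#' (ink) and '.' (background) into 0/1 pixel values."""
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row]

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def translate_right(rows, shift=1):
    """Shift every row right by `shift` pixels, padding with background."""
    return ["." * shift + row[:len(row) - shift] for row in rows]

bar = [
    ".#...",
    ".#...",
    ".#...",
    ".#...",
    ".#...",
]
ell = [
    ".#...",
    ".#...",
    ".#...",
    ".#...",
    ".###.",
]
shifted_bar = translate_right(bar)

# Same shape, moved one pixel: 10 pixels differ.
d_shift = euclidean_distance(to_vector(bar), to_vector(shifted_bar))
# Different shape, same position: only 2 pixels differ.
d_other = euclidean_distance(to_vector(bar), to_vector(ell))
```

Here `d_shift` exceeds `d_other`, so a raw-pixel nearest-prototype comparison would rank the differently shaped pattern as the closer match, even though a human observer would regard the shifted bar as the same symbol.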
A desirable property of a pattern recognition machine is that its output be invariant with respect to some specific transformation of its input. In many prior art processes by which the processing machine classifies by comparing examples or unknown patterns to prototypes, features of the example are first extracted, and the comparison is made on the features rather than on the raw data. The desired property of these extracted features is that they be more invariant to some transformation of the image than the unprocessed image is. In alphanumeric classification, for instance, bars and loops are high-level features which are less affected by low-level transformations such as a translation of the image.
A particular example of such a system is a machine for making accurate classifications as to the identity of letters and numbers in the address block of envelopes being processed at a postal service sorting center. Such a machine may utilize the information processing of the KNN, Parzen window or RBF algorithm. In these machines it is necessary that the classifier accurately recognize the many shapes and sizes in which each letter or number is formed in script placed on the envelopes by postal service users.
Given an unlimited number of prototypes and unlimited recognition time, this type of system could exhibit or extract the relevant invariances from the raw data alone; but this is often not feasible because the required storage capacity and recognition time grow linearly with the number of prototypes.
Two common strategies, therefore, are invoked in the prior art to deal with these limitations. The first is to carefully select the set of prototypes. This strategy often is ineffectual, however, because an insufficient number of prototypes gives very poor recognition accuracy. A large number of prototypes, if available at all, entails making many comparisons in the recognition phase, which drastically slows recognition.
The second strategy commonly used in the prior art is to compare sets of features extracted from the patterns rather than the raw data. Although this strategy increases the time required for each comparison, it has the advantage of using more relevant information about the prototypes. Thus, if the system can operate with fewer prototypes, a gain in speed can be obtained. For instance, if the extracted features are invariant with respect to some transformation, all the prototypes which differ from each other only by that transformation are redundant and can be removed from the set of prototypes used during recognition.
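The pruning of redundant prototypes described above can be sketched as follows. The feature used here, cropping each image to the bounding box of its ink, is a hypothetical choice made only because it is unchanged by translation within the frame; it is not a feature named in the source, and the helper names and toy images are likewise assumptions. Each image is assumed to contain at least one ink pixel.

```python
def crop_to_ink(rows):
    """Return the bounding-box crop of the ink, which translation leaves unchanged."""
    ink_rows = [r for r, row in enumerate(rows) if "#" in row]
    ink_cols = [c for row in rows for c, ch in enumerate(row) if ch == "#"]
    top, bottom = min(ink_rows), max(ink_rows)
    left, right = min(ink_cols), max(ink_cols)
    return tuple(row[left:right + 1] for row in rows[top:bottom + 1])

def prune_redundant(prototypes):
    """Keep the first prototype for each distinct (feature, label) pair."""
    seen = set()
    kept = []
    for image, label in prototypes:
        key = (crop_to_ink(image), label)
        if key not in seen:
            seen.add(key)
            kept.append((image, label))
    return kept

bar_left = [".#...", ".#...", ".#...", ".#...", ".#..."]
bar_right = ["..#..", "..#..", "..#..", "..#..", "..#.."]
ell = [".#...", ".#...", ".#...", ".#...", ".###."]

prototypes = [(bar_left, "1"), (bar_right, "1"), (ell, "L")]
pruned = prune_redundant(prototypes)
```

The two bars differ only by a translation, so they yield the same feature and only the first is kept, leaving two prototypes to compare against during recognition instead of three.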
Unfortunately, the choice of features is very sensitive and is subject to complex heuristics. In many cases any resultant gain in speed is at the expense of performance because the features are not truly invariant.
The factors of recognition time and correctness therefore are not yet satisfactorily addressed, and remain an issue in recognition engines which utilize the information processing steps of the KNN algorithm and other distance-based systems to recognize handwritten script in the address block of envelopes.