Pattern recognition systems are employed in various areas of technology to help classify and/or match a test pattern relative to one or more known prototype patterns. Examples of pattern recognition applications include image analysis and classification, handwriting recognition, speech recognition, man and machine diagnostics, industrial inspection, medical imaging, etc.
In a pattern recognition system, it is common to store large amounts of data indicative of prototype patterns and compare them to a given example or unknown input symbol for identification. Several common algorithms may be utilized, such as K-nearest Neighbor (KNN), Parzen windows, and radial basis function (RBF), to compare or measure similarities between patterns. A level of similarity may be determined by generating a distance measure. By way of example, a simple algorithm for comparing two patterns f and g is to compute the Euclidean distance between them, such as may be expressed as:             d      E        ⁡          (              f        ,        g            )        =                              ∑                      x            ,            y                                                          ⁢                                  ⁢                              (                                          f                ⁡                                  (                                      x                    ,                    y                                    )                                            -                              g                ⁡                                  (                                      x                    ,                    y                                    )                                                      )                    2                      =                            (                      f            -            g                    )                2            where dE denotes the Euclidean distance between two patterns, and f and g are assumed to be two-dimensional patterns, indexed by x and y. An extension of the Euclidean distance methodology to other dimensions is straightforward.
The usefulness of the Euclidean distance algorithm is limited, however, because if f and g are not perfectly aligned, the Euclidean distance can yield arbitrarily large values. Consider, for instance, a case where g is a translated version of f, that is g(x,y)=f(x+1, y). In this case, the Euclidean distance could yield a very large value, even though f and g may be virtually identical except for a small amount of translation in the x-direction.
One proposed comparison scheme to remedy the aforementioned shortcoming associated with the traditional Euclidean distance approach is to employ a tangent distance, such as is disclosed in U.S. Pat. No. 5,422,961. This comparison scheme is invariant with respect to a selected set of small transformations of the prototypes. The small transformations of interest are expressed by calculating a derivative of the transformed image with respect to the parameter that controls the transformation. The directional derivative is used to generate a computational representation of the transformation of interest. The transformation of interest, which corresponds to a desired invariance, can be efficiently expressed by using tangent vectors constructed relative to the surface of transformation. The tangent distance dT may be expressed as:                     d        T            ⁡              (                  f          ,          g                )              2    =            min                        α          f                ⁢                  α          g                      ⁢                  (                  f          +                                    L              f                        ⁢                          a              f                                -          g          -                                    L              g                        ⁢                          a              g                                      )            2      where Lf and Lg are matrices of tangent vectors for f and g respectively, and af and ag are vectors representing the amount of deformation along the tangent plane. An advantage of tangent distance compared to the traditional Euclidean distance approach is that the tangent distance is less affected by translation than the Euclidean distance because if Lf and Lg contain a linear approximation of the translation transformation, the tangent distance compares the translated version of f and g. The tangent distance concept is explored in greater detail in a paper entitled, “Efficient Pattern Recognition Using a New Transformation Distance,” presented by Patrice Y. Simard, Yann LeCun and John Denker, Advances in Neural Information Processing Systems, Eds. Morgan Kaufmann, pp. 50–58, 1993.
A limitation of tangent distance approach, however, is that the transformations to which it is invariant generally must be known a-priori and precisely (e.g., translation, rotation, scaling, etc.). Moreover, tangent distance has no mechanism to specify loose constraints such as small elastic displacements. Such mechanism would be useful because in many cases, such as with speech or image patterns, it is not known which transformations should be used, but it is assumed that the problem exhibit some invariance with respect to small elastic displacements.
A comparison scheme that is invariant to such transformations can operate with greater economy and speed than comparison schemes that require exhaustive sets of prototypes. By way of example, transformations of alphanumeric patterns that are of interest in this regard may include translation, rotation, scaling, hyperbolic deformations, line thickness changes, and gray-level changes. Any desired number of possible invariances can be included in any particular recognition process, provided that such invariances are known a priori, which is not always possible.
Many computer vision and image processing tasks benefit from invariances to spatial deformations in the image. Examples include handwritten character recognition, face recognition and motion estimation in video sequences. When the input images are subjected to possibly large transformations from a known finite set of transformations (e.g., translations in images), it is possible to model the transformations using a discrete latent variable and perform transformation-invariant clustering and dimensionality reduction using Expectation Maximization as in “Topographic transformation as a discrete latent variable” by Jojic and Frey presented at Neural Information Processing Systems (NIPS) 1999. Although this method produces excellent results on practical problems, the amount of computation grows linearly with the total number of possible transformations in the input.
A tangent-based construction of a deformation field may be used to model large deformations in an approximate manner. The tangent approximation can also be included in generative models, such as including linear factor analyzer models and nonlinear generative models. Another approach to modeling small deformations is to jointly cluster the data and learn a locally linear deformation model for each cluster, e.g., using EM in a factor analyzer. With the factor analysis approach, however, a large amount of data is needed to accurately model the deformations. Learning also is susceptible to local optima that might confuse deformed data from one cluster with data from another cluster. That is, some factors tend to “erase” parts of the image and “draw” new parts, instead of just perturbing the image.