The present invention relates to pattern recognition and, more particularly, to a system and method to facilitate pattern recognition by deformable matching.
Pattern recognition systems are employed in various areas of technology to help classify and/or match a test pattern relative to one or more known prototype patterns. Examples of pattern recognition applications include image analysis and classification, handwriting recognition, speech recognition, man and machine diagnostics, industrial inspection, medical imaging, etc.
In a pattern recognition system, it is common to store large amounts of data indicative of prototype patterns and compare them to a given example or unknown input symbol for identification. Several common algorithms may be utilized, such as K-nearest Neighbor (KNN), Parzen windows, and radial basis function (RBF), to compare or measure similarities between patterns. A level of similarity may be determined by generating a distance measure. By way of example, a simple algorithm for comparing two patterns f and g is to compute the Euclidean distance between them, such as may be expressed as:             d      E        ⁢          xe2x80x83        ⁢          (              f        ,        g            )        =                              ∑                      x            ,            y                                xe2x80x83                          ⁢                  xe2x80x83                ⁢                              (                                          f                ⁢                                  xe2x80x83                                ⁢                                  (                                      x                    ,                    y                                    )                                            -                              g                ⁢                                  xe2x80x83                                ⁢                                  (                                      x                    ,                    y                                    )                                                      )                    2                      =                            (                      f            -            g                    )                2            
where dE denotes the Euclidean distance between two patterns, and f and g are assumed to be two-dimensional patterns, indexed by x and y. An extension of the Euclidean distance methodology to other dimensions is straightforward.
The usefulness of the Euclidean distance algorithm is limited, however, because if f and g are not perfectly aligned, the Euclidean distance can yield arbitrarily large values. Consider, for instance, a case where g is a translated version of f, that is g(x,y)=f(x+1,y). In this case, the Euclidean distance could yield a very large value, even though f and g may be virtually identical except for a small amount of translation in the x-direction.
One proposed comparison scheme to remedy the aforementioned shortcoming associated with the traditional Euclidean distance approach is to employ a tangent distance, such as is disclosed in U.S. Pat. No. 5,422,961. This comparison scheme is invariant with respect to a selected set of small transformations of the prototypes. The small transformations of interest are expressed by calculating a derivative of the transformed image with respect to the parameter that controls the transformation. The directional derivative is used to generate a computational representation of the transformation of interest. The transformation of interest, which corresponds to a desired invariance, can be efficiently expressed by using tangent vectors constructed relative to the surface of transformation. The tangent distance dT may be expressed as:             d      T        ⁢          xe2x80x83        ⁢                  (                  f          ,          g                )            2        =            min                        α          f                ⁢                  xe2x80x83                ⁢                  α          g                      ⁢          xe2x80x83        ⁢                  (                  f          +                                    L              f                        ⁢                          xe2x80x83                        ⁢                          a              f                                -          g          -                                    L              g                        ⁢                          xe2x80x83                        ⁢                          a              g                                      )            2      
where Lf and Lg are matrices of tangent vectors for f and g respectively, and af and ag are vectors representing the amount of deformation along the tangent plane. An advantage of tangent distance compared to the traditional Euclidean distance approach is that the tangent distance is less affected by translation than the Euclidean distance because if Lf and Lg contain a linear approximation of the translation transformation, the tangent distance compares the translated version of f and g. The tangent distance concept is explored in greater detail in a paper entitled, xe2x80x9cEfficient Pattern Recognition Using a New Transformation Distance,xe2x80x9d presented by Patrice Y. Simard, Yann LeCun and John Denker, Advances in Neural Information Processing Systems, Eds. Morgan Kaufmann, pp. 50-58, 1993.
A limitation of tangent distance approach, however, is that the transformations to which it is invariant generally must be known a-priori and precisely (e.g., translation, rotation, scaling, etc.). Moreover, tangent distance has no mechanism to specify loose constraints such as small elastic displacements. Such mechanism would be useful because in many cases, such as with speech or image patterns, it is not known which transformations should be used, but it is assumed that the problem exhibit some invariance with respect to small elastic displacements.
A comparison scheme that is invariant to such transformations can operate with greater economy and speed than comparison schemes that require exhaustive sets of prototypes. By way of example, transformations of alphanumeric patterns that are of interest in this regard may include translation, rotation, scaling, hyperbolic deformations, line thickness changes, and gray-level changes. Any desired number of possible invariances can be included in any particular recognition process, provided that such invariances are known a priori, which is not always possible.
Many computer vision and image processing tasks benefit from invariances to spatial deformations in the image. Examples include handwritten character recognition, face recognition and motion estimation in video sequences. When the input images are subjected to possibly large transformations from a known finite set of transformations (e.g., translations in images), it is possible to model the transformations using a discrete latent variable and perform transformation-invariant clustering and dimensionality reduction using Expectation Maximization as in xe2x80x9cTopographic transformation as a discrete latent variablexe2x80x9d by Jojic and Frey presented at Neural Information Processing Systems (NIPS) 1999. Although this method produces excellent results on practical problems, the amount of computation grows linearly with the total number of possible transformations in the input.
A tangent-based construction of a deformation field may be used to model large deformations in an approximate manner. The tangent approximation can also be included in generative models, such as including linear factor analyzer models and nonlinear generative models. Another approach to modeling small deformations is to jointly cluster the data and learn a locally linear deformation model for each cluster, e.g., using EM in a factor analyzer. With the factor analysis approach, however, a large amount of data is needed to accurately model the deformations. Learning also is susceptible to local optima that might confuse deformed data from one cluster with data from another cluster. That is, some factors tend to xe2x80x9cerasexe2x80x9d parts of the image and xe2x80x9cdrawxe2x80x9d new parts, instead of just perturbing the image.
The present invention relates to a system and method to facilitate pattern recognition or matching between patterns. A deformation field is multiplied with a spatial derivative of a first pattern and a resulting deformation component is added to the first pattern to derive a first deformed pattern. The deformation field is generally non-uniform and may be parameterized by a smooth functional basis having associated deformation coefficients. By way of example, the functional basis may correspond to inverse wavelet transforms of varying frequency or resolution defined by the associated deformation coefficients. A minimum distance between the first deformed pattern and a second pattern may be determined by minimizing the distance between the first deformed pattern and the second pattern with respect to the deformation coefficients of the first deformed pattern. Advantageously, the foregoing minimization provides a system (e.g., linear) that may be solved with standard methods, such as quadratic optimization.
In accordance with one aspect of the present invention, the second pattern also may be deformed. A second deformed pattern is obtained by multiplying a second deformation field with a spatial derivative of the second pattern and by adding the resulting deformation component to the second pattern. The second deformation field is generally non-uniform and may be parameterized by a second smooth functional basis having associated deformation coefficients. By way of example, the second functional basis may correspond to inverse wavelet transforms of varying frequency or resolution defined by the associated deformation coefficients. When the second pattern is deformed, the minimum distance between the first and second deformed patterns may be determined by minimizing the distance between the deformed patterns with respect to their associated deformation coefficients. Advantageously, the foregoing is a standard quadratic optimization problem which reduces to solving a linear system of equations
One aspect of the present invention provides a method to facilitate pattern recognition between a first pattern and a second pattern. A substantially smooth deformation field having deformation coefficients is applied to the first pattern to deform the first pattern and form a deformation component of the first pattern. The first pattern and the deformation component of the first pattern are aggregated to define a first deformed pattern. A distance between the first deformed pattern and the second pattern is determined as a function of the deformation coefficients for the first pattern.
Another aspect of the present invention provides a method to facilitate pattern recognition between a first pattern and a second pattern. A substantially smooth deformation field having deformation coefficients is applied to deform the first pattern and form a first deformation component. The first deformation component is added to the first pattern to define a first deformed pattern. The deformation coefficients for the first pattern are determined by minimizing a distance between the first deformed pattern and the second pattern. The determined deformation coefficients may then be employed to evaluate a distance between the first deformed pattern and the second pattern.
In accordance with yet another aspect of the present invention, the second pattern also may be deformed by applying a second deformation field to the second pattern to form a second deformation component. The second deformation component is added to the second pattern to define a second deformed pattern. The second deformation field is parameterized by functional basis vectors and deformation coefficients associated with the second pattern. A distance between the first deformed pattern and the second deformed pattern may then be determined as a function of the respective deformation coefficients.
Still another aspect of the present invention provides a system to measure similarity between first and second patterns. The system includes a first deformation system for applying a substantially smooth deformation field to deform the first pattern and provide a first deformation component. An aggregation system aggregates the first deformation component and the first pattern to derive a first deformed pattern. A minimization system is operable to minimize a distance between the first deformed pattern and the second pattern as a function of the deformation field.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the invention may be employed and the present invention is intended to include all such aspects and their equivalents. Other advantages and novel features of the invention will become apparent from the following detailed description of the invention when considered in conjunction with the drawings.