Digital image recognition systems are known which attempt to locate the position of a digital reference image template within a larger digital search image scene. Such digital images are comprised of a series of pixels arranged in a matrix, wherein a grayscale value is attributed to each pixel to indicate the appearance thereof. Matching is then performed by comparing these grayscale values relative to their positions in both the digital reference image template and the digital search image scene. A match is found when the same or similar pattern is found in the digital search image scene as in the digital reference image template.
Such systems are typically implemented in a computer for use in various manufacturing and robotic applications. For example, such systems can be utilized to automate tasks such as semiconductor wafer handling operations, fiducial recognition for pick-and-place printed circuit board (PCB) assembly, machine vision for quantification or system control to assist in location of objects on conveyor belts, pallets, and trays, and automated recognition of printed matter to be inspected, such as alignment marks.
The matrix of pixels used to represent such digital images is typically arranged in a Cartesian coordinate system, or in another arrangement of non-rectangular pixels, such as hexagonal or diamond-shaped pixels. Recognition methods usually require scanning the search image scene pixel by pixel in comparison with the reference image template which is sought. Further, known search techniques allow for transformations such as rotation and scaling of the reference image template within the search image scene, therefore requiring the recognition method to accommodate such transformations.
Because such recognition methods typically scan an image pixel by pixel and perform a series of transformations of the reference image template, the number of computational operations tends to grow polynomially with the number of elements, or pixels, involved. Such growth is typically expressed as $O(n^x)$, where $x$ is the power to which the number of computational operations increases based on the number of elements $n$. For example, a sorting method which sorts a list by iterating through the list and comparing each element to every element in the list would be $O(n^2)$, since sorting of 4 elements requires $4^2$, or 16, comparison operations, while sorting of 10 elements requires $10^2$, or 100, comparison operations.
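The quadratic growth in the sorting example can be checked by counting comparisons directly. The following is a minimal sketch; the function name `count_comparisons` is hypothetical and serves only to illustrate the arithmetic.

```python
def count_comparisons(items):
    """Count the comparisons made when every element is compared
    against every element of the list, as in the example above."""
    count = 0
    for _a in items:
        for _b in items:
            count += 1  # one comparison of _a against _b
    return count


four = count_comparisons([3, 1, 4, 1])      # 4 elements
ten = count_comparisons(list(range(10)))    # 10 elements
print(four, ten)
```

Doubling the number of elements quadruples the count, which is the behavior summarized by $O(n^2)$.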
As digital image recognition methods tend to require scanning of every pixel in a reference image template with respect to every pixel in a search image scene, the number of operations indicated by $O(n^x)$ becomes significant. Further, since transformations such as rotation and scaling must be repeated for each such pixel scan, $O(n^x)$ is further increased. As an increased number of pixels increases resolution and produces better visual image quality, it is desirable to accommodate a large number of pixels.
Normalized grayscale correlation (NGC) has been used to match digital images reliably and accurately, as is disclosed in U.S. Pat. No. 5,602,937, entitled "Methods and Apparatus for Machine Vision High Accuracy Searching," assigned to Cognex Corporation. The traditional NGC, however, while effective at detecting linear changes in grayscale, has very little tolerance to changes in other aspects of digital images, such as rotation, scale, perspective, distortion, defocus, and non-traditional grayscale changes. In addition, NGC is computationally very expensive, being on the order of $O(n^4)$, since every pixel in the reference image template needs to be correlated with every pixel in the search image scene.
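For illustration, the following is a minimal sketch of a normalized correlation score between a template and an equally sized window of the scene. It assumes the common zero-mean normalized form; the cited patent may define NGC differently, and the function name `ngc` and the use of nested Python lists for images are assumptions made for this example.

```python
import math


def ngc(template, window):
    """Zero-mean normalized correlation of two equal-sized 2D lists.

    Returns a score in [-1, 1]; 0.0 is returned when either input
    has no grayscale variation (assumed convention).
    """
    t = [p for row in template for p in row]
    w = [p for row in window for p in row]
    n = len(t)
    mean_t = sum(t) / n
    mean_w = sum(w) / n
    num = sum((a - mean_t) * (b - mean_w) for a, b in zip(t, w))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_w) ** 2 for b in w))
    return num / den if den else 0.0


template = [[10, 20], [30, 40]]
brighter = [[2 * p + 10 for p in row] for row in template]  # linear change
rotated = [[40, 30], [20, 10]]                              # 180 degree turn
print(ngc(template, brighter), ngc(template, rotated))
```

The linearly brightened window still scores at the maximum, whereas the rotated window does not, which reflects the tolerance to linear grayscale changes and the intolerance to rotation described above.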
Following is a general notation for correlation image matching. Let $t(x,y)$, $1 \le x \le X_t$, $1 \le y \le Y_t$, be the rectangular template to be localized within a larger scene $s(x,y)$. Then the correlation $R(i)$ for a set of $N$ transformations of the scene $s_i(x,y)$, $1 \le i \le N$, wherein $s_i(x,y)$ can be a translation, rotation, scaling, or other transformation of $s(x,y)$, can be written as

$$R(i) = f(t(x,y), s_i(x,y))$$

where $f(\cdot)$ denotes the correlation function. The most common transformation in template matching is the translation along the x and y directions, or axes. In this case, the displacements $s(x+u, y+v)$ of a symmetric search range $-U \le u \le U$, $-V \le v \le V$ correspond to $N = (2U+1)(2V+1)$ transformations $s_i(x,y)$, $1 \le i \le N$.
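The translation-only case can be sketched as follows. This is an illustrative example, not a definitive implementation: it assumes 0-based indexing, a scene padded by $U$ columns and $V$ rows on each side of the nominal template position, and a stand-in correlation function (negative sum of squared differences) in place of $f(\cdot)$. All function names are hypothetical.

```python
def translations(U, V):
    """All displacements (u, v) in the symmetric search range."""
    return [(u, v) for u in range(-U, U + 1) for v in range(-V, V + 1)]


def window(scene, u, v, U, V, width, height):
    """Scene window s(x+u, y+v); the scene is assumed padded by U and V."""
    return [row[U + u:U + u + width]
            for row in scene[V + v:V + v + height]]


def neg_ssd(t, w):
    """Stand-in for f(): negative sum of squared differences
    (higher is better, 0 is a perfect match)."""
    return -sum((a - b) ** 2
                for rt, rw in zip(t, w) for a, b in zip(rt, rw))


def best_translation(template, scene, U, V, f=neg_ssd):
    """Evaluate R(i) for every displacement and return the best one,
    together with the number N of displacements evaluated."""
    height, width = len(template), len(template[0])
    scores = {(u, v): f(template, window(scene, u, v, U, V, width, height))
              for (u, v) in translations(U, V)}
    return max(scores, key=scores.get), len(scores)


template = [[9, 1], [1, 9]]
scene = [[0, 0, 9, 1],
         [0, 0, 1, 9],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(best_translation(template, scene, U=1, V=1))
```

With $U = V = 1$ the search evaluates $(2 \cdot 1 + 1)(2 \cdot 1 + 1) = 9$ displacements, each of which touches every template pixel.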
Various approaches have been attempted to speed up conventional NGC, such as faster hardware utilizing pipeline image processing, RISC processors, and faster memory, which allow processing of more pixels per second. Such a horsepower driven approach, however, does not change the $O(n^4)$ computational metric of NGC.
Another method used to reduce the computational metrics of grayscale correlation is to employ an image pyramid. An image pyramid stores multiple copies of a digital image in a sequence which varies pixel density, and therefore resolution, at each level in the sequence. In this approach, a coarse match is found at the top of the pyramid, and a hill climbing strategy is utilized to traverse through the successive levels of the image pyramid. This approach significantly reduces the number of pixels used in correlation. While effective at improving performance for coarse matching, such a method must still encompass all the pixels in the reference image template against all pixels in the search image scene.
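The construction of such a pyramid can be sketched as follows. The example assumes that each level halves the resolution by averaging 2x2 blocks of pixels, which is one common choice and is not stated in the text; the coarse-to-fine search itself is omitted. The names `downsample` and `build_pyramid` are hypothetical.

```python
def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels.
    A trailing odd row or column is dropped (assumed convention)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)]
            for y in range(h)]


def build_pyramid(image, levels):
    """Return [full resolution, half resolution, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 ramp
for level in build_pyramid(image, 3):
    print(len(level), "x", len(level[0]), level)
```

Each level holds one quarter of the pixels of the level below it, which is why a coarse match at the top of the pyramid is inexpensive compared with a match at full resolution.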
Yet another strategy is sparse correlation. While the traditional NGC approach applies correlation to every pixel in the reference image template, sparse correlation selects a subset of such pixels for correlation. Each correlation function $f(\cdot)$ incorporates summations $\sum_{xy}$ with respect to the x and y axes. For conventional correlation, the summations $\sum_{xy}$ for $N$ correlations run over the entire template in an exhaustive search, hence

$$\sum_{xy} = \sum_{x=1}^{X_t} \sum_{y=1}^{Y_t}$$

For sparse correlation, however, summations are computed only over a predefined set of $K$ pixels $P = \{(x_1,y_1), (x_2,y_2), \ldots, (x_K,y_K)\}$, rather than over an exhaustive set of all reference image template pixels; hence:

$$\sum_{xy} = \sum_{(x,y) \in P}$$
Since K is much smaller than the total number of pixels in the template, this leads to a significant reduction of computational cost.
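As a sketch of the idea, the normalized correlation can be restricted to the $K$ pixels of a point set $P$. The example below assumes a zero-mean normalized correlation, 0-based `(x, y)` coordinates, and a displacement `(u, v)` into the scene; the function name `sparse_ngc` and the choice of the four corner pixels as $P$ are illustrative assumptions only.

```python
import math


def sparse_ngc(template, scene, points, u=0, v=0):
    """Normalized correlation evaluated only at the pixels in `points`.

    `points` is a list of (x, y) template coordinates; the scene is
    sampled at (x + u, y + v). Cost is proportional to K = len(points)
    rather than to the full template size.
    """
    t = [template[y][x] for x, y in points]
    s = [scene[y + v][x + u] for x, y in points]
    k = len(points)
    mean_t, mean_s = sum(t) / k, sum(s) / k
    num = sum((a - mean_t) * (b - mean_s) for a, b in zip(t, s))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_s) ** 2 for b in s))
    return num / den if den else 0.0


template = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
# Scene: the template, brightened linearly, embedded at offset (1, 1).
scene = [[0] * 4 for _ in range(4)]
for y in range(3):
    for x in range(3):
        scene[y + 1][x + 1] = 2 * template[y][x] + 1
corners = [(0, 0), (2, 0), (0, 2), (2, 2)]  # K = 4 of 9 pixels
print(sparse_ngc(template, scene, corners, u=1, v=1))
print(sparse_ngc(template, scene, corners, u=0, v=0))
```

Here only 4 of the 9 template pixels are visited per displacement, and the score at the true offset remains higher than at the wrong one.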
Several strategies for choosing a subset have been utilized, such as skipping every other pixel and choosing random pixels. However, these approaches significantly deteriorate the effectiveness of correlation and the resultant matching accuracy.
A version of sparse correlation called point correlation has been proposed (Krattenthaler et al. 1994), where matching is performed with a pre-computed set of sparse points of the reference image template. In this method, a set of correlation sensitive pixels is selected, wherein a pixel of the template is considered to be well suited for point correlation if its influence on template matching is higher than that of other pixels. This set of correlation sensitive pixels is selected in a learning session during which the template is shifted, rotated, and scaled through a predetermined set of possible combinations.
This learning procedure is outlined as follows. Compute a point set $P_M$ with $M$ initial points by randomly selecting a couple of points, preferably on the edges. Iterate through the pixels in the reference image template to build $P_L$, initially equal to $P_M$:
Assume we have already computed a sparse point set $P_L$ consisting of $L$ points. Then, find the new set $P_{L+1}$ in the following way:
1. For each point $X_j = (x_j, y_j)$ in the template with $X_j \notin P_L$:
   Compute the correlation result $R_j(i)$ for all transformations $i$, $1 \le i \le N$, using point correlation with the set of points $P_L \cup \{X_j\}$.
   Compute a correlation measure $Cm_j$ of the correlation result $R_j(i)$ that determines the quality of the point $X_j$.
2. Choose the point $X_j$ whose correlation measure $Cm_j$ is an extremum to be the new element of the sparse point set $P_{L+1}$.
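The greedy structure of this learning procedure can be sketched as follows. The text does not define the correlation measure $Cm_j$, so the sketch assumes a hypothetical one: the score at the known correct transformation minus the best score at any other transformation, with the maximum taken as the extremum. The transformations are limited to translations for brevity, and all function names are illustrative.

```python
import math


def point_ngc(template, scene, points, u, v):
    """Normalized correlation over `points` only, scene shifted by (u, v)."""
    t = [template[y][x] for x, y in points]
    s = [scene[y + v][x + u] for x, y in points]
    k = len(points)
    mt, ms = sum(t) / k, sum(s) / k
    num = sum((a - mt) * (b - ms) for a, b in zip(t, s))
    den = math.sqrt(sum((a - mt) ** 2 for a in t) *
                    sum((b - ms) ** 2 for b in s))
    return num / den if den else 0.0


def correlation_measure(scores, true_index):
    """Hypothetical Cm: peak score minus the best competing score."""
    rivals = [r for i, r in enumerate(scores) if i != true_index]
    return scores[true_index] - max(rivals)


def grow_point_set(template, scene, shifts, true_index, initial, size):
    """Grow P_L one point at a time until it holds `size` points."""
    points = list(initial)
    candidates = [(x, y) for y in range(len(template))
                  for x in range(len(template[0]))]
    while len(points) < size:
        best_point, best_cm = None, -math.inf
        for cand in candidates:          # step 1: every X_j not in P_L
            if cand in points:
                continue
            trial = points + [cand]      # P_L united with {X_j}
            scores = [point_ngc(template, scene, trial, u, v)
                      for u, v in shifts]
            cm = correlation_measure(scores, true_index)
            if cm > best_cm:
                best_point, best_cm = cand, cm
        points.append(best_point)        # step 2: keep the extremum
    return points


template = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
scene = [[0] * 5 for _ in range(5)]
for y in range(3):
    for x in range(3):
        scene[y + 1][x + 1] = template[y][x]  # true offset is (1, 1)
shifts = [(u, v) for v in range(3) for u in range(3)]
print(grow_point_set(template, scene, shifts, shifts.index((1, 1)),
                     initial=[(0, 0), (2, 2)], size=4))
```

Every added point requires a full pass over all remaining candidates and all transformations, which is the source of the cost of this procedure.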
While such a learning procedure improves the performance of the subsequent search, such a procedure is nonetheless computationally expensive. Given that there are $O(n)$ possible combinations of rotation, scale, and other transformations, in addition to the $O(n^2)$ translations, $N$ in step 1 would be of $O(n^3)$. If the number of template pixels is of $O(n^2)$, then step 1 would require $O(n^5)$ computations. To select $n$ pixels, the required number of computations would be

$$(1 + 2 + 3 + \cdots + n) \cdot O(n^5),$$

which is $O(n^7)$.
To select $O(n^2)$ pixels, therefore, the computational complexity would be $O(n^8)$.
However, even with the power of modern processors, $O(n^8)$ is practically infeasible for high resolution images having a large number of pixels. It is therefore desirable to determine an optimal sparse pixel set of correlation sensitive pixels. If the chosen set of sparse pixels is too small, accuracy will be compromised. Conversely, an excessive number of correlation sensitive pixels degrades performance. One method is to simply accumulate a fixed number of points in the sparse point set; however, such an approach is not adaptive to various combinations of reference image templates and search image scenes.
It would be beneficial, therefore, to develop a method for computing an optimal sparse pixel set for grayscale correlation matching which is tolerant of changes in rotation, scale, perspective, brightness, and focus, and which is sufficiently fast that it can be implemented in software without requiring dedicated image processing hardware, and which nonetheless maintains a level of accuracy comparable to conventional, exhaustive NGC.