Digital images are formed by many devices and used for many practical purposes. Devices include TV cameras operating on visible or infrared light, line-scan sensors, flying spot scanners, electron microscopes, X-ray devices including CT scanners, magnetic resonance imagers, and other devices known to those skilled in the art. Practical applications are found in industrial automation, medical diagnosis, satellite imaging for a variety of military, civilian, and scientific purposes, photographic processing, surveillance and traffic monitoring, document processing, and many others.
To serve these applications the images formed by the various devices are analyzed by digital devices to extract appropriate information. One form of analysis that is of considerable practical importance is determining the position, orientation, and size of patterns in an image that correspond to objects in the field of view of the imaging device. Pattern location methods are of particular importance in industrial automation, where they are used to guide robots and other automation equipment in semiconductor manufacturing, electronics assembly, pharmaceuticals, food processing, consumer goods manufacturing, and many others.
Another form of digital image analysis of practical importance is identifying differences between an image of an object and a stored pattern that represents the “ideal” appearance of the object. Methods for identifying these differences are generally referred to as pattern inspection methods, and are used in industrial automation for assembly, packaging, quality control, and many other purposes.
One early, widely-used method for pattern location and inspection is known as blob analysis. In this method, the pixels of a digital image are classified as “object” or “background” by some means, typically by comparing pixel gray-levels to a threshold. Pixels classified as object are grouped into blobs using the rule that two object pixels are part of the same blob if they are neighbors; this is known as connectivity analysis. For each such blob one determines properties such as area, perimeter, center of mass, principal moments of inertia, and principal axes of inertia. The position, orientation, and size of a blob are taken to be its center of mass, angle of first principal axis of inertia, and area, respectively. These and the other blob properties can be compared against a known ideal for purposes of inspection.
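The blob analysis steps described above can be illustrated with a minimal sketch. It assumes a gray-level image stored as a list of rows, a fixed threshold, and 4-connected neighbors; the function names `find_blobs` and `blob_properties` are hypothetical, and the orientation is computed from central second moments as the axis of greatest spread, which is one common convention rather than the only one.

```python
from collections import deque
import math

def find_blobs(image, threshold):
    """Group above-threshold pixels into 4-connected blobs (connectivity analysis)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs

def blob_properties(pixels):
    """Return area, center of mass (x, y), and principal-axis angle in radians."""
    area = len(pixels)
    cx = sum(x for _, x in pixels) / area
    cy = sum(y for y, _ in pixels) / area
    # Central second moments of the pixel distribution.
    mxx = sum((x - cx) ** 2 for _, x in pixels) / area
    myy = sum((y - cy) ** 2 for y, _ in pixels) / area
    mxy = sum((x - cx) * (y - cy) for y, x in pixels) / area
    angle = 0.5 * math.atan2(2 * mxy, mxx - myy)
    return {"area": area, "center": (cx, cy), "angle": angle}
```

For a horizontal bar of four bright pixels, this sketch reports an area of 4, a center of mass at the middle of the bar, and an angle of zero.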
Blob analysis is relatively inexpensive to compute, allowing for fast operation on inexpensive hardware. It is reasonably accurate under ideal conditions, and well-suited to objects whose orientation and size are subject to change. One limitation is that accuracy can be severely degraded if some of the object is missing or occluded, or if unexpected extra features are present.
Another limitation is that the values available for inspection purposes represent coarse features of the object, and cannot be used to detect fine variations. The most severe limitation, however, is that except under limited and well-controlled conditions there is in general no reliable method for classifying pixels as object or background. These limitations forced developers to seek other methods for pattern location and inspection.
Another method that achieved early widespread use is binary template matching. In this method a training image is used that contains an example of the pattern to be located. The subset of the training image containing the example is thresholded to produce a binary pattern and then stored in a memory. At run-time, images are presented that contain the object to be found. The stored pattern is compared with like-sized subsets of the run-time image at all or selected positions, and the position that best matches the stored pattern is considered the position of the object. Degree of match at a given position of the pattern is simply the fraction of pattern pixels that match their corresponding image pixel.
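As a rough illustration of binary template matching, the sketch below thresholds a gray-level image, slides a stored binary pattern over every position, and scores each position by the fraction of matching pixels. The helper names `binarize`, `match_fraction`, and `best_match` are assumptions made for this example.

```python
def binarize(image, threshold):
    """Threshold a gray-level image into a binary image of 0s and 1s."""
    return [[1 if v > threshold else 0 for v in row] for row in image]

def match_fraction(pattern, image, top, left):
    """Fraction of pattern pixels equal to the image pixel they overlay."""
    total = 0
    matches = 0
    for r, prow in enumerate(pattern):
        for c, p in enumerate(prow):
            total += 1
            if p == image[top + r][left + c]:
                matches += 1
    return matches / total

def best_match(pattern, image):
    """Try every position; return (score, (top, left)) of the best one."""
    ph, pw = len(pattern), len(pattern[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = -1.0, None
    for top in range(ih - ph + 1):
        for left in range(iw - pw + 1):
            score = match_fraction(pattern, image, top, left)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_score, best_pos
```

Because positions are tried only at whole-pixel offsets and the gray-levels are discarded by `binarize`, the result cannot be more precise than one pixel.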
Binary template matching does not depend on classifying image pixels as object or background, and so it can be applied to a much wider variety of problems than blob analysis. It also is much better able to tolerate missing or extra pattern features without severe loss of accuracy, and it is able to detect finer differences between the pattern and the object. One limitation, however, is that a binarization threshold is needed, which can be difficult to choose reliably in practice, particularly under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Accuracy is typically limited to about one whole pixel due to the substantial loss of information associated with thresholding. Even more serious, however, is that binary template matching cannot measure object orientation and size. Furthermore, accuracy degrades rapidly with small variations in orientation and/or size, and if larger variations are expected the method cannot be used at all.
A significant improvement over binary template matching came with the advent of relatively inexpensive methods for the use of gray-level normalized correlation for pattern location and inspection. These methods are similar to binary template matching, except that no threshold is used, so that the full range of image gray-levels is considered, and the degree of match becomes the correlation coefficient between the stored pattern and the image subset at a given position.
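The degree of match used by normalized correlation can be sketched as follows. The function name `correlation_coefficient` and the list-of-rows image layout are assumptions for illustration; the formula is the standard correlation coefficient between the pattern values and the image values they overlay.

```python
import math

def correlation_coefficient(pattern, image, top, left):
    """Correlation coefficient between a pattern and the like-sized image subset."""
    p = [v for row in pattern for v in row]
    w = [image[top + r][left + c]
         for r in range(len(pattern))
         for c in range(len(pattern[0]))]
    n = len(p)
    mean_p = sum(p) / n
    mean_w = sum(w) / n
    cov = sum((a - mean_p) * (b - mean_w) for a, b in zip(p, w))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_w = sum((b - mean_w) ** 2 for b in w)
    if var_p == 0 or var_w == 0:
        return 0.0  # a featureless pattern or window has no defined correlation
    return cov / math.sqrt(var_p * var_w)
```

Subtracting the means and dividing by the spreads makes the score unchanged by any linear change in brightness (a gain and an offset applied to the whole window), which is why no threshold is needed.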
Since no binarization threshold is needed, and given the fundamental noise immunity of correlation, performance is not significantly compromised under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Furthermore, since there is no loss of information due to thresholding, position accuracy down to about ¼ pixel is practical using well-known interpolation methods. The situation regarding orientation and size, however, is not much improved.
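The text does not say which interpolation method is meant. One commonly used choice, shown here purely as an assumption, fits a parabola through the match score at the best whole-pixel position and its two neighbors along one axis, and takes the vertex of the parabola as the sub-pixel position. The function name `subpixel_offset` is hypothetical.

```python
def subpixel_offset(score_left, score_center, score_right):
    """Offset of the parabola vertex from the center sample, in pixels.

    The three arguments are match scores at positions -1, 0, and +1
    relative to the best whole-pixel position.
    """
    curvature = score_left - 2.0 * score_center + score_right
    if curvature >= 0:
        return 0.0  # the center is not a strict peak; keep the whole-pixel position
    return 0.5 * (score_left - score_right) / curvature
```

Applying the same fit independently along x and y gives a two-dimensional sub-pixel estimate.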
Another limitation of correlation methods is that in many applications object shading can vary locally and non-linearly across an object, resulting in poor correlation with the stored pattern and therefore failure to locate it. For example, in semiconductor fabrication the process step known as chemical mechanical planarization (CMP) results in radical, non-linear changes in pattern shading, which makes alignment using correlation impossible. As another example, in almost any application involving 3-dimensional objects, such as robot pick-and-place applications, shading will vary as a result of variations in angles of illumination incidence and reflection, and from shadows and mutual illumination. The effects are more severe for objects that exhibit significant specular reflection, particularly metals and plastics.
More recently, improvements to gray-level correlation have been developed that allow it to be used in applications where significant variation in orientation and/or size is expected. In these methods, the stored pattern is rotated and/or scaled by digital image re-sampling methods before being matched against the image. By matching over a range of angles, sizes, and x-y positions, one can locate an object in the corresponding multidimensional space. Note that such methods would not work well with binary template matching, due to the much more severe pixel quantization errors associated with binary images.
One problem with these methods is the severe computational cost, both of digital re-sampling and of searching a space with more than 2 dimensions. To manage this cost, the search methods break up the problem into two or more phases. The earliest phase uses a coarse, subsampled version of the pattern to cover the entire search space quickly and identify possible object locations. Subsequent phases use finer versions of the pattern to refine the positions determined at earlier phases, and eliminate positions that the finer resolution reveals are not well correlated with the pattern. Note that variations of these coarse-fine methods have also been used with binary template matching and the original two-dimensional correlation, but are even more important with the higher-dimensional search space.
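The coarse-fine strategy can be sketched in a simplified form. The example below makes several assumptions that are not in the text: it searches x-y position only, it subsamples by dropping pixels without smoothing, and it uses the sum of absolute differences as a stand-in for the match score rather than correlation. The names `subsample`, `sad`, and `coarse_to_fine` are hypothetical.

```python
def subsample(image, step):
    """Keep every step-th pixel in each direction (no smoothing)."""
    return [row[::step] for row in image[::step]]

def sad(pattern, image, top, left):
    """Sum of absolute differences; lower means a better match."""
    return sum(abs(p - image[top + r][left + c])
               for r, row in enumerate(pattern)
               for c, p in enumerate(row))

def coarse_to_fine(pattern, image, step=2, keep=3):
    """Two-phase search: coarse scan of all positions, then fine refinement."""
    ph, pw = len(pattern), len(pattern[0])
    ih, iw = len(image), len(image[0])
    cp, ci = subsample(pattern, step), subsample(image, step)
    # Phase 1: score every position at coarse resolution, keep the best few.
    coarse = [(t, l)
              for t in range(len(ci) - len(cp) + 1)
              for l in range(len(ci[0]) - len(cp[0]) + 1)]
    coarse.sort(key=lambda pos: sad(cp, ci, pos[0], pos[1]))
    # Phase 2: examine full-resolution positions near each surviving candidate.
    fine = set()
    for ct, cl in coarse[:keep]:
        for t in range(ct * step - step + 1, ct * step + step):
            for l in range(cl * step - step + 1, cl * step + step):
                if 0 <= t <= ih - ph and 0 <= l <= iw - pw:
                    fine.add((t, l))
    return min(fine, key=lambda pos: sad(pattern, image, pos[0], pos[1]))
```

The coarse phase evaluates roughly 1/step² as many positions, each with 1/step² as many pixels, and the fine phase touches only a small neighborhood around each retained candidate.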
Even with these techniques, however, the computational cost is still high, and the problems associated with non-linear variation in shading remain.
Another pattern location method in common use is known as the Generalized Hough Transform (GHT). This method traces its origins to U.S. Pat. No. 3,069,654 [Hough, P. V. C., 1962], which described a method for locating parameterized curves such as lines or conic sections. Subsequently the method was generalized to be able to locate essentially arbitrary patterns. As with the above template matching and correlation methods, the method is based on a trained pattern. Instead of using gray levels directly, however, the GHT method identifies points along object boundaries using well-known methods of edge detection. A large array of accumulators, called Hough space, is constructed, with one such accumulator for each position in the multidimensional space to be searched. Each edge point in the image corresponds to a surface of possible pattern positions in Hough space. For each such edge point, the accumulators along the corresponding surface are incremented. After all image edge points have been processed, the accumulator with the highest count is considered to be the multidimensional location of the pattern.
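The accumulator voting described above can be sketched for the simplest case, a search over x-y position only. This sketch assumes edge points are already available as (row, column) pairs and ignores edge direction, rotation, and scale; the names `train_offsets` and `locate` are hypothetical, and a dictionary of counters stands in for the accumulator array.

```python
from collections import Counter

def train_offsets(model_edges, reference):
    """Store each model edge point as an offset from a reference point."""
    ry, rx = reference
    return [(y - ry, x - rx) for y, x in model_edges]

def locate(offsets, image_edges):
    """Vote in Hough space and return ((row, col), votes) for the peak."""
    accumulators = Counter()
    for ey, ex in image_edges:
        # An image edge point could be any model edge point, so it votes
        # for every reference position consistent with that assumption.
        for dy, dx in offsets:
            accumulators[(ey - dy, ex - dx)] += 1
    return accumulators.most_common(1)[0]
```

Each additional search dimension such as angle or size multiplies both the number of accumulators and the number of votes cast per edge point.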
The general performance characteristics of GHT are very similar to correlation. Computational cost rises very rapidly with number of dimensions, and although coarse-fine methods have been developed to improve performance, practical applications beyond 2 dimensions are almost nonexistent.
The edge detection step of GHT generally reduces problems due to non-linear variations in object contrast, but introduces new problems. Use of edge detectors generally increases susceptibility to noise and defocus. For many objects the edges are not sharply defined enough for the edge detection step to yield reliable results. Furthermore, edge detection fundamentally requires a binarization step, where pixels are classified as “edge” or “not edge”, usually by a combination of thresholding and peak detection. Binarization, no matter what method is used, is always subject to uncertainty and misclassification, and will contribute failure modes to any method that requires it.
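The binarization step inside edge detection can be illustrated with a one-dimensional sketch. It assumes a single row of gray-levels, a central-difference gradient, and a fixed threshold combined with local peak detection; the function name `detect_edges_1d` is hypothetical, and practical edge detectors work in two dimensions with smoothing.

```python
def detect_edges_1d(profile, threshold):
    """Indices classified as "edge": gradient above threshold and a local peak."""
    n = len(profile)
    grad = [0.0] * n
    for i in range(1, n - 1):
        # Central-difference estimate of gradient magnitude.
        grad[i] = abs(profile[i + 1] - profile[i - 1]) / 2.0
    edges = []
    for i in range(1, n - 1):
        is_peak = grad[i] >= grad[i - 1] and grad[i] > grad[i + 1]
        if grad[i] > threshold and is_peak:
            edges.append(i)
    return edges
```

With the profile `[10, 10, 10, 50, 90, 90, 90]`, a threshold of 15 classifies index 3 as an edge, while a threshold of 45 finds none, showing how a single threshold choice can make an edge disappear entirely.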
Terminology
The following terminology is used throughout the specification:
Object: Any physical or simulated object, or portion thereof, having characteristics that can be measured by an image forming device or simulated by a data processing device.
Image: A 2-dimensional function whose values correspond to physical characteristics of an object, such as brightness (radiant energy, reflected or otherwise), color, temperature, height above a reference plane, etc., and measured by any image-forming device, or whose values correspond to simulated characteristics of an object, and generated by any data processing device.
Brightness: The physical or simulated quantity represented by the values of an image, regardless of source.
Granularity: A selectable size (in units of distance) below which spatial variations in image brightness are increasingly attenuated, and below which therefore image features increasingly cannot be resolved. Granularity can be thought of as being related to resolution.
Boundary: An imaginary contour, open-ended or closed, straight or curved, smooth or sharp, along which a discontinuity of image brightness occurs at a specified granularity, the direction of said discontinuity being normal to the boundary at each point.
Gradient: A vector at a given point in an image giving the direction and magnitude of greatest change in brightness at a specified granularity at said point.
Pattern: A specific geometric arrangement of contours lying in a bounded subset of the plane of the contours, said contours representing the boundaries of an idealized image of an object to be located and/or inspected.
Model: A set of data encoding characteristics of a pattern to be found for use by a pattern finding method.
Training: The act of creating a model from an image of an example object or from a geometric description of an object or a pattern.
Pose: A mapping from pattern to image coordinates and representing a specific transformation and superposition of a pattern onto an image.