Digital images are formed by many devices and used for many practical purposes. Devices include TV cameras operating on visible or infrared light, line-scan sensors, flying spot scanners, electron microscopes, X-ray devices including CT scanners, magnetic resonance imagers, and other devices known to those skilled in the art. Practical applications are found in industrial automation, medical diagnosis, satellite imaging for a variety of military, civilian, and scientific purposes, photographic processing, surveillance and traffic monitoring, document processing, and many others.
To serve these applications the images formed by the various devices are analyzed by digital devices to extract appropriate information. One form of analysis that is of considerable practical importance is determining the position, orientation, and size of patterns in an image that correspond to objects in the field of view of the imaging device. Pattern location methods are of particular importance in industrial automation, where they are used to guide robots and other automation equipment in semiconductor manufacturing, electronics assembly, pharmaceuticals, food processing, consumer goods manufacturing, and many others.
Another form of digital image analysis of practical importance is identifying differences between an image of an object and a stored pattern that represents the “ideal” appearance of the object. Methods for identifying these differences are generally referred to as pattern inspection methods, and are used in industrial automation for assembly, packaging, quality control, and many other purposes.
One early, widely-used method for pattern location and inspection is known as blob analysis. In this method, the pixels of a digital image are classified as “object” or “background” by some means, typically by comparing pixel gray-levels to a threshold. Pixels classified as object are grouped into blobs using the rule that two object pixels are part of the same blob if they are neighbors; this is known as connectivity analysis. For each such blob we determine properties such as area, perimeter, center of mass, principal moments of inertia, and principal axes of inertia. The position, orientation, and size of a blob are taken to be its center of mass, angle of first principal axis of inertia, and area, respectively. These and the other blob properties can be compared against a known ideal for purposes of inspection.
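The blob analysis steps described above can be illustrated with a minimal sketch. It assumes a gray-level image stored as a list of rows, a fixed threshold, and 4-neighbor connectivity; the function names `find_blobs` and `blob_properties` are hypothetical, and the orientation is computed as the angle of the axis along which the blob is most elongated, which is one common reading of "first principal axis".

```python
import math
from collections import deque

def find_blobs(image, threshold):
    """Group pixels brighter than `threshold` into 4-connected blobs."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs

def blob_properties(pixels):
    """Return area, center of mass (x, y), and elongation-axis angle in radians."""
    area = len(pixels)
    cy = sum(y for y, _ in pixels) / area
    cx = sum(x for _, x in pixels) / area
    # Central second moments of the pixel coordinates.
    mxx = sum((x - cx) ** 2 for _, x in pixels) / area
    myy = sum((y - cy) ** 2 for y, _ in pixels) / area
    mxy = sum((x - cx) * (y - cy) for y, x in pixels) / area
    angle = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    return {"area": area, "center": (cx, cy), "angle": angle}
```

Note that every result depends on the threshold: a pixel misclassified as object or background changes the area, the center of mass, and the angle together, which is the weakness discussed next.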
Blob analysis is relatively inexpensive to compute, allowing for fast operation on inexpensive hardware. It is reasonably accurate under ideal conditions, and well-suited to objects whose orientation and size are subject to change. One limitation is that accuracy can be severely degraded if some of the object is missing or occluded, or if unexpected extra features are present. Another limitation is that the values available for inspection purposes represent coarse features of the object, and cannot be used to detect fine variations. The most severe limitation, however, is that except under limited and well-controlled conditions there is in general no reliable method for classifying pixels as object or background. These limitations forced developers to seek other methods for pattern location and inspection.
Another method that achieved early widespread use is binary template matching. In this method a training image is used that contains an example of the pattern to be located. The subset of the training image containing the example is thresholded to produce a binary pattern and then stored in a memory. At run-time, images are presented that contain the object to be found. The stored pattern is compared with like-sized subsets of the run-time image at all or selected positions, and the position that best matches the stored pattern is considered the position of the object. Degree of match at a given position of the pattern is simply the fraction of pattern pixels that match their corresponding image pixel, thereby providing pattern inspection information.
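As an illustration of the matching step, the following sketch thresholds a gray-level image and exhaustively compares a stored binary pattern against every like-sized subset. The helper names `binarize` and `binary_match` are hypothetical, and the images are assumed to be lists of rows; a practical implementation would use packed bits and selected positions rather than this brute-force loop.

```python
def binarize(image, threshold):
    """Map gray levels to 1 (above threshold) or 0 (otherwise)."""
    return [[1 if value > threshold else 0 for value in row] for row in image]

def binary_match(pattern, image):
    """Return (best_score, (row, col)) over all placements of the pattern.

    The score is the fraction of pattern pixels equal to the image pixel
    beneath them, so 1.0 means a perfect match.
    """
    ph, pw = len(pattern), len(pattern[0])
    ih, iw = len(image), len(image[0])
    best_score, best_pos = -1.0, None
    for r in range(ih - ph + 1):
        for c in range(iw - pw + 1):
            agree = sum(pattern[y][x] == image[r + y][c + x]
                        for y in range(ph) for x in range(pw))
            score = agree / (ph * pw)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

The returned score doubles as the inspection value: a located object whose score falls below some application-chosen limit can be flagged as differing from the stored pattern.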
Binary template matching does not depend on classifying image pixels as object or background, and so it can be applied to a much wider variety of problems than blob analysis. It also is much better able to tolerate missing or extra pattern features without severe loss of accuracy, and it is able to detect finer differences between the pattern and the object. One limitation, however, is that a binarization threshold is needed, which can be difficult to choose reliably in practice, particularly under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Accuracy is typically limited to about one whole pixel due to the substantial loss of information associated with thresholding. Even more serious, however, is that binary template matching cannot measure object orientation and size. Furthermore, accuracy degrades rapidly with small variations in orientation and/or size, and if larger variations are expected the method cannot be used at all.
A significant improvement over binary template matching came with the advent of relatively inexpensive methods for the use of gray-level normalized correlation for pattern location and inspection. The two methods are similar, except that no threshold is used, so that the full range of image gray-levels is considered, and the degree of match becomes the correlation coefficient between the stored pattern and the image subset at a given position.
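The degree of match used by normalized correlation can be sketched as follows. This is a minimal, unoptimized illustration that assumes the standard (Pearson) correlation coefficient computed over the pattern and a like-sized image subset; the names `correlation_coefficient` and `correlate` are hypothetical.

```python
import math

def correlation_coefficient(pattern, window):
    """Correlation coefficient between two equal-sized gray-level arrays."""
    p = [v for row in pattern for v in row]
    w = [v for row in window for v in row]
    n = len(p)
    mean_p, mean_w = sum(p) / n, sum(w) / n
    cov = sum((a - mean_p) * (b - mean_w) for a, b in zip(p, w))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_w = sum((b - mean_w) ** 2 for b in w)
    if var_p == 0 or var_w == 0:
        return 0.0  # a featureless pattern or window carries no information
    return cov / math.sqrt(var_p * var_w)

def correlate(pattern, image):
    """Return (best_coefficient, (row, col)) over all placements of the pattern."""
    ph, pw = len(pattern), len(pattern[0])
    best_score, best_pos = -2.0, None
    for r in range(len(image) - ph + 1):
        for c in range(len(image[0]) - pw + 1):
            window = [row[c:c + pw] for row in image[r:r + ph]]
            score = correlation_coefficient(pattern, window)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

Because the means are subtracted and the result is divided by both standard deviations, a uniform change in brightness or contrast across the window leaves the coefficient unchanged.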
Since no binarization threshold is needed, and given the fundamental noise immunity of correlation, performance is not significantly compromised under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Furthermore, since there is no loss of information due to thresholding, position accuracy down to about ¼ pixel is practical using well-known interpolation methods. The situation regarding orientation and size, however, is not much improved with respect to binary template matching. Another limitation is that in some applications, contrast can vary locally across an image of an object, resulting in poor correlation with the stored pattern, and consequent failure to locate the object correctly.
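One interpolation method commonly used for sub-pixel position is to fit a parabola through the correlation scores at the best whole-pixel position and its two neighbors along an axis. The sketch below shows that calculation; it is offered only as an example of such a method, and the function name `subpixel_peak` is hypothetical.

```python
def subpixel_peak(left, center, right):
    """Offset of the parabola vertex from the center sample, in pixels.

    `left`, `center`, and `right` are match scores at positions -1, 0, +1
    along one axis, with `center` being the whole-pixel maximum.
    """
    denom = left - 2.0 * center + right
    if denom == 0:
        return 0.0  # the three scores are collinear; no peak to refine
    return 0.5 * (left - right) / denom
```

Applying the function once along x and once along y gives a two-dimensional sub-pixel estimate; when `center` is the largest of the three scores, the returned offset lies between -0.5 and +0.5.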
More recently, improvements to gray-level correlation have been developed that allow it to be used in applications where significant variation in orientation and/or size is expected. In these methods, the stored pattern is rotated and/or scaled by digital image re-sampling methods before being matched against the image. By matching over a range of angles, sizes, and x-y positions, one can locate an object in the corresponding multidimensional space. Note that such methods would not work well with binary template matching, due to the much more severe pixel quantization errors associated with binary images.
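The re-sampling step can be sketched as below, assuming bilinear interpolation, rotation and scaling about the pattern center, and an output the same size as the input. These choices, the sign convention of the angle, and the names `bilinear`, `resample`, and `pattern_bank` are assumptions made for illustration and are not taken from any particular product.

```python
import math

def bilinear(image, y, x, fill=0.0, eps=1e-9):
    """Interpolate a gray level at a fractional (y, x); needs a 2x2 image or larger."""
    rows, cols = len(image), len(image[0])
    if y < -eps or x < -eps or y > rows - 1 + eps or x > cols - 1 + eps:
        return fill  # the source coordinate falls outside the pattern
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = min(int(y), rows - 2), min(int(x), cols - 2)
    fy, fx = y - y0, x - x0
    top = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
    bottom = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def resample(pattern, angle, scale):
    """Rotate by `angle` (radians) and scale by `scale` about the pattern center."""
    rows, cols = len(pattern), len(pattern[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Inverse mapping: find where each output pixel comes from.
            dy, dx = (r - cy) / scale, (c - cx) / scale
            src_x = cx + cos_a * dx + sin_a * dy
            src_y = cy - sin_a * dx + cos_a * dy
            row.append(bilinear(pattern, src_y, src_x))
        out.append(row)
    return out

def pattern_bank(pattern, angles, scales):
    """Re-sampled copies of the pattern for every (angle, scale) to be tried."""
    return {(a, s): resample(pattern, a, s) for a in angles for s in scales}
```

Each entry of the bank would then be matched against the image at every x-y position, so the number of correlations grows with the product of the number of angles, scales, and positions.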
One problem with these methods is the severe computational cost, both of digital re-sampling and of searching a space with more than 2 dimensions. To manage this cost, the search methods break up the problem into two or more phases. The earliest phase uses a coarse, subsampled version of the pattern to cover the entire search space quickly and identify possible object locations in the n-dimensional space. Subsequent phases use finer versions of the pattern to refine the locations determined at earlier phases, and eliminate locations that the finer resolution reveals are not well correlated with the pattern. Note that variations of these coarse-fine methods have also been used with binary template matching and the original 2-dimensional correlation, but are even more important with the higher-dimensional search space.
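A two-phase coarse-fine search can be sketched for the simple case of x-y translation only. To keep the example short, this sketch swaps in a sum of squared differences as the match measure instead of correlation, subsamples by block averaging, and carries forward only the single best coarse candidate; the source describes keeping several candidates and eliminating them in later phases. All function names are hypothetical.

```python
def subsample(image, factor=2):
    """Average non-overlapping factor-by-factor blocks of pixels."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + dy][c * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]

def ssd(pattern, image, r, c):
    """Sum of squared differences with the pattern placed at (r, c); lower is better."""
    return sum((pattern[y][x] - image[r + y][c + x]) ** 2
               for y in range(len(pattern)) for x in range(len(pattern[0])))

def best_position(pattern, image, candidates):
    """Candidate (row, col) with the smallest mismatch."""
    return min(candidates, key=lambda rc: ssd(pattern, image, rc[0], rc[1]))

def coarse_fine_search(pattern, image, factor=2):
    # Phase 1: search every position at reduced resolution.
    coarse_pattern = subsample(pattern, factor)
    coarse_image = subsample(image, factor)
    coarse_candidates = [
        (r, c)
        for r in range(len(coarse_image) - len(coarse_pattern) + 1)
        for c in range(len(coarse_image[0]) - len(coarse_pattern[0]) + 1)
    ]
    r0, c0 = best_position(coarse_pattern, coarse_image, coarse_candidates)
    # Phase 2: refine at full resolution near the coarse result only.
    max_r = len(image) - len(pattern)
    max_c = len(image[0]) - len(pattern[0])
    fine_candidates = [
        (r, c)
        for r in range(max(0, r0 * factor - factor), min(max_r, r0 * factor + factor) + 1)
        for c in range(max(0, c0 * factor - factor), min(max_c, c0 * factor + factor) + 1)
    ]
    return best_position(pattern, image, fine_candidates)
```

The saving comes from the first phase visiting roughly 1/factor² as many positions, each with 1/factor² as many pixels; the same idea extends to angle and scale, where the reduction is larger still.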
The location accuracy of these methods is limited both by how finely the multidimensional space is searched, and by the ability of the discrete pixel grid to represent small changes in position, orientation, and scale. The fineness of the search can be chosen to suit a given application, but computational cost grows so rapidly with resolution and number of dimensions that practical applications often cannot tolerate the cost or time needed to achieve high accuracy. The limitations of the discrete pixel grid are more fundamental—no matter how finely the space is searched, for typical patterns one cannot expect position accuracy to be much better than about ¼ pixel, orientation better than a degree or so, and scale better than a percent or so.
A similar situation holds when gray-level pixel-grid-based methods are used for pattern inspection. Once the object has been located in the multidimensional space, pixels in the pattern can be compared to each corresponding pixel in the image to identify differences. Some differences, however, will result from the re-sampling process itself, because again the pixel grid cannot accurately represent small variations in orientation and scale. These differences are particularly severe in regions where image gray levels are changing rapidly, such as along object boundaries. Often these are the most important regions of an object to inspect. Since in general, differences due to re-sampling cannot be distinguished from those due to object defects, inspection performance is compromised.
Another pattern location method in common use is known as the Generalized Hough Transform (GHT). This method traces its origins to U.S. Pat. No. 3,069,654 [Hough, P. V. C., 1962], which describes a method for locating parameterized curves such as lines or conic sections. Subsequently the method was generalized to be able to locate essentially arbitrary patterns. As with the above template matching and correlation methods, the method is based on a trained pattern. Instead of using gray levels directly, however, the GHT method identifies points along object boundaries using well-known methods of edge detection. A large array of accumulators, called Hough space, is constructed, with one such accumulator for each position in the multidimensional space to be searched. Each edge point in the image corresponds to a surface of possible pattern positions in Hough space. For each such edge point, the accumulators along the corresponding surface are incremented. After all image edge points have been processed, the accumulator with the highest count is considered to be the multidimensional location of the pattern.
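The accumulator voting described above can be sketched for the simplest case, in which only x-y translation is searched. The sketch assumes edge points have already been detected and are supplied as (row, col) pairs, and it uses a sparse counter in place of a dense accumulator array; it also omits the indexing by edge direction that practical GHT implementations typically add. The names `build_offsets` and `hough_locate` are hypothetical.

```python
from collections import Counter

def build_offsets(pattern_edges, reference):
    """Displacement from each trained edge point to the pattern's reference point."""
    ref_y, ref_x = reference
    return [(ref_y - y, ref_x - x) for y, x in pattern_edges]

def hough_locate(image_edges, offsets):
    """Vote for every reference position consistent with each image edge point.

    Returns ((row, col), votes) for the accumulator with the highest count.
    """
    accumulators = Counter()
    for y, x in image_edges:
        # In translation-only search, the "surface" for one edge point is
        # the set of positions reached by applying every trained offset.
        for dy, dx in offsets:
            accumulators[(y + dy, x + dx)] += 1
    return accumulators.most_common(1)[0]
```

For example, training on edge points `[(0, 0), (1, 0), (2, 0), (2, 1)]` with reference `(0, 0)` and then presenting the same shape shifted to `(5, 7)` plus one stray edge point yields four votes at `(5, 7)`, while the stray point spreads its votes over accumulators that receive little other support.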
The general performance characteristics of GHT are very similar to those of correlation. Computational cost rises very rapidly with number of dimensions, and accuracy is limited both by fineness of the Hough space and grid quantization effects. Coarse-fine methods have been developed to improve performance of GHT, but are computationally expensive at high accuracy. The edge detection module generally eliminates problems due to local variations in object contrast, but increases susceptibility to noise.