This invention relates to machine vision, and particularly to systems for pattern inspection in an image.
Digital images are formed by many devices and used for many practical purposes. Devices include TV cameras operating on visible or infrared light, line-scan sensors, flying spot scanners, electron microscopes, X-ray devices including CT scanners, magnetic resonance imagers, and other devices known to those skilled in the art. Practical applications are found in industrial automation, medical diagnosis, satellite imaging for a variety of military, civilian, and scientific purposes, photographic processing, surveillance and traffic monitoring, document processing, and many others.
To serve these applications the images formed by the various devices are analyzed by digital devices to extract appropriate information. One form of analysis that is of considerable practical importance is determining the position, orientation, and size of patterns in an image that correspond to objects in the field of view of the imaging device. Pattern location methods are of particular importance in industrial automation, where they are used to guide robots and other automation equipment in semiconductor manufacturing, electronics assembly, pharmaceuticals, food processing, consumer goods manufacturing, and many others.
Another form of digital image analysis of practical importance is identifying differences between an image of an object and a stored pattern that represents the “ideal” appearance of the object. Methods for identifying these differences are generally referred to as pattern inspection methods, and are used in industrial automation for assembly, packaging, quality control, and many other purposes.
One early, widely used method for pattern location and inspection is known as blob analysis. In this method, the pixels of a digital image are classified as “object” or “background” by some means, typically by comparing pixel gray-levels to a threshold. Pixels classified as object are grouped into blobs using the rule that two object pixels are part of the same blob if they are neighbors; this is known as connectivity analysis. For each such blob we determine properties such as area, perimeter, center of mass, principal moments of inertia, and principal axes of inertia. The position, orientation, and size of a blob are taken to be its center of mass, the angle of its first principal axis of inertia, and its area, respectively. These and the other blob properties can be compared against a known ideal for purposes of inspection.
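As a concrete illustration of the steps just described, the following Python sketch thresholds a gray-level image, groups object pixels into blobs by 4-connectivity (an assumption; 8-connectivity is equally common), and reports the area, the center of mass, and the angle of the first principal axis computed from second central moments. The function name `blob_analysis` and the dictionary layout are illustrative only.

```python
import math
from collections import deque


def blob_analysis(image, threshold):
    """Threshold an image, group object pixels into 4-connected blobs, and
    return area, center of mass, and principal-axis angle for each blob."""
    rows, cols = len(image), len(image[0])
    # Classify pixels: a gray level above the threshold counts as "object".
    is_object = [[image[r][c] > threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not is_object[r0][c0] or seen[r0][c0]:
                continue
            # Connectivity analysis: flood-fill from this seed pixel.
            pixels = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and is_object[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(pixels)
            cx = sum(c for _, c in pixels) / area
            cy = sum(r for r, _ in pixels) / area
            # Second central moments determine the principal axes of inertia.
            mu20 = sum((c - cx) ** 2 for _, c in pixels)
            mu02 = sum((r - cy) ** 2 for r, _ in pixels)
            mu11 = sum((c - cx) * (r - cy) for r, c in pixels)
            angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
            blobs.append({"area": area, "center": (cx, cy), "angle": angle})
    return blobs
```

For a horizontal bar the reported angle is 0, and for a vertical bar it is π/2, measured in image coordinates with y increasing downward.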
Blob analysis is relatively inexpensive to compute, allowing for fast operation on inexpensive hardware. It is reasonably accurate under ideal conditions, and well-suited to objects whose orientation and size are subject to change. One limitation is that accuracy can be severely degraded if some of the object is missing or occluded, or if unexpected extra features are present. Another limitation is that the values available for inspection purposes represent coarse features of the object, and cannot be used to detect fine variations. The most severe limitation, however, is that except under limited and well-controlled conditions there is in general no reliable method for classifying pixels as object or background. These limitations forced developers to seek other methods for pattern location and inspection.
Another method that achieved early widespread use is binary template matching. In this method, a training image is used that contains an example of the pattern to be located. The subset of the training image containing the example is thresholded to produce a binary pattern and then stored in a memory. At run-time, images are presented that contain the object to be found. The stored pattern is compared with like-sized subsets of the run-time image at all or selected positions, and the position that best matches the stored pattern is considered the position of the object. The degree of match at a given position is simply the fraction of pattern pixels that match their corresponding image pixels, and this same measure also provides pattern inspection information.
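A minimal sketch of binary template matching as described above, assuming a single global threshold applied to both the training pattern and the run-time image and an exhaustive search over all integer positions. The names `binarize` and `binary_template_match` are hypothetical.

```python
def binarize(pixels, threshold):
    """Map gray levels to 1 (above the threshold) or 0."""
    return [[1 if v > threshold else 0 for v in row] for row in pixels]


def binary_template_match(image, template, threshold):
    """Slide the thresholded template over the thresholded image and return the
    (x, y) offset with the highest fraction of agreeing pixels, plus that fraction."""
    img = binarize(image, threshold)
    pat = binarize(template, threshold)
    th, tw = len(pat), len(pat[0])
    best_score, best_pos = -1.0, None
    for y in range(len(img) - th + 1):
        for x in range(len(img[0]) - tw + 1):
            # Count pattern pixels that agree with the corresponding image pixel.
            agree = sum(
                pat[j][i] == img[y + j][x + i] for j in range(th) for i in range(tw)
            )
            score = agree / (th * tw)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_pos, best_score
```

The returned fraction is the degree of match; a value below 1.0 at the best position indicates pixels where the object differs from the stored pattern.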
Binary template matching does not depend on classifying image pixels as object or background, and so it can be applied to a much wider variety of problems than blob analysis. It also is much better able to tolerate missing or extra pattern features without severe loss of accuracy, and it is able to detect finer differences between the pattern and the object. One limitation, however, is that a binarization threshold is needed, which can be difficult to choose reliably in practice, particularly under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Accuracy is typically limited to about one whole pixel due to the substantial loss of information associated with thresholding. Even more serious, however, is that binary template matching cannot measure object orientation and size. Furthermore, accuracy degrades rapidly with small variations in orientation and/or size, and if larger variations are expected the method cannot be used at all.
A significant improvement over binary template matching came with the advent of relatively inexpensive methods for the use of gray-level normalized correlation for pattern location and inspection. These methods are similar to binary template matching, except that no threshold is used, so the full range of image gray-levels is considered, and the degree of match becomes the correlation coefficient between the stored pattern and the image subset at a given position.
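The degree of match used by these methods is the ordinary correlation coefficient. The sketch below, with hypothetical function names and an exhaustive integer-position search, computes it at each position. Because the coefficient is unchanged when a positive gain and an offset are applied to the image window, a copy of the pattern with altered contrast and brightness still scores 1.0.

```python
import math


def correlation_coefficient(template, image, x, y):
    """Normalized correlation between the template and the like-sized image
    subset whose top-left corner is (x, y). Returns 0.0 for a flat window."""
    th, tw = len(template), len(template[0])
    t = [v for row in template for v in row]
    w = [image[y + j][x + i] for j in range(th) for i in range(tw)]
    n = len(t)
    mt, mw = sum(t) / n, sum(w) / n
    cov = sum((a - mt) * (b - mw) for a, b in zip(t, w))
    var_t = sum((a - mt) ** 2 for a in t)
    var_w = sum((b - mw) ** 2 for b in w)
    if var_t == 0 or var_w == 0:
        return 0.0
    return cov / math.sqrt(var_t * var_w)


def correlation_search(image, template):
    """Exhaustive 2-D search; returns the best (x, y) and its coefficient."""
    th, tw = len(template), len(template[0])
    scores = {
        (x, y): correlation_coefficient(template, image, x, y)
        for y in range(len(image) - th + 1)
        for x in range(len(image[0]) - tw + 1)
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```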
Since no binarization threshold is needed, and given the fundamental noise immunity of correlation, performance is not significantly compromised under conditions of poor signal-to-noise ratio or when illumination intensity or object contrast is subject to variation. Furthermore, since there is no loss of information due to thresholding, position accuracy down to about ¼ pixel is practical using well-known interpolation methods. The situation regarding orientation and size, however, is not much improved with respect to binary template matching. Another limitation is that in some applications, contrast can vary locally across an image of an object, resulting in poor correlation with the stored pattern, and consequent failure to correctly locate it.
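The text does not name the interpolation method. One common choice, assumed here, fits a parabola through the peak correlation score and its two neighbors along each axis and takes the vertex of that parabola as the sub-pixel peak position. The function name is hypothetical.

```python
def parabolic_peak_offset(left, center, right):
    """Sub-pixel offset of the vertex of the parabola through three equally
    spaced samples taken at -1, 0, and +1. Assumes `center` is a local maximum."""
    denom = left - 2.0 * center + right  # twice the parabola's curvature
    if denom == 0:
        return 0.0  # flat neighborhood: no sub-pixel information
    return 0.5 * (left - right) / denom
```

Applied once along x and once along y to the scores around the best integer position, this yields a fractional position estimate.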
More recently, improvements to gray-level correlation have been developed that allow it to be used in applications where significant variation in orientation and/or size is expected. In these methods, the stored pattern is rotated and/or scaled by digital image re-sampling methods before being matched against the image. By matching over a range of angles, sizes, and x-y positions, one can locate an object in the corresponding multidimensional space. Note that such methods would not work well with binary template matching, due to the much more severe pixel quantization errors associated with binary images. One problem with these methods is the severe computational cost, both of digital re-sampling and of searching a space with more than 2 dimensions. To manage this cost, the search methods break up the problem into two or more phases. The earliest phase uses a coarse, subsampled version of the pattern to cover the entire search space quickly and identify possible object locations in the n-dimensional space. Subsequent phases use finer versions of the pattern to refine the locations determined at earlier phases, and eliminate locations that the finer resolution reveals are not well correlated with the pattern. Note that variations of these coarse-fine methods have also been used with binary template matching and the original 2-dimensional correlation, but are even more important with the higher-dimensional search space.
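The coarse-fine strategy can be sketched for the simplest case, a 2-dimensional x-y correlation search. This is a simplified illustration under stated assumptions: one coarse level obtained by 2×2 block averaging, a single best candidate carried forward (a practical method would keep several), and a full-resolution refinement in a small neighborhood. All names are hypothetical.

```python
import math


def ncc(template, image, x, y):
    """Correlation coefficient of the template against the image window at (x, y)."""
    th, tw = len(template), len(template[0])
    t = [v for row in template for v in row]
    w = [image[y + j][x + i] for j in range(th) for i in range(tw)]
    mt, mw = sum(t) / len(t), sum(w) / len(w)
    cov = sum((a - mt) * (b - mw) for a, b in zip(t, w))
    vt = sum((a - mt) ** 2 for a in t)
    vw = sum((b - mw) ** 2 for b in w)
    return 0.0 if vt == 0 or vw == 0 else cov / math.sqrt(vt * vw)


def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(len(img[0]) // 2)
        ]
        for r in range(len(img) // 2)
    ]


def best_position(image, template, candidates):
    """Return the candidate (x, y) with the highest correlation."""
    return max(candidates, key=lambda p: ncc(template, image, p[0], p[1]))


def coarse_fine_search(image, template, radius=1):
    th, tw = len(template), len(template[0])
    # Phase 1: exhaustive search of the subsampled image with the subsampled pattern.
    small_img, small_tpl = downsample(image), downsample(template)
    sh, sw = len(small_tpl), len(small_tpl[0])
    coarse = best_position(
        small_img,
        small_tpl,
        [(x, y) for y in range(len(small_img) - sh + 1) for x in range(len(small_img[0]) - sw + 1)],
    )
    # Phase 2: full-resolution search in a small neighborhood of the scaled-up coarse result.
    cx, cy = 2 * coarse[0], 2 * coarse[1]
    fine = [
        (x, y)
        for y in range(cy - radius, cy + radius + 1)
        for x in range(cx - radius, cx + radius + 1)
        if 0 <= x <= len(image[0]) - tw and 0 <= y <= len(image) - th
    ]
    return best_position(image, template, fine)
```

Adding angle and size dimensions would multiply both the coarse and the fine candidate lists by the number of angles and sizes tried, which is the cost problem described above.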
The location accuracy of these methods is limited both by how finely the multidimensional space is searched, and by the ability of the discrete pixel grid to represent small changes in position, orientation, and scale. The fineness of the search can be chosen to suit a given application, but computational cost grows so rapidly with resolution and number of dimensions that practical applications often cannot tolerate the cost or time needed to achieve high accuracy. The limitations of the discrete pixel grid are more fundamental: no matter how finely the space is searched, for typical patterns one cannot expect position accuracy to be much better than about ¼ pixel, orientation better than a degree or so, and scale better than a percent or so.
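A rough count makes the growth concrete. If dimension k spans a range R_k searched in steps of Δ_k, an exhaustive search over D dimensions visits N candidate poses. The numbers below are illustrative only and are not taken from any particular application.

```latex
N = \prod_{k=1}^{D} \frac{R_k}{\Delta_k},
\qquad
N = \underbrace{512 \cdot 512}_{x,\,y \text{ at } 1\ \text{px}}
    \cdot \underbrace{360}_{\text{angle at } 1^\circ}
    \cdot \underbrace{41}_{\text{scale } \pm 20\% \text{ at } 1\%}
  = 3\,869\,245\,440 \approx 3.9 \times 10^{9}
```

Halving every step size in this 4-dimensional example multiplies N by 2^4 = 16, and each additional dimension multiplies N by its own number of steps.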
A similar situation holds when gray-level pixel-grid-based methods are used for pattern inspection. Once the object has been located in the multidimensional space, pixels in the pattern can be compared to each corresponding pixel in the image to identify differences. Some differences, however, will result from the re-sampling process itself, because again the pixel grid cannot accurately represent small variations in orientation and scale. These differences are particularly severe in regions where image gray levels are changing rapidly, such as along object boundaries. Often these are the most important regions of an object to inspect. Since in general, differences due to re-sampling cannot be distinguished from those due to object defects, inspection performance is compromised.
In one general aspect, the invention is a method and apparatus for identifying differences between a stored pattern and a matching image subset, where variations in pattern position, orientation, and size do not give rise to false differences. The process of identifying differences is called inspection. Generally, an object image must be precisely located prior to inspection. In another general aspect, the invention is a system for analyzing an object image with respect to a model pattern, wherein the system includes extracting pattern features from the model pattern; generating a vector-valued function using the pattern features to provide a pattern field; extracting image features from the object image; evaluating each image feature, using the pattern field and an n-dimensional transformation that associates image features with pattern features, so as to determine at least one associated feature characteristic; and using at least one feature characteristic to identify at least one flaw in the object image. In a preferred embodiment, at least one associated feature characteristic includes a probability value that indicates the likelihood that an associated image feature does not correspond to a feature in the model pattern. In an alternate preferred embodiment, the at least one associated feature characteristic includes a probability value that indicates the likelihood that an associated image feature does correspond to a feature in the model pattern.
In another preferred embodiment, at least one pattern feature includes a probability value indicating the likelihood that the pattern feature does not correspond to at least one feature in the object image.
When using at least one feature characteristic, it is preferred to transfer a feature characteristic from the at least one image feature to an element of the pattern field, where in a preferred embodiment, the element of the pattern field is the nearest element of the pattern field.
When using at least one feature characteristic, it is also preferred to use a plurality of image features; and transfer a plurality of the feature characteristics from the plurality of image features to a plurality of elements of the pattern field, wherein some of the plurality of elements of the pattern field can include at least one link to a neighboring element of the pattern field. Further, after transferring a plurality of the feature characteristics from the plurality of image features to a plurality of elements of the pattern field, it is preferred that each element of the plurality of elements of the pattern field receive a feature characteristic equal to the maximum of its own feature characteristic and the feature characteristic of each neighboring element of the pattern field.
In another preferred embodiment, using at least one feature characteristic includes identifying the nearest element of the pattern field, transferring a feature characteristic from the at least one image feature to the nearest element of the pattern field, and computing a coverage value using at least the transferred feature characteristic.
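A minimal sketch of the transfer and gap-closing steps described in the preceding paragraphs, assuming field elements are indexed 0 to n-1, each image feature has already been associated with its nearest field element, and neighbor links are given as index lists. Keeping the larger value on collisions and taking the maximum over linked neighbors follows the text; defining coverage as the mean evaluation over all field elements is an assumption, since the text does not define it. All names are hypothetical.

```python
def transfer_evaluations(num_field_elements, assignments):
    """Transfer image-feature evaluations to field elements.

    `assignments` holds (field_index, evaluation) pairs, one per image feature,
    where field_index is the nearest field element reported by the field.
    Evaluations start at zero; when several land on one element, the largest is kept."""
    evals = [0.0] * num_field_elements
    for index, value in assignments:
        evals[index] = max(evals[index], value)
    return evals


def dilate_along_boundary(evals, neighbors):
    """One-dimensional gray-level dilation: each field element takes the maximum
    of its own evaluation and those of its linked neighbors."""
    return [max([evals[i]] + [evals[j] for j in neighbors[i]]) for i in range(len(evals))]


def coverage(evals):
    """Hypothetical coverage measure: mean evaluation over all field elements."""
    return sum(evals) / len(evals) if evals else 0.0
```

In a chain of six field elements with evaluations transferred to elements 0, 2, and 3, one dilation pass fills the one-element gap at element 1 (and also spreads into element 4), leaving only element 5 at zero.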
In a further preferred embodiment, evaluating each image feature includes comparing the direction of each image feature with the direction of an element of the pattern field. It is preferable to assign a higher weight to the image feature if the difference in the direction of the image feature from the direction of an element of the pattern field is less than a specified direction parameter. The specified direction parameter can be determined by a characteristic of the element of the pattern field, such as a flag indicating “corner” or “non-corner”.
It is also possible for a lower weight to be assigned to the image feature if the difference in the direction of the image feature from the direction of an element of the pattern field is greater than a specified direction parameter, wherein the specified direction parameter can be determined by a characteristic of the element of the pattern field, such as a flag indicating “corner” or “non-corner”.
In yet another preferred embodiment, evaluating each image feature includes comparing, modulo 180 degrees, the direction of each image feature with the direction of an element of the pattern field.
Evaluating each image feature can also include assigning a weight of zero when the image feature is at a position that corresponds to an element of the pattern field specifying that no image feature is expected at that position.
Moreover, evaluating each image feature can include comparing the distance of each image feature with a specified distance parameter, where a lower weight can be assigned to the image feature if its distance is greater than the specified distance parameter, or, alternatively, a higher weight can be assigned to the image feature if its distance is less than the specified distance parameter.
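One way to realize the distance comparison, assumed here because the text leaves the exact shape open, is a piecewise-linear ramp between two thresholds: full weight for image features close to the pattern boundary, zero weight beyond a far threshold, and linearly decreasing weight in between. The names `distance_weight`, `near`, and `far` are hypothetical.

```python
def distance_weight(force_distance, near, far):
    """Weight an image feature by its distance from the nearest pattern boundary.

    Full weight (1.0) at or below `near`, zero weight at or beyond `far`,
    and a linear ramp in between (the ramp shape is an assumption)."""
    if near >= far:
        raise ValueError("near must be smaller than far")
    if force_distance <= near:
        return 1.0
    if force_distance >= far:
        return 0.0
    return (far - force_distance) / (far - near)
```

Tightening `near` and `far` as the pose estimate improves would make the test progressively stricter.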
In another preferred embodiment, evaluating each image feature includes comparing the direction of each image feature with the direction of an element of the pattern field, and comparing the distance of each image feature with a specified distance parameter.
To avoid ambiguity we will call the location of a pattern in a multidimensional space its pose. More precisely, a pose is a coordinate transform that maps points in an image to corresponding points in a stored pattern. In a preferred embodiment, a pose is a general six degree of freedom linear coordinate transform. The six degrees of freedom can be represented by the four elements of a 2×2 matrix, plus the two elements of a vector corresponding to the two translational degrees of freedom. Alternatively and equivalently, the four non-translational degrees of freedom can be represented in other ways, such as orientation, scale, aspect ratio, and skew, or x-scale, y-scale, x-axis-angle, and y-axis-angle.
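A sketch of such a pose, stored as the four elements of a 2×2 matrix plus a translation vector and constructed from the x-scale, y-scale, x-axis-angle, and y-axis-angle parameterization. The class name `Pose` and its methods are hypothetical; angles are in radians.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Six degree of freedom linear transform mapping an image point p to the
    pattern point M p + t, with M = [[a, b], [c, d]] and t = (tx, ty)."""
    a: float
    b: float
    c: float
    d: float
    tx: float
    ty: float

    @classmethod
    def from_axes(cls, x_scale, y_scale, x_angle, y_angle, tx=0.0, ty=0.0):
        # The image x-axis maps to a vector of length x_scale at angle x_angle,
        # and the image y-axis to a vector of length y_scale at angle y_angle.
        return cls(x_scale * math.cos(x_angle), y_scale * math.cos(y_angle),
                   x_scale * math.sin(x_angle), y_scale * math.sin(y_angle), tx, ty)

    def apply(self, x, y):
        return (self.a * x + self.b * y + self.tx, self.c * x + self.d * y + self.ty)

    def inverse(self):
        det = self.a * self.d - self.b * self.c
        if det == 0:
            raise ValueError("singular pose")
        ia, ib, ic, id_ = self.d / det, -self.b / det, -self.c / det, self.a / det
        # The inverse translation is -M^-1 t.
        return Pose(ia, ib, ic, id_,
                    -(ia * self.tx + ib * self.ty), -(ic * self.tx + id_ * self.ty))
```

A pure rotation by θ with uniform scale s corresponds to `Pose.from_axes(s, s, θ, θ + π/2)`; unequal scales, or axis angles that are not 90° apart, introduce aspect ratio and skew.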
The invention can serve as a replacement for the fine resolution phase of any coarse-fine method for pattern inspection, such as the prior art method of correlation search followed by Golden Template Analysis. In combination with the coarse location phases of any such method, the invention results in an overall method for pattern inspection that is faster and more accurate than any known prior art method.
In a preferred embodiment, the PatQuick™ tool, sold by Cognex Corporation, Natick, Mass., is used for producing an approximate object pose.
The invention uses a stored pattern that represents an ideal example of the object to be found and inspected. The pattern can be created from a training image or synthesized from a geometric description. According to the invention, patterns and images are represented by a feature-based description that can be translated, rotated, and scaled to arbitrary precision much faster than digital image re-sampling, and without pixel grid quantization errors. Thus accuracy is not limited by the ability of a grid to represent small changes in position, orientation, or size (or other degrees of freedom). Furthermore, pixel quantization errors due to digital re-sampling will not cause false differences between the pattern and image that can limit inspection performance, since no re-sampling is done.
Accuracy is also not limited by the fineness with which the space is searched, because the invention does not test discrete positions within the space to determine the pose with the highest degree of match. Instead the invention determines an accurate object pose from an approximate starting pose in a small, fixed number of increments that is independent of the number of dimensions of the space (i.e. degrees of freedom) and independent of the distance between the starting and final poses, as long as the starting pose is within some “capture range” of the true pose. Thus one does not need to sacrifice accuracy in order to keep execution time within the bounds allowed by practical applications.
Unlike prior art methods where execution time grows rapidly with number of degrees of freedom, with the method of the invention execution time grows at worst very slowly, and in some embodiments not at all. Thus one need not sacrifice degree of freedom measurements in order to keep execution time within practical bounds. Furthermore, allowing four or more degrees of freedom to be refined will often result in better matches between the pattern and image, and thereby improved accuracy.
The invention processes images with a feature detector to generate a description that is not tied to a pixel grid. The description is a list of elements called dipoles (also called features) that represent points (positions) along object boundaries. A dipole includes the coordinates of the position of a point along an object boundary and a direction pointing substantially normal to the boundary at that point. Object boundaries are defined as places where image gradient (a vector describing rate and direction of gray-level change at each point in an image) reaches a local maximum. In a preferred embodiment, gradient is estimated at an adjustable spatial resolution. In another preferred embodiment, the dipole direction is the gradient direction. In another preferred embodiment, a dipole, i.e., a feature, contains additional information as further described in the drawings. In yet another preferred embodiment, dipoles are generated not from an image but from a geometric description of an object, such as might be found in a CAD system.
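A minimal feature detector along these lines can be sketched as follows. Assumptions not stated in the text: gradient is estimated with fixed central differences (no adjustable smoothing), local maxima are found by comparing gradient magnitude with the two neighbors along the gradient direction quantized to eight directions, and the sub-pixel position comes from a parabola fitted through those three magnitudes. `Dipole` and `detect_dipoles` are hypothetical names, and coordinates follow the image convention of y increasing downward.

```python
import math
from dataclasses import dataclass


@dataclass
class Dipole:
    x: float          # boundary position, sub-pixel
    y: float
    direction: float  # gradient direction in radians, normal to the boundary
    magnitude: float  # gradient magnitude


def detect_dipoles(img, min_magnitude=1.0):
    """Central-difference gradient, non-maximum suppression along the gradient
    direction, and parabolic sub-pixel interpolation."""
    rows, cols = len(img), len(img[0])
    gx = [[0.0] * cols for _ in range(rows)]
    gy = [[0.0] * cols for _ in range(rows)]
    mag = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
            mag[r][c] = math.hypot(gx[r][c], gy[r][c])
    steps = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
    dipoles = []
    for r in range(2, rows - 2):
        for c in range(2, cols - 2):
            m = mag[r][c]
            if m < min_magnitude:
                continue
            angle = math.atan2(gy[r][c], gx[r][c])
            # Quantize the gradient direction to the nearest of 8 neighbor directions.
            dc, dr = steps[round(angle / (math.pi / 4)) % 8]
            before, after = mag[r - dr][c - dc], mag[r + dr][c + dc]
            # Keep only local maxima along the gradient (ties broken toward one side).
            if not (m > before and m >= after):
                continue
            offset = 0.5 * (before - after) / (before - 2.0 * m + after)
            dipoles.append(Dipole(c + offset * dc, r + offset * dr, angle, m))
    return dipoles
```

On an image whose rows each ramp 0, 50, 100 across columns 3 to 5, the detector reports one dipole per interior row at x = 4 with direction 0, pointing across the edge toward the brighter pixels.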
The stored model pattern to be used by the invention for localization and subsequent inspection is the basis for generating a dipole list that describes the objects to be found by representing object boundaries. The dipole list derived from the model pattern is called the field dipole list. It can be generated from a model training image containing an example object using a feature detector, or it can be synthesized from a geometric description. The field dipole list is used to generate a 2-dimensional vector-valued function called a field. For each point within the region of the stored model pattern the field gives a vector that indicates the distance and direction to the nearest point along a model object boundary. The vector is called the force at the specified point within the stored model pattern.
Note that the nearest point along a model object boundary is not necessarily one of the model object boundary points represented by the field dipoles, but in general may lie between field dipole positions. Note further that the point within the stored model pattern is not necessarily an integer grid position, but is in general a real-valued position, known to within the limits of precision of the apparatus used to perform the calculations. Note that since the force vector points to the nearest boundary point, it must be normal to the boundary (except at discontinuities).
In a preferred embodiment, if no model object boundary point lies within a certain range of a field position, then a special code is given instead of a force vector. In another preferred embodiment, the identity of the nearest field dipole is given in addition to the force. In another preferred embodiment, one additional bit of information is given that indicates whether the gradient direction at the boundary pointed to by the force is the same or 180° opposite from the force direction (both are normal to the boundary). In another preferred embodiment, additional information is given as further described in the drawings. In another embodiment, the field takes a direction in addition to a position within the pattern, and the force returned is the distance and direction to the nearest model object boundary point in approximately the given direction.
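The force can be sketched by treating consecutive field dipole positions as the vertices of a piecewise-linear boundary and projecting the query point onto each segment, so that, as noted above, the nearest boundary point may fall between dipoles and the query point need not lie on the pixel grid. The stored model pattern includes the field itself, generated in advance; this sketch instead evaluates the force on demand, ignores dipole directions, and returns `None` in place of the special code used when no boundary lies within range. All names are hypothetical.

```python
import math


def nearest_point_on_segment(px, py, ax, ay, bx, by):
    # Project P onto segment AB and clamp to the segment's end points.
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))
    return ax + t * dx, ay + t * dy


def force(field_dipoles, px, py, max_range=None, closed=True):
    """Return the force vector (fx, fy) from the real-valued point (px, py) to the
    nearest point on the boundary through the field dipole positions, or None
    if no boundary point lies within max_range."""
    pts = list(field_dipoles)
    pairs = list(zip(pts, pts[1:] + (pts[:1] if closed else [])))
    best = None
    for (ax, ay), (bx, by) in pairs:
        qx, qy = nearest_point_on_segment(px, py, ax, ay, bx, by)
        d2 = (qx - px) ** 2 + (qy - py) ** 2
        if best is None or d2 < best[0]:
            best = (d2, qx - px, qy - py)
    if best is None or (max_range is not None and math.sqrt(best[0]) > max_range):
        return None
    return best[1], best[2]
```

For a 10×10 square boundary with dipoles at its corners, the point (3, -2) receives the force (0, 2), pointing to the boundary point (3, 0), which lies between the dipoles at (0, 0) and (10, 0).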
The stored model pattern used by the invention includes the field dipole list, the field, and a set of operating parameters as appropriate to a given embodiment, and further described throughout the specification.
Given an object image and an approximate starting pose, pattern localization proceeds as follows. The object image is processed by a feature detector to produce a dipole list, called the image dipole list. The starting pose is refined in a sequence of incremental improvements called attraction steps. Each such step results in a significantly more accurate pose in all of the degrees of freedom that are allowed to vary. The sequence can be terminated after a fixed number of steps, and/or when no significant change in pose results from the last step, or based on any reasonable criteria. In a preferred embodiment, the sequence is terminated after four steps.
For each attraction step, the image dipoles are processed in any convenient order. The position and direction of each image dipole is mapped by the current pose transformation to convert image coordinates to model pattern (field) coordinates. The field is used to determine the force at the point to which the image dipole was mapped. Since each image dipole is presumed to be located on an object boundary, and the force gives the distance and direction to the nearest model object boundary of the stored model pattern, the existence of the image dipole at the mapped position is taken as evidence that the pose should be modified so that the image dipole moves in the force direction by an amount equal to the force distance.
It is important to note that object boundaries generally provide position information in a direction normal to the boundary, which as noted above is the force direction, but no information in a direction along the boundary. Thus the evidence provided by an image dipole constrains a single degree of freedom only, specifically position along the line of force, and provides no evidence in the direction normal to the force.
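The single-degree-of-freedom nature of each dipole's evidence shows up directly in a least-squares formulation. The sketch below is restricted to a translation-only pose (two of the six degrees of freedom) so that the algebra reduces to a 2×2 system; it is an illustration, not the general linear solution. `force_at` is any function returning the force vector, or `None`, at a point in pattern coordinates, and `weight_of` maps force distance to a weight; both names are hypothetical.

```python
import math


def attraction_step(image_points, tx, ty, force_at, weight_of=lambda distance: 1.0):
    """One attraction step for a translation-only pose (pattern point = image point + t).

    Each image dipole, mapped into pattern coordinates, yields one linear
    constraint n . dt = d, where d is the force distance and n the unit force
    direction. Nothing constrains dt along the boundary, so each dipole pins
    down a single degree of freedom; the weighted least-squares solution of
    all the constraints gives the pose update."""
    sxx = sxy = syy = bx = by = 0.0
    for px, py in image_points:
        f = force_at(px + tx, py + ty)  # map through the current pose, then query the field
        if f is None:
            continue  # no boundary in range: no evidence from this dipole
        d = math.hypot(f[0], f[1])
        if d == 0.0:
            continue  # exactly on the boundary; force direction undefined in this sketch
        nx, ny = f[0] / d, f[1] / d
        w = weight_of(d)
        # Accumulate the normal equations (sum w n n^T) dt = sum w n d.
        sxx += w * nx * nx
        sxy += w * nx * ny
        syy += w * ny * ny
        bx += w * nx * d
        by += w * ny * d
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return tx, ty  # evidence constrains at most one direction, e.g. a single straight edge
    dtx = (syy * bx - sxy * by) / det
    dty = (sxx * by - sxy * bx) / det
    return tx + dtx, ty + dty
```

Starting from zero translation, a few repeated calls bring 36 points sampled on a circle of radius 10 centered at (0.5, -0.3) into registration with a model circle of the same radius centered at the origin, converging to t ≈ (-0.5, 0.3). If every force points the same way, as for a single straight edge, the normal equations are singular and the sketch leaves the pose unchanged.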
If the current pose is a fair approximation to the true object position, then many image dipoles will provide good evidence as to how the pose should be modified to bring the image boundaries into maximum agreement with the boundaries of the stored model pattern. For a variety of reasons, however, many other image dipoles may provide false or misleading evidence. Thus, it is important to evaluate the evidence provided by each image dipole, and assign a weighting factor to each image dipole to indicate the relative reliability of the evidence.
In one embodiment, the direction (as mapped to the pattern coordinate system) of an image dipole is compared with the force direction, and the result, modulo 180°, is used to determine the weight of the image dipole. If the directions agree to within some specified parameter, the dipole is given a high weight; if they disagree beyond some other specified parameter, the dipole is given zero weight; if the direction difference falls between the two parameters, intermediate weights are assigned.
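A sketch of this direction test with a linear ramp for the intermediate weights (the text does not specify the shape of the ramp, so linear is an assumption). The optional `period` argument also covers a modulo-360° comparison, in which gradient polarity is considered. `direction_weight`, `high_tol`, and `zero_tol` are hypothetical names; angles are in radians.

```python
import math


def direction_weight(image_dir, pattern_dir, high_tol, zero_tol, period=math.pi):
    """Weight from the angle between an image dipole direction and the pattern
    direction. With period=pi (180 degrees) gradient polarity is ignored;
    with period=2*pi (360 degrees) it is considered."""
    if not 0 <= high_tol < zero_tol <= period / 2:
        raise ValueError("need 0 <= high_tol < zero_tol <= period / 2")
    diff = (image_dir - pattern_dir) % period
    diff = min(diff, period - diff)  # smallest angular difference
    if diff <= high_tol:
        return 1.0
    if diff >= zero_tol:
        return 0.0
    return (zero_tol - diff) / (zero_tol - high_tol)
```

A flag such as “corner” on the pattern field element could select a looser pair of tolerances than a straight-boundary element would use, in the spirit of the corner/non-corner direction parameter described earlier.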
In another embodiment, the image dipole direction is compared to the gradient direction of the model pattern boundary to which the force points. A parameter is used to choose between making the comparison modulo 180°, in which case gradient polarity is effectively ignored, or making it modulo 360°, in which case gradient polarity is considered. In a preferred embodiment, the field itself indicates at each point within the stored model pattern whether to ignore polarity, consider polarity, or defer the decision to a global parameter.
In one embodiment, the force distance is used to determine the dipole weight. In a preferred embodiment, if the force distance is larger than some specified parameter, the dipole is given zero weight, on the assumption that the dipole is too far away to represent valid evidence. If the force distance is smaller than some other specified parameter, the dipole is given a high weight, and if it falls between the two parameters, intermediate weights are assigned.
In a preferred embodiment, the parameters specifying the weight factor as a function of force distance are adjusted for each attraction step to account for the fact that the pose is becoming more accurate, and therefore that one should expect image dipoles representing valid evidence to be closer to the pattern boundaries.
In one embodiment, the gradient magnitude of the image dipoles is used to determine the dipole weight. In a preferred embodiment, a combination of dipole direction, force distance, and gradient magnitude is used to determine the weight.
For each attraction step, the invention determines a new pose that best accounts for the evidence contributed by each image dipole, taking into account the dipole's weight. In a preferred embodiment, a least-squares method is used to determine the new pose.
The evaluation of each image dipole to produce a weight can also provide information for inspection purposes. It is desirable to look for two distinct kinds of errors: missing features, which are pattern features for which no corresponding image feature can be found, and extra features, sometimes called “clutter”, which are image features that correspond to no pattern feature. In one embodiment, image dipoles with low weights are considered to be clutter. In a preferred embodiment, a specific clutter value is computed for each image dipole, as further described in the drawings below.
In an embodiment of the invention that can identify missing pattern features, the field at each point gives the identity of the nearest field dipole, if any, in addition to the force vector. Each field dipole contains an evaluation, which is initialized to zero. Each image dipole transfers its evaluation (also called “weight” or “feature characteristic”) to that of the nearest field dipole as indicated by the field. Since in general the correspondence between image and field dipoles is not one-to-one, some field dipoles may receive evaluations (feature characteristics) from more than one image dipole, and others may receive evaluations from none. Those field dipoles that receive no evaluation may represent truly missing features, or may simply represent gaps in the transfer due to quantization effects.
When more than one evaluation is transferred to a given field dipole, the evaluations can be combined by any reasonable means. In a preferred embodiment, the largest such evaluation is used and the others are discarded. Gaps in the transfer can be closed by considering neighboring field dipoles. In one embodiment, methods known in the art as gray-level mathematical morphology are used to close the gaps. In the case of the invention, one-dimensional versions of morphological operations are used, since field dipoles lie along one-dimensional boundaries. In a preferred embodiment, a morphological dilation operation is used.
If the starting pose is too far away from the true pose, there may be insufficient good evidence from the image dipoles to move the pose in the right direction. The set of starting poses that result in attraction to the true pose defines the capture range of the pattern. The capture range depends on the specific pattern in use, and determines the accuracy needed from whatever method is used to determine the starting pose.
In a preferred embodiment, the feature detector that is used to generate dipoles is tunable over a wide range of spatial frequencies. If the feature detector is set to detect very fine features at a relatively high resolution, the accuracy will be high but the capture range will be relatively small. If on the other hand the feature detector is set to detect coarse features at a relatively lower resolution, the accuracy will be lower but the capture range will be relatively large. This suggests a multi-resolution method where a coarse, low resolution step is followed by a fine, high resolution step. With this method, the capture range is determined by the coarse step and is relatively large, while the accuracy is determined by the fine step and is high.