Machine vision systems, also termed “vision systems” herein, are used to perform a variety of tasks in a manufacturing environment. In general, a vision system consists of one or more cameras with an image sensor (or “imager”) that acquires grayscale, color and/or three-dimensional (3D) range/depth/height image data of a scene that contains an object under manufacture. Images of the object can be analyzed to provide data/information to users and associated manufacturing and/or other processes. The image data is typically analyzed and processed by the vision system in one or more vision system processors that can be purpose-built, or can be part of one or more software application(s) instantiated within a general-purpose computer (e.g. a PC, laptop, tablet or smartphone).
Common vision system tasks include alignment and inspection. In an alignment task, vision system tools, such as the well-known PatMax® system commercially available from Cognex Corporation of Natick, Mass., compare features in an image of a scene to a pattern trained using an actual or synthetic model, and determine the presence/absence and pose of the pattern in the imaged scene. This information can be used in subsequent inspection (or other) operations to search for defects and/or perform other operations, such as part rejection.
Such vision system tools can perform alignment by computing a meaningful match score for a given pose of a trained model in a runtime two-dimensional (2D) (e.g. grayscale) image. The score can be used, for example, to determine which candidate poses are true instances of the model, to select the best available candidates, or for other classification and/or decision purposes.
It is often challenging to compute a score and determine alignment when the acquired runtime image includes deformation or other variations from the expected model image. Such deformation can result from minor defects in the object surface, or from optical issues such as variation in viewing angle, uneven illumination, partial occlusion of the field of view, shadowing, etc. To score a pose effectively, the process should account for some degree of deformation. Likewise, camera-lens distortion in the vision system, which can be non-linear, may be present; such distortion can be challenging to address and can require high processing overhead.
A common technique for aligning a trained model with a runtime image involves the use of probes, which generally comprise a series of locations corresponding to points on the model that define a gradient. Probes are positioned with respect to a candidate pose in the runtime image and are scored to determine the extent to which the candidate pose matches the expected model characteristics defined by the probes.
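The probe-scoring idea can be illustrated with a minimal Python/NumPy sketch. It is not PatMax® or any Cognex implementation: the `Probe` class and the `transform_probe` and `score_pose` functions are hypothetical, the pose is assumed to be a rigid 2D transform `(tx, ty, theta)`, the runtime gradient is sampled at the nearest pixel, and each probe's credit is assumed to be the cosine between the runtime gradient and the probe's expected direction, clipped at zero.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical probe: a model-space point and its expected unit gradient direction."""
    x: float
    y: float
    dx: float
    dy: float


def transform_probe(p, pose):
    """Map a probe into runtime-image coordinates using an assumed rigid pose (tx, ty, theta)."""
    tx, ty, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    x = c * p.x - s * p.y + tx
    y = s * p.x + c * p.y + ty
    # Gradient directions rotate with the pose but are not translated.
    dx = c * p.dx - s * p.dy
    dy = s * p.dx + c * p.dy
    return x, y, dx, dy


def score_pose(img, probes, pose):
    """Mean per-probe credit in [0, 1]: the cosine between the runtime gradient and
    the probe's expected direction, clipped at zero (an assumed scoring rule)."""
    gy, gx = np.gradient(img.astype(float))  # np.gradient returns d/drow, d/dcol
    rows, cols = img.shape
    total = 0.0
    for p in probes:
        x, y, dx, dy = transform_probe(p, pose)
        col, row = int(np.rint(x)), int(np.rint(y))  # nearest-pixel sampling
        if not (0 <= row < rows and 0 <= col < cols):
            continue  # probes falling outside the image earn no credit
        gxv, gyv = gx[row, col], gy[row, col]
        mag = np.hypot(gxv, gyv)
        if mag > 1e-12:
            total += max(0.0, (gxv * dx + gyv * dy) / mag)
    return total / len(probes)


# Usage: a vertical dark-to-bright step edge at column 10 and a line of probes
# that expect a left-to-right (+x) gradient.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
probes = [Probe(0.0, float(y), 1.0, 0.0) for y in range(-3, 4)]
good = score_pose(img, probes, (10, 10, 0.0))       # probes sit on the edge
flipped = score_pose(img, probes, (10, 10, np.pi))  # expected polarity reversed
missed = score_pose(img, probes, (3, 10, 0.0))      # probes in a flat region
```

Because each probe in this sketch reads only the single pixel it lands on, a probe displaced by a pixel or two of deformation can earn no credit at all, which is the problem the prior techniques described next attempt to address.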
A prior technique for performing probe matching of candidate poses in the presence of deformation involves extracting “featurelets” from the runtime image and attempting to match them. This entails extracting the sub-pixel position of individual edge transitions from the image; each featurelet corresponds roughly to a section of boundary that is (e.g.) one pixel in length. Once these featurelets are extracted, a neighborhood search around each probe can determine whether there is a potential matching feature, and credit is given in the score if a match occurs. However, this approach has two key deficiencies. First, it requires a hard contrast threshold to determine what is and is not a sufficiently strong gradient change to be called a feature. When image features are near this hard threshold, the behavior of extraction can be unstable. Second, extracting the featurelets is computationally expensive, as it requires determining the gradient at every location, computing the angles of gradient directions (as opposed to just their x and y components), and performing non-peak suppression to leave only the locations with the locally strongest gradient. Thus, this approach can be slower and less efficient than desired.
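The featurelet approach, and the hard-threshold instability it suffers from, can be illustrated with a simplified Python/NumPy sketch. It is an illustration under stated assumptions rather than the prior technique's actual implementation: `extract_featurelets` and `featurelet_score` are hypothetical names, featurelet positions are left at whole pixels instead of being refined to sub-pixel accuracy, non-peak suppression compares each pixel only with its two neighbors along the quantized gradient direction, and the neighborhood search uses an assumed radius and angle tolerance.

```python
import numpy as np


def extract_featurelets(img, contrast_threshold):
    """Return (row, col, angle) for pixels whose gradient magnitude passes a hard
    contrast threshold and is a local peak along the gradient direction."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)  # explicit angle computation (an added cost)
    rows, cols = img.shape
    feats = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            m = mag[r, c]
            if m < contrast_threshold:  # hard threshold: all-or-nothing decision
                continue
            # Neighbor offsets along the gradient direction, quantized to one pixel.
            dr = int(np.rint(np.sin(angle[r, c])))
            dc = int(np.rint(np.cos(angle[r, c])))
            # Non-peak suppression: keep only local maxima (ties are kept).
            if m >= mag[r + dr, c + dc] and m >= mag[r - dr, c - dc]:
                feats.append((r, c, angle[r, c]))
    return feats


def featurelet_score(feats, probes, radius, max_angle_diff):
    """Fraction of probes (row, col, angle in image coordinates) that find a
    featurelet within `radius` pixels whose angle agrees within `max_angle_diff`."""
    if not feats or not probes:
        return 0.0
    fr, fc, fa = (np.array(v, dtype=float) for v in zip(*feats))
    hits = 0
    for pr, pc, pa in probes:
        near = (fr - pr) ** 2 + (fc - pc) ** 2 <= radius ** 2
        # Wrap the angle difference into (-pi, pi] before comparing.
        dang = np.abs(np.angle(np.exp(1j * (fa - pa))))
        if np.any(near & (dang <= max_angle_diff)):
            hits += 1
    return hits / len(probes)


img = np.zeros((20, 20))
img[:, 10:] = 1.0  # step edge; central-difference gradient is 0.5 at columns 9 and 10
n_full = len(extract_featurelets(img, 0.5))         # contrast exactly at the threshold
n_dim = len(extract_featurelets(0.99 * img, 0.5))   # the same edge, 1% dimmer
feats = extract_featurelets(img, 0.5)
# Probes 1 and 4 pixels from the nearest featurelet column; only the first is within the radius.
score = featurelet_score(feats, [(5, 11, 0.0), (6, 14, 0.0)], radius=1.5, max_angle_diff=0.3)
```

In this toy case a 1% change in contrast flips every featurelet from present to absent, a small-scale view of the instability near a hard threshold described above.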
Another prior technique for matching a trained model is to score a “smoothed” runtime image. This involves smoothing the image (e.g. by applying a Gaussian filter) and (optionally) downsampling it. The image is then scored (e.g.) exactly at the transformed probe location (mapped to the corresponding downsampled coordinates if appropriate). This implicitly tolerates some local deformation, because nearby features have been blurred such that they overlap the probe location. Again, this approach has two key deficiencies. First, smoothing destroys information and can therefore eliminate a true matching feature, resulting in a score lower than desired. Second, smoothing can create one or more new locations with strong gradient, and can therefore create false matching features, resulting in a score higher than desired.
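The smoothed-image approach can likewise be sketched in Python/NumPy. The sketch assumes a separable Gaussian blur with edge-replicated borders, plain subsampling by an integer factor, and probes given as hypothetical `(row, col, dx, dy)` tuples already transformed into full-resolution image coordinates; each probe is credited with the non-negative projection of the local gradient onto its expected direction. The function names are illustrative only.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(img, sigma):
    """Separable Gaussian blur; edge padding keeps the output the same shape as img."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(float), r, mode="edge")

    def blur_1d(v):
        return np.convolve(v, k, mode="valid")

    rows_done = np.apply_along_axis(blur_1d, 1, padded)  # shape (H + 2r, W)
    return np.apply_along_axis(blur_1d, 0, rows_done)    # shape (H, W)


def smoothed_probe_score(img, probes, sigma, factor):
    """Mean non-negative gradient projection at each probe, sampled from a smoothed
    (sigma > 0) and downsampled (factor > 1) copy of the image."""
    work = smooth(img, sigma) if sigma > 0 else img.astype(float)
    small = work[::factor, ::factor]
    gy, gx = np.gradient(small)
    total = 0.0
    for r, c, dx, dy in probes:
        # Map full-resolution probe coordinates to the nearest downsampled pixel.
        rs, cs = int(np.rint(r / factor)), int(np.rint(c / factor))
        if 0 <= rs < small.shape[0] and 0 <= cs < small.shape[1]:
            total += max(0.0, gx[rs, cs] * dx + gy[rs, cs] * dy)
    return total / len(probes)


img = np.zeros((32, 32))
img[:, 16:] = 1.0  # step edge between columns 15 and 16
on_edge = [(r, 16, 1.0, 0.0) for r in range(8, 24, 2)]
off_edge = [(r, 18, 1.0, 0.0) for r in range(8, 24, 2)]  # two pixels of "deformation"

raw_on = smoothed_probe_score(img, on_edge, sigma=0, factor=1)
raw_off = smoothed_probe_score(img, off_edge, sigma=0, factor=1)
blur_on = smoothed_probe_score(img, on_edge, sigma=2.0, factor=1)
blur_off = smoothed_probe_score(img, off_edge, sigma=2.0, factor=2)
```

Under these assumptions the unsmoothed image gives no credit to probes displaced two pixels from the edge, while the smoothed, downsampled image does; the blur also weakens the response of probes sitting exactly on the edge, a small-scale view of the information loss noted above.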