Machine vision systems are frequently used in contexts that require the system to capture a two-dimensional image of a physical object and locate within that image some aspects or features that are to be analyzed, for example to determine the position of the features in order to effect alignment or a particular orientation.
Known systems employ what may be referred to as "feature based" methods for locating features within a captured image of the object. The physical object or entity being manipulated or inspected, e.g. a circuit board or component, is manipulated or inspected in "physical space", which is an absolute coordinate frame within which the object actually resides. A typical application for such systems employing feature based methods is to determine a current position and/or effect alignment of the physical object in physical space based on a model of the object.
Implementation of feature based methods typically requires as an input to the vision system a set of model features acquired from the model (i.e. standard of the physical object), in accordance with an arbitrary coordinate system or "model space". An image of the physical entity is acquired by the vision system in accordance with an "image space" coordinate system, in order to input to the vision system a set of image features (analogous to physical features). The image is typically input as point information relative to the image space coordinate system. Knowing the two dimensional linear mapping between physical space and image space, the set of model features and the set of image features can be processed according to known algorithms to determine "correspondence", i.e. a direct mapping of respective two dimensional spatial relationships, between model features and image features. Correspondence between model and image features, or equivalently, a two dimensional rigid transform that maps model features to their corresponding features in image space can then be used, knowing the two dimensional linear mapping between physical space and image space, to determine the spatial relationship between model space and physical space in order to position/align features of the object in physical space in accordance with model features in model space.
Known algorithms or approaches for searching for correspondence between model features and image features include: an interpretation tree approach; a minimal basis set approach; and a grouping approach, all of which are generally described in OBJECT RECOGNITION BY COMPUTER: THE ROLE OF GEOMETRIC CONSTRAINTS, Grimson, W. E. L., The MIT Press, 1990, which is incorporated herein by reference.
In the interpretation tree approach, all possible correspondences between model and image features are tested. The interpretation tree approach can be illustrated by letting "M" stand for a model feature, "I" stand for an image feature, "m" be the number of model features and "i" be the number of image features. Assuming that every model feature appears somewhere in the image, all model features are ordered from 1 to m, and then matched to all sequences of the form I.sub.1 I.sub.2 . . . I.sub.m. For the first position in the sequence there are i possible choices of image features, for the second there are (i-1), and so on. Therefore, the number of possible sequences to test is on the order of i.sup.m, that is, the number of possible combinations of model features and image features that must be checked for correspondence is exponential in the number of model features.
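The growth described above can be checked numerically. The following is a minimal sketch, not taken from the cited reference; the function names are hypothetical and chosen only for illustration. It counts the ordered sequences of m distinct image features drawn from i candidates, i(i-1)...(i-m+1), which is bounded above by i.sup.m.

```python
import math
from itertools import permutations


def count_interpretations(i, m):
    """Count ordered assignments of m model features to m distinct
    image features chosen from i candidates: i * (i-1) * ... * (i-m+1)."""
    return math.perm(i, m)


def enumerate_interpretations(image_features, m):
    """List every candidate sequence I_1 ... I_m explicitly.
    Only practical for very small i and m."""
    return list(permutations(image_features, m))


if __name__ == "__main__":
    for i, m in [(10, 4), (20, 8), (50, 10)]:
        exact = count_interpretations(i, m)
        bound = i ** m
        print(f"i={i}, m={m}: {exact} sequences (upper bound i^m = {bound})")
```

Even with only a few tens of image features and ten model features, the count is far too large to enumerate, which is the cost the passage above refers to.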
Although impossible combinations of matches can be identified and pruned out in effecting the interpretation tree approach, the search space is still exponentially large. A clear disadvantage of this approach is that it is very time consuming to process all potential correspondences if either i or m is greater than a very small number of features. Thus, the amount of processing overhead required to effect this approach is likely to be quite significant. Furthermore, large amounts of memory may be required if it is necessary to store all possible combinations of model features and image features that must be checked for correspondence.
In the minimal basis set approach there is no need to check all possible combinations of model features for correspondence with image features in an image. This approach takes advantage of the fact that a correspondence of only two model features to two image features is required to solve for a two dimensional (2D) rigid transform defining the relationship between model space and image space. Therefore, two model features, M.sub.1 and M.sub.2, are matched against every image feature subset comprised of two features. When two image features are found to correspond to the model features, the correspondence is used to solve for a 2D rigid transform which directly relates image space and model space. The resultant transform is then used to project any remaining model points into the image. Projecting a model point into an image means solving for a hypothesized location of a feature within an image (in image space) based on the 2D transform mapping model space to image space. The image is then searched for a feature corresponding to each projected model point. Image features are found to correspond to model points if they are close in terms of their respective locations, i.e. if the image feature location and the projected model feature location match. The entire set of image to model feature correspondences is used for the final fit between image and model features.
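The hypothesize-and-project procedure just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of any particular known system: 2D points are represented as Python complex numbers, the function names and tolerance parameters are hypothetical, and the final fit over all correspondences is omitted.

```python
from itertools import permutations


def rigid_from_pairs(m1, m2, p1, p2, length_tol):
    """Solve for a 2D rigid transform z -> rot * z + trans (|rot| == 1)
    that maps model points m1, m2 onto image points p1, p2.
    Returns None if the two pairs cannot be related rigidly."""
    dm, dp = m2 - m1, p2 - p1
    if abs(dm) == 0 or abs(dp) == 0:
        return None
    if abs(abs(dm) - abs(dp)) > length_tol:
        return None  # separations differ, so no rigid transform fits
    rot = dp / dm
    rot /= abs(rot)  # keep rotation only, discard residual scale
    trans = p1 - rot * m1
    return rot, trans


def match_minimal_basis(model, image, dist_tol=0.5, length_tol=0.5):
    """Match the first two model points against every ordered pair of
    image points, project the remaining model points, and keep the
    hypothesis that explains the most model points."""
    best = None
    for a, b in permutations(range(len(image)), 2):
        solved = rigid_from_pairs(model[0], model[1], image[a], image[b],
                                  length_tol)
        if solved is None:
            continue
        rot, trans = solved
        matches = {0: a, 1: b}
        for k in range(2, len(model)):
            projected = rot * model[k] + trans  # hypothesized image location
            nearest = min(range(len(image)),
                          key=lambda n: abs(image[n] - projected))
            if abs(image[nearest] - projected) <= dist_tol:
                matches[k] = nearest
        if best is None or len(matches) > len(best[0]):
            best = (matches, rot, trans)
    return best


if __name__ == "__main__":
    model = [0 + 0j, 4 + 0j, 4 + 2j, 0 + 3j]
    # Image: model rotated by 90 degrees and shifted, plus one clutter point.
    image = [1j * p + (10 + 5j) for p in model] + [100 + 100j]
    matches, rot, trans = match_minimal_basis(model, image)
    print(matches, rot, trans)
```

Because the transform is computed from only two image features, any error in their locations propagates directly into every projected point in this sketch.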
Disadvantageously, the minimal basis set approach is highly sensitive to noise in the image. Any noise in the location of the image features used in the original correspondence may lead to erroneous correspondence of the remaining model features. Significant levels of noise, such as might be introduced into a vision system by lighting problems, optics problems, etc., may result in a correct full correspondence being missed completely.
The grouping technique for searching for correspondence between model features and image features involves using a small, selected subset of image features. The image features selected for determining correspondence with model features are ones that are most likely to belong to a subset of a set of features comprising the model. For example, a vision system might be implemented in an application wherein it is desirable to find a particular rectangular object in the scene or image, e.g. a microchip. Although the captured image contains many objects, grouping could be applied by having the vision system select all groups of line segments from the image, and more particularly all groups of line segments that are at right angles to each other. The subsets or smaller groups of features, e.g. line segments at right angles to each other, would then be used to match to the chip model, rather than using the entire set of line segments.
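The right-angle example above can be sketched as follows. This is a minimal illustration only: the segment representation (a pair of endpoint tuples), the function names and the angular tolerance are assumptions, and segments are assumed to have non-zero length.

```python
import math
from itertools import combinations


def direction(segment):
    """Unit direction vector of a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    return dx / norm, dy / norm


def right_angle_groups(segments, angle_tol_deg=5.0):
    """Return index pairs of segments that meet at 90 degrees, within
    angle_tol_deg. Only these groups would be passed on to matching."""
    # |cos(angle)| <= sin(tol) is equivalent to |angle - 90 deg| <= tol.
    max_cos = math.sin(math.radians(angle_tol_deg))
    groups = []
    for a, b in combinations(range(len(segments)), 2):
        ux, uy = direction(segments[a])
        vx, vy = direction(segments[b])
        if abs(ux * vx + uy * vy) <= max_cos:
            groups.append((a, b))
    return groups


if __name__ == "__main__":
    segments = [
        ((0, 0), (4, 0)),  # horizontal
        ((4, 0), (4, 2)),  # vertical
        ((0, 0), (3, 3)),  # diagonal
        ((1, 1), (5, 1)),  # horizontal
    ]
    print(right_angle_groups(segments))
```

Of the six possible pairs in this small scene, only the two perpendicular pairs survive, which is the reduction in search space that grouping aims for.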
Although the concept of grouping has potential to reduce the overall search space and overhead associated with achieving correspondence between model features and image features, practical implementations embodied in vision systems known in the art suffer significant drawbacks. A known system implementing grouping is referred to as GROPER and is described in detail in THE USE OF GROUPING IN VISUAL OBJECT RECOGNITION, Jacobs, D. W. (Doctoral Thesis, Massachusetts Institute of Technology, 1988). The grouping approach implemented in GROPER effects object recognition by grouping image characteristics which are believed likely to match a subset of model (candidate) characteristics. GROPER implements a grouping process that searches a captured, run-time image for subsets of features to match against groups of model candidate characteristics indexed in a precomputed lookup table. The image characteristics grouped and the model candidate characteristics stored in the lookup table in the GROPER system are edges. The lookup table is built using multiple candidate models and contains all pairs or subsets of edges from all models. The image characteristics searched for, which are believed likely to match a subset of model (candidate) characteristics, are used for looking up all possible matching model subsets. That is, at runtime, the GROPER object recognition system seeks to identify in an image a subset of edges corresponding to a subset in a single model, and if some subset of features is identified that may match a subset of model features, it is processed against the lookup table to determine a possible match.
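The general idea of a precomputed lookup table indexed by pairs of model edges can be sketched as follows. This is not GROPER's actual indexing scheme, which is not detailed here. The key used (the quantized angle between two edges), the model names and all function names are assumptions made purely for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def edge_angle_deg(edge):
    """Orientation in degrees of an edge ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = edge
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def pair_key(e1, e2, bin_deg=10):
    """Hypothetical index key: angle between two undirected edges,
    folded into [0, 90] degrees and rounded to the nearest bin."""
    diff = (edge_angle_deg(e1) - edge_angle_deg(e2)) % 180.0
    diff = min(diff, 180.0 - diff)
    return round(diff / bin_deg)


def build_table(models):
    """Precompute a table holding every pair of edges from every model."""
    table = defaultdict(list)
    for name, edges in models.items():
        for i, j in combinations(range(len(edges)), 2):
            table[pair_key(edges[i], edges[j])].append((name, i, j))
    return table


def lookup(table, image_edge_1, image_edge_2):
    """Return all model edge pairs that a run-time image pair may match.
    Each candidate still has to be verified against the full model."""
    return table.get(pair_key(image_edge_1, image_edge_2), [])


if __name__ == "__main__":
    models = {
        "rect": [((0, 0), (4, 0)), ((4, 0), (4, 2)), ((4, 2), (0, 2))],
        "wedge": [((0, 0), (2, 0)), ((0, 0), (1, 1.732))],
    }
    table = build_table(models)
    print(lookup(table, ((10, 10), (10, 13)), ((10, 10), (15, 10))))
```

Even in this toy table a single image pair returns more than one candidate model pair, and the number of entries grows with the square of the number of edges per model.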
The grouping implementation(s) known in the art are not suitable for implementation in many applications. Such grouping implementations present difficulties in any context where there are repetitive patterns that create a large number of potential matches, in that a lookup table can disadvantageously require a large number of entries. Creating and traversing the lookup table can represent a significant amount of processing. Lookup table traversal may yield a large number of false matches. In systems such as GROPER, where a large number of candidate model characteristics must be considered for a potential match, verification of the match can be very slow.