Object recognition is part of many computer vision applications. It is particularly useful for industrial inspection tasks, where an image of an object must often be aligned with a model of the object. The transformation (pose) obtained by the object recognition process can be used for various tasks, e.g., pick-and-place operations, quality control, or inspection tasks. In most cases, the model of the object is generated from an image of the object. Such pure 2D approaches are frequently used because it is usually too costly or time-consuming to create a more complicated model, e.g., a 3D CAD model. Therefore, in industrial inspection tasks one is typically interested in matching a 2D model of an object to the image. A survey of matching approaches is given in R7 (see attached Reference list). The simplest class of object recognition methods is based on the gray values of the model and the image (R7, R16). A more complex class of object recognition methods uses the object's edges for matching, e.g., the mean edge distance (R6), the Hausdorff distance (R20), or the generalized Hough transform (GHT) (R4).
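To illustrate the GHT (R4) mentioned above, the following minimal sketch shows its two classical steps: an offline stage that builds an R-table storing, for each quantized edge direction, the displacement vectors from the model edge points to a reference point, and an online voting stage in which every image edge point casts votes for candidate reference-point locations. The function names, the number of direction bins, and the data layout are illustrative choices, not taken from the cited reference.

```python
import numpy as np

def build_r_table(model_edges, model_dirs, ref_point, n_bins=32):
    """Offline: for each quantized gradient direction, store the
    displacement vectors from the model edge points to the reference point."""
    r_table = {b: [] for b in range(n_bins)}
    for (x, y), theta in zip(model_edges, model_dirs):
        b = int((theta % (2 * np.pi)) / (2 * np.pi) * n_bins) % n_bins
        r_table[b].append((ref_point[0] - x, ref_point[1] - y))
    return r_table

def ght_vote(image_edges, image_dirs, r_table, shape, n_bins=32):
    """Online: each image edge point votes for all reference-point
    positions consistent with its gradient direction."""
    acc = np.zeros(shape, dtype=np.int32)
    h, w = shape
    for (x, y), theta in zip(image_edges, image_dirs):
        b = int((theta % (2 * np.pi)) / (2 * np.pi) * n_bins) % n_bins
        for dx, dy in r_table[b]:
            u, v = x + dx, y + dy
            if 0 <= u < w and 0 <= v < h:
                acc[v, u] += 1
    return acc
```

The peak of the accumulator array indicates the position of the model's reference point in the image; extending the vote to rotation and scale multiplies the accumulator dimensions, which is the source of the GHT's large memory requirements noted above.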
None of the above approaches simultaneously meets the high industrial demands: robustness to occlusions, clutter, arbitrary illumination changes, and sensor noise, as well as high recognition accuracy and real-time computation. The similarity measure presented in R21, which uses the edge direction as a feature, and a modification of the GHT (R24), which eliminates the disadvantages of slow computation, large memory requirements, and limited accuracy of the GHT, fulfill the industrial demands. Extensive performance evaluations (R25), which also include a comparison to standard recognition methods, showed that these two approaches have considerable advantages.
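A similarity measure of the kind described in R21 can be sketched as follows: the score is the mean absolute normalized dot product between the direction vectors of the model edges and the image gradient vectors at the corresponding (transformed) model points. This is a simplified illustration that assumes the direction vectors have already been extracted; the function name and data layout are assumptions, not taken from the cited reference.

```python
import numpy as np

def direction_similarity(model_dirs, image_dirs):
    """Mean absolute normalized dot product between model edge direction
    vectors and image gradient vectors at the transformed model points.
    The per-point normalization makes the score invariant to arbitrary
    illumination changes; occluded or cluttered points merely lower the
    mean instead of invalidating the match."""
    m = np.asarray(model_dirs, dtype=float)
    g = np.asarray(image_dirs, dtype=float)
    dots = np.abs(np.sum(m * g, axis=1))
    norms = np.linalg.norm(m, axis=1) * np.linalg.norm(g, axis=1)
    scores = np.zeros(len(m))
    valid = norms > 0
    scores[valid] = dots[valid] / norms[valid]
    return scores.mean()
```

A score of 1 indicates perfect direction agreement at every model point; points with vanishing gradient (e.g., occluded regions) contribute 0, so the score degrades gracefully.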
All of the above-mentioned recognition methods have in common that they require some form of rigid model representing the object to be found. However, in several applications the assumption of a rigid model is not fulfilled. Elastic or flexible matching approaches (R3, R13, R5) are able to match deformable objects, which appear, for example, in medicine when dealing with magnetic resonance imaging or computed tomography. Approaches for recognizing articulated objects are also available, especially in the field of robotics (R11).
Indeed, for industrial applications like quality control or inspection tasks it is less important to find elastic or articulated objects than to find objects that consist of several parts that show arbitrary mutual movement, i.e., variations in distance, orientation, and scale. These variations potentially occur whenever a process is split into several single procedures that are, by intention or not, insufficiently "aligned" with each other, e.g., when applying a tampon print using several stamps or when equipping a circuit board with transistors or soldering points. In FIG. 1 an example object is shown. FIG. 3 illustrates the mutual movements (variations) of the object parts. Clearly, when such an object is treated as rigid, it may not be found by conventional object recognition approaches. However, when trying to find the individual parts separately, the search becomes computationally expensive, since each part must be searched for in the entire image and the relations between the parts are not taken into account. This problem can hardly be solved by modeling articulated objects, since there is no true justification for hinges and the mutual variations can be more general. Because the object may, for example, consist of several rigid parts, elastic models obviously cannot describe these movements either. One possible solution is to generate several models, where each model represents one configuration of the model parts, and to match all of these models to the image. However, for large variations, this is very inefficient and not practical for real-time computation. In U.S. Pat. No. 6,324,299 (R1) a method of locating objects is described where the object has a plurality of portions. In a first step the coarse pose of the object is determined, and in a subsequent step the fine poses of the object portions are calculated.
Therefore, the variations of the portions must be small enough to find the coarse pose of the object, in contrast to the present invention, where the variations are explicitly modeled and may be of arbitrary form and arbitrary size. Additionally, in U.S. Pat. No. 6,324,299 the variations are not automatically learned in a training phase, as is done in the present invention. U.S. Pat. No. 6,411,734 (R2) extends the method presented in U.S. Pat. No. 6,324,299 by checking whether the found object portions fail to satisfy user-specified requirements, such as limits on the poses of the object portions. The advantage of the present invention is that this check can be omitted, since the object parts are only searched over the range of valid poses and therefore only valid instances are returned.
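The inefficiency of enumerating one rigid model per configuration, noted above, can be made concrete with a back-of-the-envelope count (all numbers below are purely illustrative): if each of n parts can take k quantized relative poses, an exhaustive enumeration requires k to the power of n rigid models, whereas searching each part only over its own range of valid relative poses scales with n times k.

```python
def enumerated_models(n_parts, k_poses):
    """Rigid models needed when every combination of quantized
    part poses is represented by its own model (grows exponentially)."""
    return k_poses ** n_parts

def restricted_searches(n_parts, k_poses):
    """Pose hypotheses tested when each part is searched only over
    its own range of valid relative poses (grows linearly)."""
    return n_parts * k_poses
```

For example, with 5 parts and 10 quantized poses per part, enumeration would require 100,000 separate rigid models, while the restricted search tests only 50 pose hypotheses; this gap is what makes the enumeration approach impractical for real-time computation.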