The present invention relates to model-based object recognition and, more particularly, to a method for object recognition based on transformation hashing and chamfer matching.
In applications such as automatic vision, it often is necessary to identify an object in a digital image of a scene. For example, a robot on an assembly line may need to identify a portion of a workpiece that the robot needs to work on. Typically, the image consists of a rectangular array of image pixels, each pixel having a certain gray level. In model-based object recognition, instead of working with the gray levels, feature pixels that correspond to corners and edges of objects in the scene are identified in the image, using conventional edge detection and corner detection algorithms. See, for example, J. Canny, "A computational approach to edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8 no. 6 pp. 679-698 (November 1986); H. Moravec, "Towards automatic visual obstacle avoidance", 5th International Joint Conference on Artificial Intelligence p. 584 (1977); and R. M. Haralick and L. G. Shapiro, Computer and Robot Vision (Addison-Wesley, 1992), vol. 1 sec. 8.10 pp. 410-419, "Corner detection" and vol. 2 sec. 16.4 pp. 332-348, "An interest operator". The object whose presence is suspected in the image, or whose location in the image is sought, is represented as a model that includes several model points, called "feature points", that correspond to corners or edges of the sought object. A subset of the image feature pixels is sought that corresponds in some sense to the model feature points. If such a subset is found, that subset is presumed to be part of an outline of the sought object.
The prior art method of model-based object recognition that is closest to the present invention is that described in Yehezkel Lamdan and Haim J. Wolfson, "Geometric hashing: a general and efficient model-based recognition scheme", Second International Conference on Computer Vision, IEEE Computer Society, 1988, pp. 238-249. A better, more specific name for this prior art method is "basis set hashing". According to this method, the sought object is represented as a two-dimensional model consisting of a set of feature points. The method is most simply explained with reference to two-dimensional similarity transformations (rotations, translations and scaling). Pairs of feature points are considered in turn. For each pair of feature points, a coordinate system is formed in which one of the feature points of the pair has coordinates (0,0), and the other feature point of the pair has coordinates (1,0), so that the two points of the pair define one unit vector of a basis set of the coordinate system. The coordinates of the feature points collectively in this coordinate system constitute a representation of the model, specifically, the result of the similarity transformation of the feature point coordinates that transforms the first point of the pair to (0,0) and the second point of the pair to (1,0). Similar representations of the image are formed, using pairs of feature pixels.
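The basis-pair construction described above can be illustrated with a minimal Python sketch. It is not taken from Lamdan and Wolfson; the function names `to_basis_frame` and `all_representations` are hypothetical, and the use of complex arithmetic is merely one convenient way to express a two-dimensional similarity transformation. Treating each point (x, y) as the complex number x + iy, the transformation that sends the first point of the pair to (0,0) and the second to (1,0) is a subtraction followed by a division.

```python
from itertools import permutations


def to_basis_frame(points, i, j):
    """Return the coordinates of all points in the frame in which
    points[i] maps to (0, 0) and points[j] maps to (1, 0).

    Hypothetical helper: a 2-D similarity transformation written as
    z -> (z - p_i) / (p_j - p_i) on complex numbers.
    """
    origin = complex(*points[i])
    unit = complex(*points[j]) - origin
    if unit == 0:
        raise ValueError("the two basis points must be distinct")
    frame = []
    for x, y in points:
        z = (complex(x, y) - origin) / unit
        frame.append((z.real, z.imag))
    return frame


def all_representations(points):
    """One representation per ordered pair of distinct points."""
    indices = range(len(points))
    return {(i, j): to_basis_frame(points, i, j)
            for i, j in permutations(indices, 2)}
```

For a model of m feature points this sketch yields m(m-1) representations, one per ordered pair, which is the source of the quadratic factors in the complexity estimates for this method.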
The remainder of the method of Lamdan and Wolfson consists of looking for one or more image representations that include one of the model representations, to within the level of discretization represented by the pixels of the image. With m points in the model and n pixels in the set of feature pixels, if matching a model representation to an image representation has complexity t, then the complexity of brute force matching is $m^{2} n^{2} t$, which is of the order $n^{5}$ in the worst case (taking m and t each to be of order n). Therefore, Lamdan and Wolfson construct a hash table. Each entry of the hash table corresponds to an image pixel that includes one or more of the points of the model representations, and is a list of all the model representations having points that fall within that image pixel. Matching is done by assigning tallies to the model representations. All the tallies initially are 0. Each pair of feature points is selected in turn as a potential base pair, and all other feature points are transformed using this base. For each feature pixel with an entry in the hash table, 1 is added to the tally of each model representation listed in that entry. The model representation with the highest tally identifies the location of the object in the image. If no tally exceeds a predetermined threshold, then it is concluded that the object does not appear in the image. The fact that the hash table can be constructed in advance makes this method suitable, in principle, for real-time object recognition.
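The hash table and the tallying step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of Lamdan and Wolfson: the names `to_frame`, `bin_of`, `build_table` and `vote` are hypothetical, and the quantization cell size `cell` stands in for the level of discretization represented by the image pixels. The table is built once from the model, and tallying is then performed for one candidate base pair of image feature pixels.

```python
from collections import defaultdict
from itertools import permutations


def to_frame(points, i, j):
    """Coordinates (as complex numbers) in the frame where
    points[i] -> 0 and points[j] -> 1."""
    origin = complex(*points[i])
    unit = complex(*points[j]) - origin
    return [(complex(x, y) - origin) / unit for x, y in points]


def bin_of(z, cell=0.25):
    """Quantize a transformed point to a hash-table key (assumed cell size)."""
    return (round(z.real / cell), round(z.imag / cell))


def build_table(model, cell=0.25):
    """Map each occupied bin to the set of model representations
    (ordered base pairs) having a point in that bin. Built in advance."""
    table = defaultdict(set)
    for i, j in permutations(range(len(model)), 2):
        for z in to_frame(model, i, j):
            table[bin_of(z, cell)].add((i, j))
    return table


def vote(table, image, a, b, cell=0.25):
    """Tally model representations for the image base pair (a, b)."""
    tallies = defaultdict(int)
    occupied = {bin_of(z, cell) for z in to_frame(image, a, b)}
    for key in occupied:
        for rep in table.get(key, ()):
            tallies[rep] += 1
    return tallies
```

In use, `vote` would be called for each candidate base pair of feature pixels, and the representation with the highest tally would be accepted only if that tally exceeds the chosen threshold. Every representation receives at least two votes from the base points themselves, so a practical threshold would be set above that floor.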
The above description of the prior art method of Lamdan and Wolfson is based on two-dimensional similarity transformations. The method also could be based on more complicated transformations, for example, on three-dimensional similarity transformations of three-dimensional models, or on affine transformations. The coordinate systems of the representations then are of higher dimensionality, for example, dimension 3 for three-dimensional similarity transformations and two-dimensional affine transformations. The run-time complexity for object recognition is $m^{k+1}$ for k-dimensional representation coordinate systems.