Techniques of visual object (and/or pattern) recognition are increasingly important in automated manufacturing, biomedical engineering, cartography and many other fields. Model-based recognition techniques typically must solve the problem of finding, in an image acquired by a camera, an occurrence of a previously defined model that has been affected by an affine transformation. Affine transformations may be defined as transformations in which straight lines remain straight and parallelism is preserved. Angles, however, may change, and differential scale changes may be introduced.
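The properties listed above can be checked numerically. The following is a minimal sketch in Python, assuming an illustrative 2x2 matrix `A` and translation `t` that are not taken from any cited reference; the helper names are likewise hypothetical. It warps a unit square and tests whether opposite sides remain parallel while the corner angle and the ratio of side lengths change.

```python
import math

def apply_affine(p, A, t):
    """Map a 2-D point p to A*p + t (A is a 2x2 matrix, t a translation)."""
    (a, b), (c, d) = A
    return (a * p[0] + b * p[1] + t[0], c * p[0] + d * p[1] + t[1])

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def angle_deg(u, v):
    """Unsigned angle between vectors u and v, in degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.atan2(abs(cross(u, v)), dot))

# Corners of a unit square, listed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Illustrative transformation (an assumption): stretch plus shear, then a shift.
A = ((2, 1), (0, 1))
t = (3, -1)
warped = [apply_affine(p, A, t) for p in square]

bottom = sub(warped[1], warped[0])
top = sub(warped[2], warped[3])
left = sub(warped[3], warped[0])

# Opposite sides stay parallel (zero cross product) ...
parallel_preserved = cross(bottom, top) == 0
# ... but the right angle at the first corner is not preserved ...
corner_angle = angle_deg(bottom, left)
# ... and the two sides, equal before the warp, are scaled differently.
side_ratio = math.hypot(*bottom) / math.hypot(*left)

print(parallel_preserved, round(corner_angle, 1), round(side_ratio, 3))
```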
Geometric hashing, as described in “Geometric hashing: A generalized and Efficient Model-based Recognition Scheme” (Y. Lamdan and H. J. Wolfson, Second International Conference on Computer Vision, December 1988, pp 238–249), and “Affine Invariant Model-Based Object Recognition” (Y. Lamdan, J. T. Schwartz, H. J. Wolfson, IEEE Transactions on Robotics and Automation, Vol. 6, No. 5, October 1990), has been proposed as a method of finding occurrences of a model in an image in the presence of affine transformation and partial occlusion.
In known geometric hashing methods, models of objects are represented by interest points. These interest points are typically edge coordinates that correspond to important features (such as the ends of lines, corners, etc.) of an object. For each triplet of interest points, a respective coordinate system is defined using the involved triplet as a basis. The location of each of the other interest points can then be calculated within the respective coordinate system, to produce a representation of the interest points that is affine invariant. For each coordinate system (basis), the calculated coordinates of each interest point are then used as an index to reference a corresponding bin of a hash table, into which a reference to the model and basis (e.g. a record in the form of [Model-ID, Basis-ID]) is inserted. The fully populated hash table is intended to provide a representation of the model that is invariant to affine transformation, and contains sufficient information to enable a match to be made, even when an object is partially occluded.
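By way of illustration only, the table-building step described above may be sketched as follows. This is a simplified Python sketch and not the implementation of the cited references: the function names, the bin size of 0.25, the rounding-based quantization, the use of a Python dictionary as the hash table and the five sample points are all assumptions made for the example. Each ordered triplet of point indices serves as a Basis-ID.

```python
import math
from collections import defaultdict
from itertools import permutations

def affine_coords(p, basis):
    """Return (alpha, beta) such that p = p0 + alpha*(p1-p0) + beta*(p2-p0).

    Returns None when the three basis points are collinear.
    """
    p0, p1, p2 = basis
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    d = (p[0] - p0[0], p[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        return None
    alpha = (d[0] * e2[1] - d[1] * e2[0]) / det  # Cramer's rule
    beta = (e1[0] * d[1] - e1[1] * d[0]) / det
    return (alpha, beta)

def bin_key(coords, bin_size):
    """Quantize invariant coordinates to the index of the nearest bin."""
    return tuple(math.floor(c / bin_size + 0.5) for c in coords)

def build_hash_table(models, bin_size=0.25):
    """models maps a Model-ID to its list of interest points."""
    table = defaultdict(list)
    for model_id, pts in models.items():
        # Every ordered, non-collinear triplet defines one coordinate system.
        for basis_id in permutations(range(len(pts)), 3):
            basis = [pts[i] for i in basis_id]
            for idx, p in enumerate(pts):
                if idx in basis_id:
                    continue
                coords = affine_coords(p, basis)
                if coords is not None:
                    # Store a [Model-ID, Basis-ID] record in the indexed bin.
                    table[bin_key(coords, bin_size)].append((model_id, basis_id))
    return table

# Hypothetical model with five interest points.
models = {"widget": [(0, 0), (4, 0), (4, 3), (0, 2), (2, 5)]}
table = build_hash_table(models)
print(sum(len(records) for records in table.values()), "records in", len(table), "bins")
```

Because the coordinates (alpha, beta) are expressed relative to the basis triplet, applying the same affine transformation to all points leaves them unchanged, which is what makes the bin index usable as an invariant key. Note that the number of records grows with the fourth power of the number of interest points in this naive sketch.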
As is well known in the art, object recognition commences by acquiring an image of the object (e.g., using a gray-scale digital camera), and processing the image to detect points of interest. As with the model, each triplet of interest points is used as a basis for a respective coordinate system, within which the location of each of the other interest points is calculated. These calculated coordinates are used to access corresponding bins of the hash table. If an accessed bin contains a record (e.g. in the form of [Model-ID, Basis-ID]), then that record is accorded a vote. The records that accumulate the largest significant number of votes are adopted as candidates, and extracted for further analysis. The hypothesis is that the model referenced by the record with the highest number of votes most closely corresponds to the target image, and the proper transformation of that model into the target image can be computed from the basis identified in that record.
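The voting step may be sketched in the same simplified manner. The Python example below is self-contained and illustrative only: the helper names, the bin size, the sample model and the affine map used to synthesize a noise-free "image" are assumptions. The image points are listed in reverse order so that the recovered Basis-ID, rather than the point order, reveals the correspondence.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

def affine_coords(p, basis):
    """Coordinates (alpha, beta) of p in the frame of a point triplet, or None."""
    p0, p1, p2 = basis
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    d = (p[0] - p0[0], p[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        return None  # collinear basis
    return ((d[0] * e2[1] - d[1] * e2[0]) / det,
            (e1[0] * d[1] - e1[1] * d[0]) / det)

def invariant_keys(pts, basis_id, bin_size):
    """Yield the bin key of every non-basis point, expressed in the given basis."""
    basis = [pts[i] for i in basis_id]
    for idx, p in enumerate(pts):
        if idx not in basis_id:
            coords = affine_coords(p, basis)
            if coords is not None:
                yield tuple(math.floor(c / bin_size + 0.5) for c in coords)

def build_hash_table(models, bin_size):
    table = defaultdict(list)
    for model_id, pts in models.items():
        for basis_id in permutations(range(len(pts)), 3):
            for key in invariant_keys(pts, basis_id, bin_size):
                table[key].append((model_id, basis_id))
    return table

def vote(table, image_pts, basis_id, bin_size):
    """Tally votes for [Model-ID, Basis-ID] records using one image basis."""
    votes = Counter()
    for key in invariant_keys(image_pts, basis_id, bin_size):
        for record in table.get(key, ()):
            votes[record] += 1
    return votes

BIN_SIZE = 0.25  # assumed quantization step
models = {"widget": [(0, 0), (4, 0), (4, 3), (0, 2), (2, 5)]}
table = build_hash_table(models, BIN_SIZE)

# Synthetic image: the model under an assumed affine map, points reversed.
warp = lambda p: (2 * p[0] + p[1] + 3, p[1] - 1)
image = [warp(p) for p in reversed(models["widget"])]

votes = vote(table, image, (0, 1, 2), BIN_SIZE)
for record, count in votes.most_common(3):
    print(record, count)
```

In this sketch a single image basis is tried; a fuller procedure would repeat the vote for further image triplets until some record accumulates enough votes, and would then verify the candidate by computing the transformation from the matched bases. Records other than the correct one can also collect votes by coincidence, which is why the candidates are extracted for further analysis rather than accepted outright.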
According to Lamdan and Wolfson (“Geometric hashing: A generalized and Efficient Model-based Recognition Scheme”, supra), this geometric hashing technique can deal with partially occluded objects. However, in practice, geometric hashing often fails in cases where too many important features (e.g. corners, large edge features etc.) of a target object are occluded. This is because image detection and analysis generally yield a relatively small number of interest points that pertain to the object in question. Thus if too great a proportion of important features of the target object are occluded, the number of interest points detected for that object may be too low to permit the correct record to accumulate a significant number of votes.
In addition, noise in an acquired image can produce errors in the computation of the coordinates of interest points, which may result in incorrect coordinate values being used to access the hash table. Imprecision and computation errors can affect both the points that define bases and the interest points that are used to vote. Since interest point coordinate values are a function of the chosen basis, errors due to imprecision in the respective basis point and interest point locations accumulate. As a result, too many false candidates are selected. Moreover, in a real image, which normally contains both partial occlusions and noise, many “false” interest points are frequently found. Under some circumstances, these “false” interest points can cause a record to incorrectly accumulate a large number of votes. These problems are significant disadvantages of conventional geometric hashing, and are discussed in “On the Error Analysis of Geometric Hashing” (Y. Lamdan, H. J. Wolfson, Proceedings IEEE Conference, Computer Vision and Pattern Recognition, pages 22–27, 1991) and “On the Sensitivity of Geometric Hashing” (W. E. Grimson, D. P. Huttenlocher, Technical Report A. I. Memo 1250, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1990).
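The way in which a small error in a basis point propagates into the coordinates of other points can be seen in a short numerical sketch. The Python example below uses hypothetical values chosen for easy arithmetic: a unit basis, a point located ten units away, and a displacement of 0.1 applied to one basis point. The function name and the bin size are assumptions.

```python
import math

def affine_coords(p, basis):
    """Coordinates (alpha, beta) of p in the frame defined by a point triplet."""
    p0, p1, p2 = basis
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    d = (p[0] - p0[0], p[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    return ((d[0] * e2[1] - d[1] * e2[0]) / det,
            (e1[0] * d[1] - e1[1] * d[0]) / det)

far_point = (10, 10)
clean_basis = [(0, 0), (1, 0), (0, 1)]
noisy_basis = [(0, 0), (1, 0.1), (0, 1)]  # second basis point displaced by 0.1

clean = affine_coords(far_point, clean_basis)
noisy = affine_coords(far_point, noisy_basis)

# Largest change in either invariant coordinate caused by the displacement.
shift = max(abs(a - b) for a, b in zip(clean, noisy))

BIN_SIZE = 0.25  # assumed quantization step of the hash table
bins_moved = math.floor(shift / BIN_SIZE + 0.5)

print(clean, noisy, shift, bins_moved)
```

Here the displacement of the basis point is magnified in proportion to the distance of the voting point from the basis, so that the computed coordinates fall several bins away from the bin populated when the model was stored.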
To avoid the above drawbacks, some improvements over traditional geometric hashing have been proposed. In particular, instead of interest points, the use of lines as affine-invariant features to represent an object has been suggested (see “A probabilistic Approach to Geometric Hashing using Line Features”, Frank Chee-Da Tsai, Technical Report No. 640, Robotics Research Laboratory, Courant Institute of Mathematical Sciences, June 1993). In this technique, a line is represented as a vector (r, θ), where r represents an orthogonal distance of the line from the origin of a selected coordinate system, and θ represents the angular orientation of the line in the coordinate system. This vector representation may also be extended to include the length of the line. According to Tsai, lines can be used as the basis of respective coordinate systems, and geometric hashing performed in a manner directly analogous to that used for interest points. The use of lines generally provides a more robust representation of an object, because imprecision in the location of detected points (e.g. due to noise) does not affect the location of a line as severely as it affects the calculated coordinates of discrete points.
However, this technique can still result in records incorrectly accumulating a large number of votes. This is at least partially due to the fact that the vector representation captures only the radial distance between the origin of a selected basis and an infinite-length line, and the angular orientation of that infinite-length line relative to the basis. Even in cases where the vector notation is extended to include the length of a line segment lying on the line, no information is provided about the actual position of the line segment along the infinite-length line. While the approach of Tsai may yield improved recognition, in practice it is still unable to reliably detect objects in respect of which a significant proportion of important features are occluded.
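The ambiguity described above can be made concrete with a small Python sketch. It assumes the normal form of a line, in which θ is taken as the direction of the line's normal and r is kept non-negative; this convention, the function name and the two sample segments are assumptions for illustration and are not drawn from Tsai's report.

```python
import math

def line_params(p, q):
    """Return (r, theta, length) for the segment from p to q.

    r is the perpendicular distance from the origin to the supporting line and
    theta is the direction of the line's normal, so that every point (x, y) on
    the line satisfies x*cos(theta) + y*sin(theta) = r.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    nx, ny = dy / length, -dx / length  # unit normal to the segment
    r = nx * p[0] + ny * p[1]
    if r < 0:                           # flip the normal so that r >= 0
        nx, ny, r = -nx, -ny, -r
    return (r, math.atan2(ny, nx), length)

# Two different segments of equal length on the same line x + y = 2.
seg_a = ((0, 2), (2, 0))
seg_b = ((3, -1), (5, -3))

params_a = line_params(*seg_a)
params_b = line_params(*seg_b)

# The (r, theta, length) vectors coincide although the segments do not.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-12) for a, b in zip(params_a, params_b)
)
print(params_a, params_b, indistinguishable)
```

Both segments yield the same vector even though they occupy different positions, so a hash key built from (r, θ) or from (r, θ, length) cannot tell them apart.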
Accordingly, a robust geometric hashing method that enables rapid and reliable recognition of heavily occluded objects remains highly desirable.