Techniques of visual object (and/or pattern) recognition are increasingly important in automated manufacturing, biomedical engineering, cartography and many other fields. Model-based recognition techniques typically must solve the problem of finding, in an image acquired by a camera, an occurrence of a previously defined model that has been subjected to an affine transformation. Affine transformations are those in which straight lines remain straight and parallelism is preserved. Under an affine transformation, however, the angles of an object relative to a coordinate system may change, and differential scale changes (different scale factors along different directions) may be introduced.
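To make this definition concrete, the following minimal Python sketch applies a two-dimensional affine map p' = A p + t and checks the two properties just mentioned. The particular matrix A and offset t are arbitrary, hypothetical values chosen for illustration: two parallel segments remain parallel after the map, while a right angle does not survive it.

```python
from typing import Tuple

Point = Tuple[float, float]
Matrix = Tuple[Point, Point]  # rows of a 2x2 matrix


def apply_affine(a: Matrix, t: Point, p: Point) -> Point:
    """Return A*p + t for a 2x2 matrix A and translation t."""
    (a11, a12), (a21, a22) = a
    return (a11 * p[0] + a12 * p[1] + t[0], a21 * p[0] + a22 * p[1] + t[1])


def direction(p: Point, q: Point) -> Point:
    """Vector from p to q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u: Point, v: Point) -> float:
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


def dot(u: Point, v: Point) -> float:
    """Dot product; zero exactly when u and v are perpendicular."""
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical affine map: shear plus non-uniform scaling, then translation.
A: Matrix = ((2.0, 1.0), (0.0, 0.5))
t: Point = (3.0, -1.0)

# Two parallel segments, both with direction (1, 1), stay parallel.
seg1 = [apply_affine(A, t, p) for p in ((0.0, 0.0), (1.0, 1.0))]
seg2 = [apply_affine(A, t, p) for p in ((0.0, 1.0), (1.0, 2.0))]
d1, d2 = direction(*seg1), direction(*seg2)
print(d1, d2, cross(d1, d2))  # (3.0, 0.5) (3.0, 0.5) 0.0

# Two perpendicular unit vectors from the origin no longer meet at a right angle.
o, ex, ey = (apply_affine(A, t, p) for p in ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
print(dot(direction(o, ex), direction(o, ey)))  # 2.0
```

The image of the x axis is also stretched by a factor of 2 while the image of the y axis is not, which is an instance of the differential scale change described above.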
Geometric hashing, as described in “Geometric Hashing: A General and Efficient Model-Based Recognition Scheme” (Y. Lamdan and H. J. Wolfson, Second International Conference on Computer Vision, December 1988, pp. 238-249) and “Affine Invariant Model-Based Object Recognition” (Y. Lamdan, J. T. Schwartz and H. J. Wolfson, IEEE Transactions on Robotics and Automation, Vol. 6, No. 5, October 1990), has been proposed as a method of finding occurrences of a model in an image despite affine transformation and partial occlusion.
In known geometric hashing methods, models of objects are represented by geometric primitives, referred to herein as model primitives. The geometric primitives can be either geometric features of the object, such as line endpoints, corners and edge points, or sets of these features, such as line segments and lines. For each triplet, pair or single geometric primitive (depending on the type of primitive chosen), a respective coordinate system is defined using the involved primitives as a basis. For example, in the method of Lamdan and Wolfson, the geometric primitives are interest points, and each triplet of interest points defines a basis. The location of each of the other interest points is then calculated within the respective coordinate system, producing a representation of the interest points that is affine invariant. For each coordinate system (basis), the calculated coordinates of each interest point are used as an index into a corresponding bin of a hash table, into which a reference to the model and basis (e.g., a record of the form [Model-ID, Basis-ID]) is inserted. The fully populated hash table is intended to provide a representation of the model that is invariant to affine transformations and that contains sufficient information to enable a match to be made even when an object is partially occluded.
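The model preprocessing stage described above can be sketched in Python as follows. This is a minimal sketch, assuming 2D interest points, every ordered non-collinear triplet as a basis, and a hypothetical bin width `q`; the helper names (`affine_coords`, `quantize`, `build_hash_table`) are illustrative and are not taken from the cited papers. The coordinates (a, b) of a point relative to a basis (p0, p1, p2) satisfy p = p0 + a(p1 - p0) + b(p2 - p0), and they are unchanged when the same affine map is applied to the point and the basis.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
BasisId = Tuple[int, int, int]  # indices of the three basis points
Record = Tuple[str, BasisId]    # hypothetical [Model-ID, Basis-ID] record
Bin = Tuple[int, int]


def basis_det(basis: Sequence[Point]) -> float:
    """Determinant of the basis edge vectors; zero for a collinear triplet."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    return (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)


def affine_coords(basis: Sequence[Point], p: Point) -> Tuple[float, float]:
    """Solve p = p0 + a*(p1 - p0) + b*(p2 - p0) for (a, b) by Cramer's rule."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    det = basis_det(basis)
    dx, dy = p[0] - x0, p[1] - y0
    a = (dx * (y2 - y0) - dy * (x2 - x0)) / det
    b = ((x1 - x0) * dy - (y1 - y0) * dx) / det
    return a, b


def quantize(coords: Tuple[float, float], q: float) -> Bin:
    """Map continuous invariant coordinates to a hash-table bin of width q."""
    return (round(coords[0] / q), round(coords[1] / q))


def build_hash_table(models: Dict[str, List[Point]], q: float = 0.25) -> Dict[Bin, List[Record]]:
    table: Dict[Bin, List[Record]] = defaultdict(list)
    for model_id, pts in models.items():
        # Every ordered, non-collinear triplet of interest points is a basis.
        for basis_id in permutations(range(len(pts)), 3):
            basis = [pts[i] for i in basis_id]
            if abs(basis_det(basis)) < 1e-12:
                continue
            # Hash every other point of the model under this basis.
            for j, p in enumerate(pts):
                if j not in basis_id:
                    table[quantize(affine_coords(basis, p), q)].append((model_id, basis_id))
    return table


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
table = build_hash_table({"square": square})
print(sum(len(v) for v in table.values()))     # 24: 4*3*2 bases x 1 remaining point
print(("square", (0, 1, 2)) in table[(4, 4)])  # True: (1, 1) has coords (1, 1) -> bin (4, 4)

# Invariance: the square under A = ((2, 1), (0, 0.5)), t = (3, -1) gives the same coordinates.
moved = [(3.0, -1.0), (5.0, -1.0), (4.0, -0.5), (6.0, -0.5)]
print(affine_coords(moved[:3], moved[3]))      # (1.0, 1.0)
```

Rounding to the nearest bin is the simplest possible quantization; a practical implementation would also need to handle coordinates that fall near bin boundaries, where noise can push a point into a neighboring bin.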
As is well known in the art, object recognition begins by acquiring an image of the object (e.g., using a gray-scale digital camera) and processing the image to detect geometric features. As with the model, a set of these features is used as a basis for a respective coordinate system, within which the locations of each of the other geometric features are calculated. These calculated coordinates are used to access corresponding bins of the hash table. If an accessed bin contains a record (e.g., of the form [Model-ID, Basis-ID]), that record receives a vote or, more generally, a proximity score is computed for it. The records that accumulate the largest number of votes or the highest scores, provided these are significant, are adopted as candidates and extracted for further analysis. The hypothesis is that the model referenced by the record with the most votes or the highest score corresponds most closely to the target image, and that the transformation mapping that model into the target image can be computed from the basis identified in that record.
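The recognition stage can likewise be sketched in Python: a scene basis is chosen, the remaining scene points are expressed in that basis, each resulting bin gives one vote to every record it holds, and the affine map from model to scene is recovered from the matched pair of bases. This is a minimal sketch, assuming 2D points, ordered triplet bases and plain vote counting rather than a proximity score; all names, the bin width and the example data are hypothetical, and the preprocessing helpers are repeated so that the block runs on its own.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def basis_det(basis: Sequence[Point]) -> float:
    (x0, y0), (x1, y1), (x2, y2) = basis
    return (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)


def affine_coords(basis: Sequence[Point], p: Point) -> Tuple[float, float]:
    """(a, b) with p = p0 + a*(p1 - p0) + b*(p2 - p0)."""
    (x0, y0), (x1, y1), (x2, y2) = basis
    det = basis_det(basis)
    dx, dy = p[0] - x0, p[1] - y0
    return ((dx * (y2 - y0) - dy * (x2 - x0)) / det,
            ((x1 - x0) * dy - (y1 - y0) * dx) / det)


def quantize(c: Tuple[float, float], q: float) -> Tuple[int, int]:
    return (round(c[0] / q), round(c[1] / q))


def build_hash_table(models: Dict[str, List[Point]], q: float = 0.25):
    table = defaultdict(list)
    for model_id, pts in models.items():
        for basis_id in permutations(range(len(pts)), 3):
            basis = [pts[i] for i in basis_id]
            if abs(basis_det(basis)) < 1e-12:
                continue  # collinear triplets cannot serve as a basis
            for j, p in enumerate(pts):
                if j not in basis_id:
                    table[quantize(affine_coords(basis, p), q)].append((model_id, basis_id))
    return table


def vote(table, scene: List[Point], scene_basis: Tuple[int, int, int], q: float = 0.25) -> Counter:
    """Give one vote to every record found in the bins hit by the non-basis scene points."""
    basis = [scene[i] for i in scene_basis]
    votes: Counter = Counter()
    if abs(basis_det(basis)) < 1e-12:
        return votes
    for j, p in enumerate(scene):
        if j not in scene_basis:
            for record in table.get(quantize(affine_coords(basis, p), q), []):
                votes[record] += 1
    return votes


def affine_from_bases(src: Sequence[Point], dst: Sequence[Point]):
    """Return (A, t) such that dst[k] = A*src[k] + t for the three basis points."""
    (s0, s1, s2), (d0, d1, d2) = src, dst
    # M and E hold the basis edge vectors (as columns) in model and scene.
    m11, m21, m12, m22 = s1[0] - s0[0], s1[1] - s0[1], s2[0] - s0[0], s2[1] - s0[1]
    e11, e21, e12, e22 = d1[0] - d0[0], d1[1] - d0[1], d2[0] - d0[0], d2[1] - d0[1]
    det = m11 * m22 - m12 * m21
    i11, i12, i21, i22 = m22 / det, -m12 / det, -m21 / det, m11 / det  # entries of M^-1
    # A = E * M^-1, then t = d0 - A*s0.
    a = ((e11 * i11 + e12 * i21, e11 * i12 + e12 * i22),
         (e21 * i11 + e22 * i21, e21 * i12 + e22 * i22))
    t = (d0[0] - (a[0][0] * s0[0] + a[0][1] * s0[1]),
         d0[1] - (a[1][0] * s0[0] + a[1][1] * s0[1]))
    return a, t


models = {"m": [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3)]}
table = build_hash_table(models)
# Model "m" under A = ((2, 1), (0, 0.5)), t = (3, -1), plus one clutter point.
scene = [(3.0, -1.0), (5.0, -1.0), (4.0, -0.5), (8.0, -0.5), (8.0, 0.5), (10.0, 10.0)]
votes = vote(table, scene, (0, 1, 2))
print(votes[("m", (0, 1, 2))])  # 2: one vote from each matching non-basis scene point
A, t = affine_from_bases([models["m"][i] for i in (0, 1, 2)], scene[:3])
print(A, t)  # ((2.0, 1.0), (0.0, 0.5)) (3.0, -1.0)
```

Only one scene basis is tried here; a full recognizer would typically try further scene bases when no record gathers enough votes, and would verify each candidate before accepting the recovered transformation.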
This object recognition algorithm and, in particular, its improved versions offer important advantages for object recognition (see “A Probabilistic Approach to Geometric Hashing Using Line Features”, Frank Chee-Da Tsai, Technical Report No. 640, Robotics Research Laboratory, Courant Institute of Mathematical Sciences, June 1993, or patent application US2002/0181780 A1 dated Dec. 5, 2002, which is hereby incorporated by reference). The algorithm is robust to noise and occlusion. However, such algorithms require an enormous amount of processing by the CPU, which slows down the recognition step. Accordingly, an implementation that enables faster yet reliable recognition of objects remains highly desirable.
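A rough count suggests why the processing load grows so quickly. Assuming that every ordered triplet of n interest points serves as a basis and that each of the remaining n - 3 points is hashed under it, the number of hash operations for one model is n(n-1)(n-2)(n-3), which grows as the fourth power of n; the same count bounds the lookups if every scene triplet is eventually tried during recognition. The point counts below are arbitrary examples, not figures from the cited works, and the function name is hypothetical.

```python
from math import perm


def hash_operations(n: int) -> int:
    """Ordered basis triplets of n points times the n - 3 points hashed per basis."""
    if n < 4:
        return 0
    return perm(n, 3) * (n - 3)


for n in (10, 20, 50):
    print(n, hash_operations(n))
# 10 5040
# 20 116280
# 50 5527200
```

Each lookup may in turn return a long list of records to score, so the work per operation is not constant either.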