Automated systems for recognizing objects are increasingly used in a wide variety of technical fields, for example, biomedicine, cartography, metallurgy, industrial automation, and robotics. Moreover, many businesses are investing large amounts of capital in the research and development of machine vision systems and related data processing systems which can automatically and accurately identify objects, often referred to as "target objects."
Automated object recognition systems are becoming more sophisticated as data processing techniques advance. Most practical recognition systems in the conventional art derive models which are then used for recognizing objects in an image scene. An image scene in the context of this document is a two-dimensional (2D) representation which, for instance, could be derived from appearance data retrieved from a three-dimensional (3D) object at different viewpoints.
Furthermore, in most systems, data for deriving the models is input manually. However, in a few high-end automated systems, data can be learned via an image capturing device, for example, a camera or scanner.
The model-based techniques have been conceptualized as having two phases: an object acquisition phase and a subsequent object recognition phase. More specifically, models of target objects are initially precompiled and stored during the acquisition phase, independently of the image scene. Then, the occurrences of these objects within an image scene are determined during the recognition phase by comparison of the sampled data to the stored models.
The task of recognizing objects in a scene is often complicated by rotation (vantage point), translation (placement), or scaling (size) of the object in a scene. In addition, the task may further be complicated by the partial concealment, or "occlusion", of a target object possibly caused by overlaps from other objects or some other adverse condition.
Some recognition systems employ "parametric" techniques, or mathematical parameter transforms. In parametric techniques, the spatial representation of an image in orthogonal coordinates is transformed into a representation based upon another coordinate system. Analysis of the image then takes place based upon the latter coordinate system. The methodology of using parametric techniques originated in U.S. Pat. No. 3,069,654 to Hough, involving the study of subatomic particles passing through a viewing field.
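By way of illustration only, a parameter transform for straight lines can be sketched as follows. The sketch assumes the commonly used normal parameterization rho = x cos(theta) + y sin(theta), which is not necessarily the formulation of the Hough patent; the function name `hough_lines`, the one-degree angular step, and the integer rounding of rho are assumptions chosen for brevity.

```python
import math
from collections import Counter

def hough_lines(points, theta_steps=180):
    """Accumulate votes in (theta index, rho) space for every input point."""
    accumulator = Counter()
    for x, y in points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps          # angle of the line normal
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(k, round(rho))] += 1          # one vote per (theta, rho) cell
    return accumulator

# Fifty collinear points on the vertical line x = 3.
points = [(3, y) for y in range(50)]
accumulator = hough_lines(points)

# The most voted cell identifies the line in the transformed coordinate system.
(theta_index, rho), votes = accumulator.most_common(1)[0]
print(theta_index, rho, votes)
```

Each image point votes for every line that could pass through it, so collinear points concentrate their votes in a single cell of the transformed space, and the analysis takes place on that accumulator rather than on the original orthogonal coordinates.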
Many of the commonly used parametric techniques are interrelated and have evolved over years of experimentation since the Hough patent. For a general discussion in regard to the use of parametric techniques for shape identification, see D. H. Ballard, "Parameter nets: A theory of low level vision," Proceedings of the 7th International Joint Conference on Artificial Intelligence, pp. 1068-1078, August 1981.
Well-known conventional parametric techniques include "alignment techniques", "Hough Transform techniques", and "geometric hashing". Although not particularly relevant to the present invention, alignment techniques are discussed in the following articles: D. P. Huttenlocher, S. Ullman, "Three-Dimensional Model Matching from an Unconstrained Viewpoint," Proceedings of the 1st International Conference on Computer Vision, pp. 102-111, London, 1987, and D. P. Huttenlocher, S. Ullman, "Recognizing Solid Objects by Alignment," Proceedings of the DARPA Image Understanding Workshop, vol. II, pp. 1114-1122, Cambridge, Massachusetts, April 1988. In regard to Hough Transform techniques, which also are not particularly relevant to the present invention, see D. H. Ballard, "Generalizing the Hough Transform to Detect Arbitrary Shapes," Pattern Recognition, vol. 13(2), pp. 111-122, 1981.
Geometric hashing is an often favored technique and is considered proper background for the present invention. In geometric hashing, models of objects are represented by "interest points." An orthogonal coordinate system is defined based on an ordered pair of interest points, sometimes referred to as the "basis pair." For example, the first and second interest points could be identified respectively as ordered pairs (0,0) and (1,0). Next, all other interest points are represented by their coordinates in the coordinate system.
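As a minimal sketch of the coordinate system just described, the following assumes 2D interest points given as (x, y) tuples. The function name `basis_coordinates` and the sample points are hypothetical and serve only to illustrate the mapping of the basis pair to (0,0) and (1,0).

```python
def basis_coordinates(point, b0, b1):
    """Coordinates of `point` in the frame where b0 -> (0, 0) and b1 -> (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]    # x-axis of the new frame
    vx, vy = -uy, ux                          # y-axis: the x-axis rotated by 90 degrees
    px, py = point[0] - b0[0], point[1] - b0[1]
    norm_sq = ux * ux + uy * uy               # squared basis length fixes the unit scale
    if norm_sq == 0:
        raise ValueError("basis points must be distinct")
    return ((px * ux + py * uy) / norm_sq, (px * vx + py * vy) / norm_sq)

# Hypothetical interest points; the first two form the ordered basis pair.
interest_points = [(2.0, 1.0), (4.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
b0, b1 = interest_points[0], interest_points[1]
coords = [basis_coordinates(p, b0, b1) for p in interest_points]
print(coords)
```

Dividing by the squared length of the basis vector is what makes the second basis point land exactly on (1,0), regardless of how far apart the two basis points are in the original image.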
The foregoing representation allows for comparison of objects which have been rotated, translated, or scaled, to the interest points of the model. Furthermore, the representation permits reliable comparison of the model to occluded objects, because the point coordinates of the occluded object in the sampled scene have a partial overlap with the coordinates of the stored model, provided both the model and scene are represented in a coordinate system derived from the same basis pair. However, occlusion of one or more of the basis points will preclude recognition of the object.
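The invariance and the tolerance to occlusion can be checked numerically. The sketch below is an illustration under assumptions: it treats each 2D point as a complex number, so that rotation and scaling become a single complex multiplication, and the model points, the transform, and the helper name `frame_coords` are hypothetical.

```python
import cmath

def frame_coords(points, i, j):
    """Express complex-valued points in the frame of the basis pair (i, j)."""
    origin, unit = points[i], points[j] - points[i]
    return [(z - origin) / unit for z in points]

model = [complex(0, 0), complex(4, 0), complex(4, 3), complex(1, 2), complex(2, 5)]

# Hypothetical scene: the model rotated by 40 degrees, scaled by 2.5, translated,
# and with its last interest point hidden (occluded).
similarity = 2.5 * cmath.exp(1j * cmath.pi * 40 / 180)
scene = [similarity * z + complex(7, -2) for z in model[:-1]]

model_coords = frame_coords(model, 0, 1)
scene_coords = frame_coords(scene, 0, 1)

# Count the visible scene points that coincide with a model coordinate.
matches = sum(
    any(abs(s - m) < 1e-9 for m in model_coords) for s in scene_coords
)
print(matches, "of", len(scene), "scene points coincide with model coordinates")
```

The rotation, scale, and translation cancel when the coordinates are formed relative to the basis pair, so the visible points still overlap the stored model even though one point is missing. If either basis point were the hidden one, this particular frame could not be formed from the scene at all.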
To avoid such a condition, interest points are represented in all possible orthogonal coordinate systems which can be derived from all of the possible basis pairs of interest points. Each coordinate is used to identify an entry to a "hash table." In the hash table, a "record" is stored which comprises the particular basis pair along with an identification of the particular model at issue.
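The construction of such a hash table might be sketched as follows. This is an assumption-laden illustration rather than a definitive implementation: the quantization step `cell`, the use of a Python dictionary as the hash table, and the model names and points are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def frame_coords(p, b0, b1):
    """Coordinates of p in the frame where b0 -> (0, 0) and b1 -> (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    norm_sq = ux * ux + uy * uy
    return ((px * ux + py * uy) / norm_sq, (py * ux - px * uy) / norm_sq)

def build_hash_table(models, cell=0.25):
    """Map each quantized coordinate to a list of (model, basis pair) records."""
    table = defaultdict(list)
    for name, pts in models.items():
        # Every ordered pair of interest points serves once as the basis pair.
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k in (i, j):
                    continue
                x, y = frame_coords(p, pts[i], pts[j])
                key = (round(x / cell), round(y / cell))   # quantized coordinate
                table[key].append((name, (i, j)))
    return table

# Hypothetical models, each a list of distinct interest points.
models = {
    "triangle": [(0, 0), (2, 0), (0, 2)],
    "kite": [(0, 0), (4, 0), (4, 3), (1, 2)],
}
table = build_hash_table(models)
record_count = sum(len(bucket) for bucket in table.values())
print(record_count, "records in", len(table), "occupied entries")
```

A model with n interest points contributes n(n-1)(n-2) records, one for each ordered basis pair combined with each remaining point, which indicates how quickly the table grows with model complexity.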
During the object recognition phase, interest points initially are extracted from a scene. An arbitrary ordered pair is selected and used as the first basis pair. The coordinates of all other interest points in the scene are computed utilizing this basis pair. Each computed coordinate is compared to the coordinate entries of the hash table. If a computed coordinate and respective record (model, basis pair) appears in the hash table, then a "vote" is accorded the model and the basis pair as corresponding to the ones in the scene. The votes are accumulated in a "bucket." When a certain record (model, basis pair) gets a large number of votes, then the record, and corresponding model, is adopted for further analysis.
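The voting procedure can be sketched as follows, under stated assumptions: the quantization step `CELL`, the vote threshold, the model names, and the scene points are hypothetical, and the scene is taken to be model "L" rotated by 90 degrees, scaled by 2, and translated, with one interest point occluded and one clutter point added.

```python
from collections import Counter, defaultdict
from itertools import permutations

CELL = 0.25  # assumed quantization step for hash keys

def hash_key(p, b0, b1):
    """Quantized coordinates of p in the frame where b0 -> (0, 0), b1 -> (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    norm_sq = ux * ux + uy * uy
    x = (px * ux + py * uy) / norm_sq
    y = (py * ux - px * uy) / norm_sq
    return (round(x / CELL), round(y / CELL))

def build_table(models):
    """Acquisition phase: store a (model, basis pair) record under every key."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    table[hash_key(p, pts[i], pts[j])].append((name, (i, j)))
    return table

def recognize(scene, table, threshold):
    """Recognition phase: try scene basis pairs until some record has enough votes."""
    for i, j in permutations(range(len(scene)), 2):
        votes = Counter()
        for k, p in enumerate(scene):
            if k not in (i, j):
                # Each record found under this coordinate receives one vote.
                votes.update(table.get(hash_key(p, scene[i], scene[j]), ()))
        candidates = [rec for rec, n in votes.items() if n >= threshold]
        if candidates:
            return (i, j), candidates
    return None, []

models = {
    "L": [(0, 0), (4, 0), (4, 3), (1, 2), (2, 5)],
    "square": [(0, 0), (3, 0), (3, 3), (0, 3)],
}
table = build_table(models)

# Four transformed points of model "L" plus one clutter point at (20, 20).
scene = [(10, 5), (10, 13), (4, 13), (6, 7), (20, 20)]
scene_basis, candidates = recognize(scene, table, threshold=2)
print(scene_basis, candidates)
```

Records that reach the threshold are only candidates. They would still be passed to a verification step, since unrelated records can collect votes by coincidence.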
Using the record, the edges of the specified model are compared against the edges in the scene. If the edges correspond, then the object is considered matched to the model specified in the adopted record. If the edges do not match, then the current basis pair is discarded and a new basis pair is considered.
For further discussion of geometric hashing, consider: Y. Lamdan, H. J. Wolfson, "Geometric hashing: a general and efficient model-based recognition scheme," Proceedings of the 2nd International Conference on Computer Vision, December 1988; J. Hong, H. J. Wolfson, "An Improved Model-Based Matching Method using Footprints," Proceedings of the International Conference on Pattern Recognition, Rome, Italy, November 1988; Y. Lamdan, J. T. Schwartz, H. J. Wolfson, "On recognition of 3-D objects from 2-D images," Proceedings of the IEEE International Conference on Robotics and Automation, vol. 3, pp. 1407-1413, Philadelphia, April 1988; and, finally, A. Kalvin, E. Schonberg, J. T. Schwartz, M. Sharir, "Two Dimensional Model Based Boundary Matching Using Footprints," The International Journal of Robotics Research, vol. 5(4), pp. 38-55, 1986.
Although effective to some degree, conventional parametric techniques, and specifically conventional geometric hashing, remain burdensome and undesirably time consuming in many circumstances when used for the identification of complex visual shapes. Moreover, geometric hashing is often unreliable because its performance degrades significantly with only very limited amounts of clutter or perturbation in the sampled data. Geometric hashing can also be unreliable due to extreme sensitivity to quantization parameters. Finally, geometric hashing has limited index/model selectivity, oftentimes improperly accumulates excessive votes in each vote bucket, and has a limited number of useful buckets available in the hash tables.