The main computational paradigm for object recognition within both 2D intensity and 3D range image processing is based upon feature extraction. Features are properties that can be efficiently and reliably extracted from both a model and an image. Model and/or image features are typically composed into structures for efficient indexing and comparison. Examples of feature extraction are outlined in the Interpretation Tree method by W. E. L. Grimson and T. Lozano-Perez in "Model-based object recognition and localization from sparse range or tactile data" (International Journal of Robotics Research, 3(3):3-35, Fall 1984), Geometric Hashing by Y. Lamdan and H. J. Wolfson in "Geometric hashing: A general and efficient model-based recognition scheme" (2nd International Conference on Computer Vision, pages 238-249, December 1988), Bipartite Graph Matching by M. Oshima and Y. Shirai in "Object recognition using three-dimensional information" (IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(4):353-361, July 1983), and so forth. So attractive is the feature extraction paradigm that even methods which can be applied directly to raw image data, such as the Hough transform and Geometric Hashing, have been augmented to operate on extracted features.
The underlying assumption in all feature-based methods is that the features themselves can be reliably and efficiently extracted. Feature extraction is a type of recognition process, albeit on a limited and simplified set of elements, and is often problematic. The reliability of the extracted features is often dependent upon extrinsic factors. For example, the boundaries of planes and higher order surfaces extracted from range images have been found to be sensitive to the relative position of the model and the sensor. Feature extraction may also be very computationally intensive, even more so than a subsequent recognition phase. A final limitation is that feature extraction methods are specific to certain feature types, and are restricted in application to objects that contain those features.
The Interpretation Tree method is an alternative to feature extraction that can operate on very sparse data by exploiting the geometric constraints between low level data descriptors. If the image data is dense, then another alternative is to use template matching. In cases where a meaningful template is defined, template matching is an attractive method due to its simplicity and robustness. The straightforward brute-force method of template matching is simply to translate the template to all possible image locations, and compare all template pixels with the overlaid image window. The similarity metric is typically either the sum of absolute differences, or the cross correlation. The main practical limitation of the brute-force approach is its computational expense: for a template of m × m pixels and an image of n × n pixels, matching a single template at all image locations requires $m^2 (n-m+1)^2$ pixel comparisons.
A method for reducing the computational expense of template matching is to order the sequence in which the template and image pixel pairs are compared at a given image location. Pixels with a high expected difference from a randomly selected image pixel are compared first, increasing the likelihood that an error threshold, which signifies a mismatch, is exceeded prior to comparing all pixel pairs. Another approach to improve efficiency is to organise the matching process hierarchically. A two-stage method is described by Gordon J. Vanderbrug and Azriel Rosenfeld in "Two-stage template matching" (IEEE Transactions on Computers, C-26(4):384-393, April 1977) where the first stage consists of matching a sub-template, which is a subset of the template pixels, at each image location. Only those image locations with a high first stage score are further processed with a full template match in the second stage. Hierarchical structures involving multiple levels of resolution are also known.
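The two ideas above, early termination against an error threshold and a sub-template first stage, may be combined in a minimal sketch such as the following. It assumes a sum-of-absolute-differences metric, so that a good match is a low accumulated difference. The names `partial_sad` and `two_stage_match`, the threshold values, and the choice of sub-template pixels are hypothetical and are not taken from the cited article. The pixel list is compared in the order supplied, so a caller wishing to apply the ordering heuristic would pass the pixels sorted by expected difference.

```python
def partial_sad(image, template, row, col, pixels, limit):
    """Accumulate absolute differences over the given template pixels,
    in the order given. Returns None as soon as `limit` is exceeded,
    which signifies a mismatch; otherwise returns the total."""
    total = 0
    for i, j in pixels:
        total += abs(image[row + i][col + j] - template[i][j])
        if total > limit:
            return None
    return total


def two_stage_match(image, template, sub_pixels, stage1_limit, stage2_limit):
    """Stage 1 matches only the sub-template pixels at each location.
    Stage 2 runs a full template match on the surviving locations."""
    all_pixels = [
        (i, j) for i in range(len(template)) for j in range(len(template[0]))
    ]
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    matches = []
    for r in range(rows):
        for c in range(cols):
            if partial_sad(image, template, r, c, sub_pixels, stage1_limit) is None:
                continue  # rejected cheaply by the sub-template
            score = partial_sad(image, template, r, c, all_pixels, stage2_limit)
            if score is not None:
                matches.append(((r, c), score))
    return matches


# Hypothetical sample data and thresholds.
image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
template = [[5, 6], [7, 8]]
sub_pixels = [(0, 0), (1, 1)]  # two of the four template pixels
matches = two_stage_match(image, template, sub_pixels, stage1_limit=2, stage2_limit=4)
```

With these values every location except row 1, column 1 is rejected in the first stage, most of them after a single pixel comparison.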
Whereas the above methods consider the performance of matching a single template against an image, other methods have been developed to improve the efficiency of matching a set of templates against an image. The best known of these is the Generalized Hough Transform, also known as Pose Clustering, which has been known for some time to be equivalent to template matching. Another method, proposed by H. K. Ramapriyan in "A multilevel approach to sequential detection of pictorial features" (IEEE Transactions on Computers, 1:66-78, January 1976), organises a set of templates into a structure called a template tree. The leaves of the tree correspond to distinct templates from the set, and each intermediate node corresponds to a representative template (RT), which is the union of all of its descendant templates. As the tree is traversed, the node whose RT best matches the image location is expanded further. Experimental results on synthetic 2D intensity images showed that, for a set of 36 templates, the efficiency improved by a factor of 4 as compared with the brute-force approach.
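A template tree of this kind may be sketched, under simplifying assumptions, as follows. Each binary template is represented as the set of its "on" pixel coordinates, so that the union forming a representative template is a set union. The `Node` class, the `score` function, and the four sample templates are hypothetical. In particular, the similarity measure used here is chosen only for illustration and is not the measure used in the cited article.

```python
class Node:
    """Tree node. `rt` holds the 'on' pixel coordinates of the node's
    representative template (or of the template itself, for a leaf)."""

    def __init__(self, name=None, pixels=None, children=()):
        self.name = name
        self.children = list(children)
        if self.children:
            # Intermediate node: RT is the union of all descendant templates.
            self.rt = frozenset().union(*(child.rt for child in self.children))
        else:
            self.rt = frozenset(pixels)


def score(window, rt):
    """Illustrative similarity (an assumption): window pixels covered
    by the RT, minus window pixels the RT does not cover."""
    return len(window & rt) - len(window - rt)


def classify(root, window):
    """Descend from the root, at each level expanding only the child
    whose RT best matches the image window, until a leaf is reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(window, child.rt))
    return node.name


# Hypothetical 3 x 3 binary templates.
horizontal = Node("horizontal", {(1, 0), (1, 1), (1, 2)})
vertical = Node("vertical", {(0, 1), (1, 1), (2, 1)})
main_diag = Node("main_diagonal", {(0, 0), (1, 1), (2, 2)})
anti_diag = Node("anti_diagonal", {(0, 2), (1, 1), (2, 0)})
root = Node(children=[
    Node(children=[horizontal, vertical]),
    Node(children=[main_diag, anti_diag]),
])

window = {(0, 1), (1, 1), (2, 1)}  # image window containing a vertical bar
label = classify(root, window)
```

Here the window is scored against two representative templates and then two leaf templates, rather than against all four templates in turn. The saving grows with the size of the template set, at the cost of a possible wrong turn when a representative template matches poorly.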
With some exceptions, the described template matching methods have been applied only to two-dimensional imagery. Working with three-dimensional range imagery, Newman et al. applied template matching to an automated industrial inspection process that verified the known position and shape of an object. This is described by Timothy S. Newman, Anil K. Jain, and H. R. Keshavan in an article "3d CAD-based inspection: Coarse verification" (Proceedings of the 11th International Conference on Pattern Recognition, 1:49-52, August 1992). As the position of the object was constrained, there was no search component to identify the template location in the image. To enhance efficiency, only sub-templates were matched, and experimentation showed that just 1% of model datum points were required for robust results, allowing 5° variation in orientation and 0.25 inches of translational positional inaccuracy. This work was extended to allow some unknown constrained positional parameters, 3 translational and 1 rotational, which, prior to the template matching phase, are resolved using a silhouette method.
It would be advantageous to provide an efficient method of recognising objects within a three-dimensional range image.