For example, many object recognition technologies in practical use for enabling a robot to recognize an object employ a template matching technique using a sequential similarity detection algorithm or a cross-correlation coefficient. The template matching technique is effective in the special case where it can be assumed that the object to be detected appears without deformation in an input image, but it is not effective in a common object recognition environment in which the viewpoint or illumination state of the image is unstable.
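As an illustration of template matching with a cross-correlation coefficient, the following is a minimal sketch in Python. It assumes grayscale images stored as nested lists; the function names `ncc` and `match_template` are hypothetical and are not taken from any cited work.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation coefficient of two equal-size 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A constant patch has no contrast; treat its correlation as zero.
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # any real score lies in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

Because the coefficient subtracts the mean and divides by the contrast of each patch, it tolerates a uniform gain and offset in brightness. The comparison is still made pixel by pixel at a fixed orientation and size, which is why the technique presupposes that the object appears without deformation.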
On the other hand, a shape matching technique has also been proposed in which a shape feature of the object is matched against a shape feature of each of the areas cut out from the input image by an image dividing technique. In the aforementioned common object recognition environment, however, the result of area division is not stable, which makes it difficult to describe the shape of an object in the input image accurately. In particular, recognition becomes very difficult when the object to be detected is partially hidden behind another object.
Besides the above matching techniques, which use an overall feature of the whole or of partial areas of the input image, a technique has also been proposed of extracting characteristic points or edges from an image, expressing the relative spatial positions of a collection of line segments or a collection of edges formed thereby in the form of a line diagram or a graph, and performing matching based on structural similarity between such line diagrams or graphs. Such a technique works well for a particular specialized object, but it sometimes fails to extract a stable inter-feature point structure because of image deformation, which makes it difficult to recognize, in particular, the aforementioned partially hidden object.
In view of this, a matching technique has been proposed that extracts characteristic points (i.e., feature points) from an image and uses feature amounts obtained from image information of the feature points and their local neighborhoods. Because this technique uses local feature amounts of feature points that remain unchanged under partial image deformation, it achieves more stable detection than the above-described techniques even when image deformation occurs or the object to be detected is partially hidden. Methods already proposed for extracting feature points that remain unchanged under scale transformation include: a method of constructing a scale space of an image and extracting, as a scale feature point, a point whose position does not change with a change in the scale direction, from among the local maximum points and local minimum points of a “Difference of Gaussian (DoG) filter output” of the image at each scale (Non-Patent Document 1 or Non-Patent Document 2); and a method of constructing the scale space of an image and extracting, as the feature point, a point that gives a local maximum of a “Laplacian of Gaussian (LoG) filter output” of a scale space image, from among the corner points extracted by a Harris corner detector from the image at each scale (Non-Patent Document 3).
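The DoG-based extraction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the cited documents: it uses a single resolution with no octave downsampling, accepts a point when its DoG value is a strict extremum among its 26 neighbors in position and scale, and omits sub-pixel refinement. The function names, the 3-sigma kernel truncation, and the contrast threshold are all hypothetical choices.

```python
import math
from itertools import product

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur with edge-replicated borders."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(v, hi):
        return min(max(v, 0), hi)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[j + r] * image[y][clamp(x + j, w - 1)]
                 for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + r] * rows[clamp(y + j, h - 1)][x]
                 for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def dog_extrema(image, sigmas, threshold=1e-3):
    """Return (x, y, sigma) for strict local extrema of the DoG stack."""
    h, w = len(image), len(image[0])
    blurred = [blur(image, s) for s in sigmas]
    # DoG level i is the difference between adjacent scales i+1 and i.
    dogs = [[[hi[y][x] - lo[y][x] for x in range(w)] for y in range(h)]
            for lo, hi in zip(blurred, blurred[1:])]
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    points = []
    # Only interior DoG levels have a neighbor on both sides in scale.
    for s in range(1, len(dogs) - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) < threshold:  # skip low-contrast responses
                    continue
                neighbors = [dogs[s + ds][y + dy][x + dx]
                             for ds, dy, dx in offsets]
                if v > max(neighbors) or v < min(neighbors):
                    # Report the smaller sigma of the pair forming this level.
                    points.append((x, y, sigmas[s]))
    return points
```

At least four scales are needed for any point to be tested, since the comparison requires a DoG level on each side. A geometric progression such as `[1.0, 1.6, 2.56, 4.096]` is one possible choice of `sigmas`.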
Moreover, it is preferable that, for the feature points extracted in the above-described manner, a feature amount invariant to a line-of-sight change be selected. For example, Schmid & Mohr have proposed a matching technique that takes a corner detected by the Harris corner detector as the feature point and uses a rotation-invariant feature amount of the neighborhood of the feature point for matching (Non-Patent Document 4).
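For reference, the corner measure used by a Harris corner detector can be sketched as below. This covers only the corner response, not the rotation-invariant feature amount of Non-Patent Document 4. The function name `harris_response`, the unweighted 3x3 summation window, the central-difference gradients, and the constant k = 0.04 are assumptions made for illustration.

```python
def harris_response(image, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for a 2D list image."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            # Structure matrix M accumulated over a 3x3 window.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response
```

With this measure, a pixel where the gradient varies in two directions (a corner) yields a positive response, a pixel on a straight edge yields a negative response, and a flat region yields zero, so corners can be selected as local maxima of the response above a chosen threshold.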
[Non-Patent Document 1]    D. Lowe, “Object recognition from local scale-invariant features,” in Proc. International Conference on Computer Vision, Vol. 2, pp. 1150-1157, Sep. 20-25, 1999, Corfu, Greece.
[Non-Patent Document 2]    D. Lowe, “Distinctive image features from scale-invariant keypoints,” accepted for publication in the International Journal of Computer Vision, 2004.
[Non-Patent Document 3]    K. Mikolajczyk, C. Schmid, “Indexing based on scale invariant interest points,” International Conference on Computer Vision, 525-531, July 2001.
[Non-Patent Document 4]    Schmid, C., and R. Mohr, “Local grayvalue invariants for image retrieval,” IEEE PAMI, 19, 5, 1997, pp. 530-534.