1. Field of the Invention
The present invention relates to an object recognition apparatus that recognizes an object picture existing in an input image as the object indicated by a typical image when the object picture matches the typical image.
2. Description of Related Art
Various techniques have been proposed for recognizing a picture in an input image that matches a registered typical image. For example, keypoints are extracted from an input image, an image feature is calculated at each keypoint, and each calculated feature is compared with image features at keypoints extracted from a typical image. Based on this comparison, it is judged whether an object matching the typical image exists in the input image.
This method of comparing features of keypoints in an input image with features of keypoints in a typical image has been disclosed in both a patent document (Published Japanese Patent First Publication No. 2006-65399) and a non-patent document (“Distinctive Image Features from Scale-Invariant Keypoints” written by David G. Lowe, International Journal of Computer Vision, 2004). More specifically, the features are set to be invariant to image scaling (i.e., image enlargement and reduction) and rotation. Therefore, even when the size or rotational position of an object picture existing in an input image differs from that of a typical image of the object, the object picture can be recognized as the object.
In this comparing method, an input image is smoothed using the Gaussian function. More specifically, a plurality of smoothed images corresponding to respective scales of the Gaussian function are calculated from the input image. A DoG (difference-of-Gaussian) filter is applied to the smoothed images corresponding to the different scales to obtain a plurality of DoG images, and extremal values are detected from the DoG images. The point (i.e., pixel) of each extremal value is set as a candidate for a keypoint (hereinafter called a keypoint candidate). The scale of each DoG image having at least one extremal value is used later to calculate a feature at the point of the extremal value. In the same manner, the input image is reduced at each of a plurality of reduction ratios to obtain reduced images, other DoG images are calculated from each of the reduced images, and other keypoint candidates of the input image are detected from the other DoG images.
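The detection of keypoint candidates described above can be illustrated by the following minimal sketch. It is not the implementation of either cited document. The function names, the truncation of the Gaussian kernel at three standard deviations, the clamping at image borders, and the choice of scales are all assumptions made for illustration, and the processing of reduced images is omitted for brevity.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma (an assumption)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, sigma):
    """Separable Gaussian smoothing; pixels outside the image are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def dog_stack(image, sigmas):
    """DoG images: differences of smoothed images at adjacent scales."""
    smoothed = [smooth(image, s) for s in sigmas]
    return [[[b - a for a, b in zip(row_lo, row_hi)]
             for row_lo, row_hi in zip(lo, hi)]
            for lo, hi in zip(smoothed, smoothed[1:])]

def keypoint_candidates(dogs, sigmas):
    """Return (x, y, sigma) for each strict extremum among its 26 neighbors
    in position and scale. sigma is the smaller scale of the DoG pair."""
    height, width = len(dogs[0]), len(dogs[0][0])
    offsets = [(ds, dy, dx) for ds in (-1, 0, 1) for dy in (-1, 0, 1)
               for dx in (-1, 0, 1) if (ds, dy, dx) != (0, 0, 0)]
    found = []
    for s in range(1, len(dogs) - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                center = dogs[s][y][x]
                neighbors = [dogs[s + ds][y + dy][x + dx]
                             for ds, dy, dx in offsets]
                if center > max(neighbors) or center < min(neighbors):
                    found.append((x, y, sigmas[s]))
    return found
```

In this sketch, the scale returned with each candidate corresponds to the scale that is used later when the feature at that point is calculated. Applying the same functions to reduced copies of the input image would yield the additional candidates mentioned above.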
In this detection of the keypoint candidates, some of the keypoint candidates may cause the aperture problem, in which the position of a point along an edge cannot be determined reliably. To solve this problem, candidates having low contrast and candidates located on edges are removed from the keypoint candidates, so that stable keypoints are extracted from the input image.
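The removal of low-contrast candidates and candidates located on edges can be sketched as follows. The sketch uses the ratio of the squared trace to the determinant of the 2x2 Hessian of the DoG image, as in the non-patent document by Lowe. The function name `is_stable` and the default thresholds (0.03 for contrast with pixel values in the range 0 to 1, and 10 for the edge ratio) are assumptions for illustration, not values stated in this description.

```python
def is_stable(dog, x, y, contrast_threshold=0.03, edge_ratio=10.0):
    """Return True if the candidate at (x, y) in one DoG image has enough
    contrast and is not edge-like. (x, y) must not lie on the image border."""
    value = dog[y][x]

    # Low-contrast candidates are sensitive to noise and are rejected.
    if abs(value) < contrast_threshold:
        return False

    # Second derivatives by finite differences (2x2 Hessian).
    dxx = dog[y][x + 1] - 2.0 * value + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2.0 * value + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0

    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy

    # A non-positive determinant means the curvatures differ in sign or one
    # of them vanishes, so the point is not a well-localized extremum.
    if det <= 0:
        return False

    # On an edge, one principal curvature is much larger than the other,
    # which makes trace^2 / det large.
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio
```

For example, an isolated peak passes the test, whereas a straight ridge (an edge) is rejected because its curvature along the ridge is zero.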
Thereafter, an image feature is calculated for each extracted keypoint. The image feature of each keypoint contains a feature element invariant to image scaling (i.e., a scale-invariant feature), scale information required for the calculation of the scale-invariant feature, and information (i.e., rotation information) indicating a rotation of the image within a predetermined area around the keypoint. The predetermined area is determined according to the scale information. As described in detail in the documents, the scale-invariant feature is invariant to image scaling (i.e., image enlargement and reduction) and rotation. Therefore, even when an object picture matching a typical image exists in an input image at any size or rotational position, the object picture can be recognized as the object.
In the matching operation, the scale-invariant feature at each keypoint of the typical image is compared with the scale-invariant features of all keypoints in the input image. When the features of some keypoints in the input image are the same as or similar to the features of corresponding keypoints in the typical image, it can be judged that an object picture matching the typical image exists in the input image.
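The matching operation can be sketched as a nearest-neighbor search over feature vectors. This is only an illustration: the function name `match_keypoints`, the use of the Euclidean distance as the measure of similarity, and the threshold `max_distance` are assumptions that do not appear in this description.

```python
import math

def match_keypoints(typical_features, input_features, max_distance=0.5):
    """For each feature of the typical image, find the most similar feature
    among all keypoints of the input image. A pair (i, j) is returned when
    the distance between typical feature i and input feature j does not
    exceed max_distance (an assumed similarity threshold)."""
    matches = []
    for i, t in enumerate(typical_features):
        best_j, best_d = None, float("inf")
        for j, f in enumerate(input_features):
            # Euclidean distance between the two feature vectors.
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, f)))
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= max_distance:
            matches.append((i, best_j))
    return matches
```

The number of pairs returned corresponds to the number of keypoints of the input image judged to have the same or similar features as keypoints of the typical image.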
For example, assume that the number of keypoints existing in a typical image is equal to 100, and that keypoints having features the same as or similar to the respective features of the keypoints of the typical image are extracted from an input image. When the number of such keypoints extracted from the input image is equal to 90 or more, a picture of an object matching the typical image exists in the input image with a high probability. Therefore, the object picture can be recognized as the object. In contrast, when the number of such keypoints extracted from the input image is equal to 10 or less, the probability that a picture of an object matching the typical image exists in the input image is low. Therefore, no object indicated by the typical image is recognized.
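The judgment in this example can be sketched as a comparison of the matched-keypoint count with two ratios. The ratios 0.9 and 0.1 are taken from the numbers 90 and 10 in the example. The function name `judge_by_count` and the "undecided" result for intermediate counts are assumptions, because the example does not state how such counts are handled.

```python
def judge_by_count(num_matched, num_typical, high_ratio=0.9, low_ratio=0.1):
    """Judge recognition from the number of matched keypoints alone.

    num_typical is the number of keypoints in the typical image, and
    num_matched is the number of input-image keypoints having the same or
    similar features."""
    if num_typical <= 0:
        raise ValueError("the typical image must have at least one keypoint")
    ratio = num_matched / num_typical
    if ratio >= high_ratio:
        return "recognized"
    if ratio <= low_ratio:
        return "not recognized"
    # Intermediate counts are not covered by the example (an assumption).
    return "undecided"
```

With 100 keypoints in the typical image, `judge_by_count(90, 100)` returns "recognized" and `judge_by_count(10, 100)` returns "not recognized". A judgment of this kind depends only on the count of matched keypoints.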
However, even when a picture of an object matching a typical image actually exists in an input image, it is sometimes difficult to extract a sufficient number of keypoints of the object from the input image. In this case, because the number of keypoints of the object extracted from the input image becomes small, it is sometimes misjudged that no object matching the typical image exists in the input image.
For example, when occlusion occurs, a picture of a target object matching a typical image is partially hidden behind a picture of another object in an input image, so that the target object is only partially shown in the input image. In this case, although the object picture matching the typical image exists in the input image, a sufficient number of keypoints of the target object cannot be extracted from the input image. As a result, the number of keypoints of the target object extracted from the input image becomes small.
Further, when an object picture is shown at an extremely small size in an input image, a sufficient number of keypoints of the object cannot be extracted from the input image. That is, when the object picture shown in an input image has almost the same size as the typical image of the object, the number of keypoints of the object extracted from the input image becomes almost equal to the number of keypoints of the typical image. In contrast, when the object picture is extremely small in comparison with the typical image, the resolution of the object picture is very low. Therefore, the number of keypoints of the object extracted from the input image becomes very small.
As described above, in object recognition using the scale-invariant features of keypoints, an object picture in an input image that matches a typical image should be recognized as the object indicated by the typical image, regardless of image scaling (i.e., image enlargement and reduction) or rotation. However, when an object picture matching a typical image appears in an input image under a specific condition (occlusion, extremely small size, or the like), only a small number of keypoints having features the same as or similar to the respective features of keypoints of the typical image are extracted from the input image. Therefore, it is sometimes misjudged that no object picture matching the typical image exists in the input image.