1. Field of the Invention
The present invention relates to a technique for estimating the position and orientation of a target object.
2. Description of the Related Art
In the field of recognition using visual information, various types of research and development activities have been carried out to estimate the position and orientation of a three-dimensional object. For example, industrial robots and experimental humanoid robots require three-dimensional information to perform random picking, and the necessity for such three-dimensional information is increasing.
To obtain three-dimensional information representing the position and orientation of a target object, a conventional method uses three-dimensional sensors, such as stereo cameras and laser range finders. If the target object has a known shape, a monocular camera can be used to estimate its position and orientation.
As discussed in Japanese Patent Application Laid-Open No. 2002-63567, it is conventionally feasible to estimate the position and orientation of a three-dimensional target object based on an image captured by a monocular camera.
More specifically, the technique discussed in Japanese Patent Application Laid-Open No. 2002-63567 includes associating feature points of a learning image with their three-dimensional coordinates, and calculating a transformation matrix through an optimization calculation that minimizes errors based on the three-dimensional coordinates of the learning-image feature points that coincide with feature points obtained from an input image.
The technique discussed in Japanese Patent Application Laid-Open No. 2002-63567 further includes using the obtained transformation matrix to generate an image from a model, and obtaining a finalized orientation by correcting an estimated orientation based on the generated image.
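One common formulation of this kind of 2D-3D estimation is a Direct Linear Transform (DLT), which computes a 3x4 transformation (projection) matrix from matched feature points by a linear least-squares minimization of the algebraic error. The sketch below is illustrative only and is not the specific optimization calculation of the cited patent:

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Estimate a 3x4 transformation matrix from 2D-3D feature-point
    correspondences via the Direct Linear Transform (DLT): a linear
    least-squares minimization of the algebraic reprojection error."""
    assert len(points_3d) >= 6, "DLT needs at least 6 correspondences"
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows, dtype=float)
    # The least-squares solution (up to scale) is the right singular
    # vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)

def project(P, points_3d):
    """Apply the transformation matrix to 3D points (homogeneous)."""
    pts = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    proj = pts @ P.T
    return proj[:, :2] / proj[:, 2:3]
```

With exact (noise-free) correspondences, the estimated matrix reproduces the input image points to machine precision; in practice such a linear estimate is typically refined by a nonlinear optimization, which is closer in spirit to the error-minimizing calculation the patent describes.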
As discussed in Japanese Patent Application Laid-Open No. 2002-109539, it is conventionally feasible to obtain a transformation matrix using three feature points obtained from an input image.
A technique discussed in Japanese Patent Application Laid-Open No. 2007-219765 includes obtaining learning images captured from a plurality of viewpoints, comparing local feature information of respective learning images with local feature information obtained from an input image, and outputting viewpoint information of the most similar learning image as orientation of the input image.
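The viewpoint-matching step of such a technique can be sketched as a nearest-neighbor comparison of local feature descriptors, outputting the viewpoint of the most similar learning image. The descriptor format and scoring rule below are illustrative assumptions, not the patent's actual matching method:

```python
import numpy as np

def estimate_viewpoint(input_descs, learning_sets):
    """Return the index of the learning viewpoint whose local feature
    descriptors best match the input descriptors.

    input_descs   -- (n, d) array of local descriptors from the input image
    learning_sets -- list of (m_i, d) arrays, one per learning viewpoint

    Similarity score: mean distance from each input descriptor to its
    nearest neighbor among the viewpoint's learning descriptors."""
    best_idx, best_score = None, np.inf
    for idx, descs in enumerate(learning_sets):
        # Pairwise distances between input and learning descriptors.
        d = np.linalg.norm(input_descs[:, None, :] - descs[None, :, :], axis=2)
        score = d.min(axis=1).mean()
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx
```

Because the output is simply the index of the most similar stored viewpoint, the estimated orientation is inherently discrete, which is exactly the limitation discussed below.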
As discussed in Japanese Patent Application Laid-Open No. 2009-128075, a distance sensor can be conventionally used to estimate the position and orientation of a three-dimensional object. More specifically, the technique discussed in Japanese Patent Application Laid-Open No. 2009-128075 includes calculating three-dimensional feature information of input data, obtaining a corresponding relationship with three-dimensional feature information relating to a plurality of feature points of a model, and calculating the position and orientation of the object using a rigid-body transformation.
In this case, a plurality of feature points is selected in consideration of operational constraint conditions (e.g., the front/back of an object) and the mixing state of classes in the clustering result of the feature information, so that points effective for detection are selected.
According to the technique discussed in Japanese Patent Application Laid-Open No. 2002-63567 and the technique discussed in Japanese Patent Application Laid-Open No. 2002-109539, the selection of feature points is performed manually and intentionally. For example, if the target object is a human, the feature points to be selected are eyes and a mouth. In other words, extracting from a learning image the feature points most useful for three-dimensional position/orientation estimation processing is not mentioned in Japanese Patent Application Laid-Open No. 2002-63567 or in Japanese Patent Application Laid-Open No. 2002-109539.
According to the technique discussed in Japanese Patent Application Laid-Open No. 2007-219765, it is fundamental that all orientations are discriminated as different classes and, therefore, the obtained solutions are discrete.
Accordingly, the angular resolution of the discriminating system is substantially determined by the angular resolution of the viewpoint changes used when the learning images are acquired. If the resolution in shooting angle is increased to improve the accuracy of each solution, it becomes difficult to identify the orientation because the number of similar images having different orientations increases.
According to the technique discussed in Japanese Patent Application Laid-Open No. 2009-128075, useful feature points of a model are selected from the clustering result of the feature information. However, the possibility that the useful feature points may be undesirably biased depending on the viewpoint is not mentioned. Even if many feature points are selected, they may be visible only from a limited number of viewpoints.