1. Field of the Invention
The present invention relates to an information processing apparatus, a control method therefor, and a computer-readable storage medium, and in particular relates to technology for estimating the position and orientation of a target object in a three-dimensional space.
2. Description of the Related Art
Two main techniques are known for estimating the position and orientation of an object in a three-dimensional space (three-dimensional measurement). One is stereo vision, which employs triangulation, and the other is pattern matching against registered images in which the position and orientation are known.
Two stereo vision techniques are known: one that uses two cameras and one that uses a single camera and a laser beam. In both techniques, the three-dimensional position of an observation point is obtained by triangulation from the observed point and the two points from which it is observed. With the two-camera technique, it is difficult to identify corresponding observation points in the images captured by the two cameras, and errors are highly likely to occur. This is known as the problem of finding corresponding points in stereo vision. In contrast, finding corresponding points is simple in the laser-beam technique, but the laser beam is difficult to control accurately, which leads to errors.
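For illustration only, the triangulation step can be sketched as follows. This is a minimal sketch of the midpoint method, one common way to triangulate, and not necessarily the method used in the techniques described above. It assumes calibrated cameras, so that each observation point (camera centre) and the direction of the ray towards the observed point are already known in a common coordinate system. The function name `triangulate_midpoint` is hypothetical.

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3-D point closest to two viewing rays (midpoint method).

    c1, c2: the two observation points (camera centres), shape (3,)
    d1, d2: ray directions from each centre towards the observed point, shape (3,)
    """
    c1, d1, c2, d2 = (np.asarray(v, dtype=float) for v in (c1, d1, c2, d2))
    # Find scalars s, t minimising |(c1 + s*d1) - (c2 + t*d2)|^2
    # by setting both partial derivatives to zero.
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays never intersect, so depth is undetermined.
        raise ValueError("rays are parallel; depth cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = c1 + s * d1  # closest point on the first ray
    p2 = c2 + t * d2  # closest point on the second ray
    # With noisy rays p1 != p2; the midpoint is taken as the estimate.
    return (p1 + p2) / 2
```

For example, with cameras at (0, 0, 0) and (2, 0, 0) whose rays both pass through (1, 1, 5), the function returns that point.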
Whereas stereo vision requires two cameras, or one camera and a laser beam irradiation apparatus, the pattern-matching technique basically requires only one camera. With this technique, images of objects whose three-dimensional position and orientation are known in advance are stored, and when a new image is input, the position and orientation of the target object are obtained by matching the new image against the stored images.
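A minimal sketch of such matching is given below. It assumes grayscale images of identical size and uses the sum of squared differences (SSD) as the similarity measure; the source does not specify the measure. The name `estimate_pose_by_matching` and the `(image, pose)` pair layout are hypothetical.

```python
import numpy as np

def estimate_pose_by_matching(query, registered):
    """Return the known pose of the registered image that best matches query.

    registered: list of (image, pose) pairs; each image has the same shape as query.
    The score is the sum of squared differences (SSD); lower is better.
    Returns None if registered is empty.
    """
    query = np.asarray(query, dtype=float)
    best_pose, best_score = None, np.inf
    for image, pose in registered:
        score = np.sum((np.asarray(image, dtype=float) - query) ** 2)
        if score < best_score:
            best_pose, best_score = pose, score
    return best_pose
```

Because a stored pose is returned unchanged, the estimate can be no finer than the set of registered poses.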
As one example of the pattern-matching technique, a configuration is known in which a parametric eigenspace method is used to estimate orientation from a small number of registered images (Japanese Patent Laid-Open No. 8-153198).
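The idea of matching in an eigenspace can be sketched as below. This is a simplified illustration and not the method of Japanese Patent Laid-Open No. 8-153198. It builds a low-dimensional eigenspace from the registered images by principal component analysis and returns the orientation of the nearest registered image. It omits the interpolation of a continuous pose manifold between registered orientations on which the parametric eigenspace method relies. The function names and the `angles` list are hypothetical.

```python
import numpy as np

def build_eigenspace(images, k):
    """Build a k-dimensional eigenspace from registered images.

    images: array of shape (n, h, w). Returns (mean, basis, coords), where
    basis has shape (k, h*w) and coords holds each registered image's projection.
    """
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred image set.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]
    coords = (X - mean) @ basis.T
    return mean, basis, coords

def estimate_orientation(query, mean, basis, coords, angles):
    """Project query into the eigenspace; return the angle of the nearest registered image."""
    q = (np.asarray(query, dtype=float).ravel() - mean) @ basis.T
    nearest = np.argmin(np.linalg.norm(coords - q, axis=1))
    return angles[nearest]
```

Matching in the k-dimensional eigenspace rather than on raw pixels keeps the comparison cheap when many registered images are stored.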
A technique is also known that specifies the position of an object with higher accuracy by performing pattern matching using two or more cameras (Japanese Patent Laid-Open No. 2003-22442). In this technique, the panning, tilting, and zooming of multiple cameras are controlled sequentially. Specifically, the position at which the object was detected in the image most recently captured by one camera is used to determine the panning, tilting, and zooming of the camera that captures the next image, so that this camera can detect the same object. A detected position is regarded as correct if the object has been detected by two or more cameras, and the position of the object is determined so as to minimize the error between the positions detected by the respective cameras.
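The final fusion step can be sketched as follows, under two assumptions not stated in the source. First, each camera's detection has already been expressed as a 3-D position in a common world frame. Second, the error to be minimized is the sum of squared Euclidean distances, in which case the minimizing position is simply the mean of the detections. The name `fuse_detections` and the `min_cameras` parameter are hypothetical.

```python
import numpy as np

def fuse_detections(detections, min_cameras=2):
    """Fuse per-camera 3-D position estimates into one position.

    detections: list of (x, y, z) estimates, one per camera that detected the object.
    Returns None unless at least min_cameras detections exist; otherwise returns
    the point minimising the sum of squared distances to all detections (their mean).
    """
    if len(detections) < min_cameras:
        return None  # not confirmed by enough cameras to be regarded as correct
    return np.asarray(detections, dtype=float).mean(axis=0)
```

A different error measure, such as a robust or per-camera weighted one, would change this closed-form result.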
When two cameras are used in the stereo vision described above, the aforementioned problem of finding corresponding points becomes a fundamental problem. In particular, stereo vision requires corresponding points to be visible from both cameras, and if an observation point is hidden from either camera by self-occlusion of the target object, three-dimensional measurement is impossible in principle.
The technique disclosed in Japanese Patent Laid-Open No. 2003-22442 also requires corresponding points to be visible from two cameras. Where an observation point is visible from only one of the cameras due to self-occlusion of the target object, this technique can be regarded as a simple extension, to multiple cameras, of the technique of estimating the position and orientation of an object by pattern matching with a single camera. In other words, the images captured by the multiple cameras are each used individually in pattern matching. Accordingly, the accuracy with which the position and orientation of the object can be estimated is limited. Furthermore, when the multiple cameras yield contradictory position and orientation estimates, it is difficult to resolve the contradiction.