Field of the Invention
The present invention relates to an information processing apparatus and a method for controlling the information processing apparatus.
Description of the Related Art
Mixed reality (MR) technology, which seamlessly merges a computer-generated virtual space with the real space, has attracted attention in recent years.
MR technology is expected to be applied to diverse fields, for example, assembly support, where assembly procedures and wiring conditions are displayed in a superimposed manner during assembly work, and surgery support, where internal conditions of a patient's body are displayed in a superimposed manner on the patient's body surface.
To allow a user to feel that a virtual object really exists in the real space without a feeling of strangeness, it is necessary to correctly express the anteroposterior relation between real and virtual objects. This issue is also referred to as the “occlusion problem”. The occlusion problem is particularly crucial for a video see-through type MR system, in which a virtual object is superimposed on an image captured by a camera.
In order to correctly express the anteroposterior relation between real and virtual objects, three-dimensional position information is obtained for each of the real and virtual objects, and the two pieces of position information are compared with each other. When the real object is anterior to the virtual object, the captured image of the real object is displayed on the anterior side. When the virtual object is anterior to the real object, processing for displaying the virtual object on the anterior side is performed. In this processing, since a three-dimensional model of the virtual object is known, the three-dimensional position of the virtual object relative to a viewpoint can be calculated. On the other hand, the three-dimensional position of the real object relative to the viewpoint cannot be determined merely by capturing an image of the real object. It is therefore necessary to obtain the three-dimensional position of the real object.
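The per-pixel comparison described above can be sketched as follows. This is a minimal illustration only, not the described apparatus; the function name and the convention that a smaller depth value means nearer to the viewpoint are assumptions.

```python
def composite_pixel(real_rgb, real_depth, virt_rgb, virt_depth):
    """Display whichever object is anterior (nearer the viewpoint).

    Depth is measured along the visual axis, so the smaller depth
    value is the anterior one.
    """
    if real_depth <= virt_depth:
        return real_rgb   # real object occludes the virtual object
    return virt_rgb       # virtual object is drawn on the anterior side

# Example: a real object at 500 mm occludes a virtual object at 700 mm.
pixel = composite_pixel((255, 0, 0), 500.0, (0, 255, 0), 700.0)
```

Performing this comparison for every pixel requires a depth value for the real object at every pixel, which is exactly what the technique described below provides.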
A technique for measuring a three-dimensional position of a real object will be described below.
A technique discussed in “Hayashi K, Kato H, and Nishida S, Depth Determination of Real Objects using Contour Based Stereo Matching. Journal of the Virtual Reality Society of Japan. 2005; 10(3): 371-380” detects a moving object by using a difference between a background image referred to as a key frame and a current image captured by a camera. Then, matching of points on the contour of the detected moving object is performed. Since the matching is performed only on points on the boundary, high-speed processing is realized.
The following describes the method for measuring the depth of a target object discussed in “Hayashi K, Kato H, and Nishida S, Depth Determination of Real Objects using Contour Based Stereo Matching. Journal of the Virtual Reality Society of Japan. 2005; 10(3): 371-380”. The method estimates the depth of the target object by the following process.
1. In each of the left and right images, captured by a stereo camera, in which the target object appears, a contour of the target object is identified based on a difference from a background image.
2. In the left image, the contour of the region is divided at equal intervals, points having a large curvature are calculated, and sampling points are set at these points.
3. An epipolar line corresponding to each sampling point set in the left image is projected onto the right image. Points at which the distance between the epipolar line and the contour is minimized are set as corresponding points.
4. Depth values of the obtained corresponding points are calculated based on the image coordinates of the corresponding points in the left and right images and on the known relative position and orientation of the stereo camera.
5. A depth value of each point on a line segment of the contour between corresponding points having depth values is calculated by performing linear interpolation on the depth values of the corresponding points at both ends of the segment. This processing is performed on the contour in each of the left and right images.
6. When all depth values on the contour have been obtained, horizontal linear interpolation is performed between the depth values of the contour at both ends of the region to obtain depth values inside the contour. This processing is also performed in each of the left and right images.
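Steps 5 and 6 above amount to two passes of linear interpolation: first along the contour, then horizontally across the region interior. A minimal sketch of these two passes, assuming NumPy and representing depths not yet determined as NaN (the function names and data layout are illustrative assumptions, not part of the cited method's implementation):

```python
import numpy as np

def interpolate_contour_depths(contour_depths):
    """Step 5: fill unknown (NaN) depths along the contour by linear
    interpolation between the corresponding points at both ends."""
    d = np.asarray(contour_depths, dtype=float)
    idx = np.arange(len(d))
    known = ~np.isnan(d)
    # np.interp fills each unknown index from the neighboring known samples
    return np.interp(idx, idx[known], d[known])

def fill_row_depths(left_depth, right_depth, width):
    """Step 6: for one horizontal row inside the region, linearly
    interpolate between the contour depths at the left and right ends."""
    return np.linspace(left_depth, right_depth, width)

# Corresponding points at both ends of a contour segment have depths
# 500 and 530 (mm); the two contour points between them are filled in.
contour = interpolate_contour_depths([500.0, np.nan, np.nan, 530.0])
# contour -> [500., 510., 520., 530.]

# One 4-pixel row of the region interior, bounded by those contour depths:
row = fill_row_depths(contour[0], contour[-1], 4)
# row -> [500., 510., 520., 530.]
```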
However, if both of conditions 1 and 2 described below are satisfied, an error arises in the depth value of the target object estimated by the method discussed in “Hayashi K, Kato H, and Nishida S, Depth Determination of Real Objects using Contour Based Stereo Matching. Journal of the Virtual Reality Society of Japan. 2005; 10(3): 371-380”.
Condition 1: As illustrated in FIG. 9, a depth direction 901 at a contour portion having a large curvature, such as a fingertip, is close to a visual axis direction 902. More specifically, the fingertip is oriented in the depth direction of a camera 100.
Condition 2: As illustrated in FIG. 8, for example, when there are no corresponding points in the vicinity of a fingertip 801, where the contour has a large curvature, depth values are determined by performing linear interpolation on the depth values of corresponding points 802 and 803 at both ends.
A reason why an error arises when both of these two conditions are satisfied will be described below with reference to FIGS. 10 and 12. FIG. 10 is a schematic diagram illustrating a relation between the cameras 100 and 110 and a hand 150 when the above-described conditions 1 and 2 are satisfied at the same time. FIG. 12 is a schematic diagram illustrating the fingertip 801 illustrated in FIG. 10 in an enlarged view.
If linear interpolation is simply performed for the tip portion of the fingertip 801 based on the depth values obtained from the cameras 100 and 110, the fingertip portion is assigned the depth value 1001 illustrated in FIG. 10, and an error occurs for the fingertip 801. This is because the depth value 1001 of the fingertip 801 is obtained by interpolating the depth values of the corresponding points 802 and 803 at both ends, which lie on the anterior side as viewed from the image capturing device 100. More specifically, since the depth values of the corresponding points 802 and 803 are constantly anterior to the true depth value of the fingertip 801, the linearly interpolated depth value of the fingertip 801 also constantly contains an error.
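The constant sign of this error can be confirmed with simple numbers. The depth values below are hypothetical and serve only to illustrate that interpolating between two endpoints that are both anterior to the fingertip always yields a value nearer to the camera than the true fingertip depth:

```python
# Hypothetical depths in millimeters along the visual axis
# (a larger value means farther from the camera).
true_fingertip_depth = 600.0   # actual depth of the fingertip 801
depth_802 = 560.0              # corresponding point 802, anterior to 801
depth_803 = 570.0              # corresponding point 803, anterior to 801

# Linear interpolation at the fingertip, taken midway between 802 and 803:
interpolated = (depth_802 + depth_803) / 2.0   # 565.0

# Both endpoints are anterior to the true fingertip depth, so the
# interpolated value is also anterior: the error never cancels out.
error = true_fingertip_depth - interpolated    # 35.0, always positive
```

Because any convex combination of two values smaller than the true depth is itself smaller than the true depth, the interpolated fingertip is always placed too near the camera, whatever weights the interpolation uses.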
As described above, an error arises in the vicinity of the fingertip 801. As a result, for example, in determining interference between the fingertip 801 and a virtual object, the fingertip 801 may be incorrectly determined not to be in contact with the virtual object although they are actually in contact with each other.