1. Field of the Invention
The present invention relates to a technique for measuring a three-dimensional position.
2. Description of the Related Art
Recently, mixed reality (MR) technology, which seamlessly merges a real space with a virtual space created by a computer, has been actively researched. MR technology is expected to be applied to various fields, such as assembly support, in which operation procedures are displayed with wiring conditions superimposed on them during assembly, and surgery support, in which the internal conditions of a patient's body are displayed superimposed on the body surface.
Geometric consistency between a virtual object and the real space is important for making a user feel that the virtual object exists in the real space. More specifically, geometric consistency in mixed reality includes two types of consistency: consistency for matching the coordinate system of the real space with that of the virtual space, and consistency for correctly expressing the anteroposterior (front-to-back) relation between a real object and a virtual object. The issue of achieving the former is also referred to as the registration problem in mixed reality, and various studies have been conducted on it (e.g., refer to Sato, Uchiyama, and Tamura, “A review of registration techniques in mixed reality”, Transactions of The Virtual Reality Society of Japan, Vol. 8, No. 2, pp. 171-180, 2003). The issue of achieving the latter is also referred to as the occlusion problem. The occlusion problem is particularly crucial for a video see-through MR system, which superimposes a virtual object on an image captured by a camera.
To correctly express the anteroposterior relation between the real object and the virtual object, i.e., hiding (occlusion), it is necessary to obtain three-dimensional position and orientation information of the real object or the virtual object to be hidden. In other words, the three-dimensional position and orientation information of the real object is compared with that of the virtual object. If the real object is anterior to the virtual object, the captured image is displayed on the anterior side; if the virtual object is anterior to the real object, the virtual object is displayed on the anterior side. In such processing, the three-dimensional position and orientation of the virtual object with respect to the viewpoint can be calculated, because a three-dimensional model of the virtual object is known. However, the three-dimensional position and orientation of the real object with respect to the viewpoint cannot be determined from the captured image alone, so they must be obtained by measurement.
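The depth comparison described above can be sketched per pixel as follows. This is a minimal illustration, assuming that depth maps for the real scene and for the rendered virtual object are already available in the same camera frame and units, that a smaller value means nearer to the viewpoint, and that `None` marks pixels with no surface; the function name `composite` is hypothetical.

```python
from typing import List, Optional

Image = List[List[int]]               # grayscale pixel values, row-major
Depth = List[List[Optional[float]]]   # per-pixel depth; None = no surface


def composite(camera: Image, real_depth: Depth,
              virtual: Image, virtual_depth: Depth) -> Image:
    """Resolve occlusion per pixel by comparing depths.

    A pixel shows the virtual object only where the virtual surface exists
    and is nearer than the real surface (or no real depth is known there);
    otherwise the captured camera pixel is kept.
    """
    out: Image = []
    for y, row in enumerate(camera):
        out_row = []
        for x, cam_px in enumerate(row):
            vd = virtual_depth[y][x]
            rd = real_depth[y][x]
            if vd is not None and (rd is None or vd < rd):
                out_row.append(virtual[y][x])  # virtual object is anterior
            else:
                out_row.append(cam_px)         # real object (captured image) is anterior
        out.append(out_row)
    return out
```

For a single row where the real surface is at depth 1.0, 3.0, and unknown, and the virtual object is drawn at depth 2.0 everywhere, the first pixel keeps the camera image and the other two show the virtual object.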
A technique for measuring the three-dimensional position and orientation of the real object will be described below. In a general three-dimensional position and orientation measurement technique, matching processing is applied to a focused point (point of interest) in images captured by a stereo camera, based on epipolar constraints and pixel patch luminance information. More specifically, for a focused point in one image captured by the stereo camera, points on the corresponding epipolar line in the other image are recognized as corresponding point candidates. Pattern matching is then performed between the pixel patch around each remaining corresponding point candidate and the pixel patch around the focused point, so that the corresponding point can be obtained accurately (see Japanese Patent Application Laid-Open No. 2011-27724).
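The epipolar-constrained patch matching described above can be sketched as follows. This is a minimal sketch, assuming a rectified stereo pair (so the epipolar line of a point in the left image is the same row of the right image) and using the sum of absolute differences (SAD) of luminance values as the matching cost; the SAD choice and the names `sad` and `match_point` are illustrative assumptions, not the method of the cited application.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale luminance values, row-major


def sad(left: Image, right: Image, y: int, xl: int, xr: int, r: int) -> int:
    """Sum of absolute luminance differences between the (2r+1)x(2r+1) patch
    centred at (y, xl) in the left image and the one at (y, xr) in the right."""
    return sum(
        abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
    )


def match_point(left: Image, right: Image, y: int, x: int,
                max_disp: int, r: int = 1) -> Tuple[int, int]:
    """Return (x_right, disparity) of the best match for focused point (y, x).

    Assumes a rectified pair: candidates lie on row y of the right image at
    x - d for d = 0..max_disp (the epipolar constraint). The candidate whose
    surrounding patch has the lowest SAD is returned.
    """
    h, w = len(left), len(left[0])
    if not (r <= y < h - r and r <= x < w - r):
        raise ValueError("patch around the focused point leaves the image")
    best = (sad(left, right, y, x, x, r), x, 0)  # d = 0 always fits
    for d in range(1, max_disp + 1):
        xr = x - d
        if xr - r < 0:
            break  # larger disparities would only move further off the image
        cost = sad(left, right, y, x, xr, r)
        if cost < best[0]:
            best = (cost, xr, d)
    return best[1], best[2]
```

For a rectified pair with focal length f (in pixels) and baseline B, a returned disparity d > 0 corresponds to a depth of Z = fB/d, which is how the match is turned into a three-dimensional position.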
Further, Yokoya, Takemura, Okuma, and Kanbara, “Stereo Vision Based Video See-Through Mixed Reality”, Proc. International Symposium on Mixed Reality (ISMAR 99), pp. 131-145, 1999, discusses measuring the three-dimensional position of a real object using two cameras attached to a head-mounted display to solve the occlusion problem. In this technique, stereo correspondence is computed only in the region where a virtual object is drawn, which reduces the amount of calculation.
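Restricting the correspondence search to the drawn region can be sketched as follows, assuming a boolean mask of the pixels where the virtual object is rendered and a caller-supplied per-pixel depth estimator standing in for stereo matching; the mask, the estimator interface, and the name `depth_in_virtual_region` are assumptions for illustration, not details taken from the cited paper.

```python
from typing import Callable, Dict, List, Tuple

Mask = List[List[bool]]  # True where the virtual object is drawn


def depth_in_virtual_region(
    mask: Mask, estimate: Callable[[int, int], float]
) -> Dict[Tuple[int, int], float]:
    """Run the per-pixel depth estimator only inside the virtual-object region.

    `estimate(y, x)` stands in for an expensive stereo-matching step; pixels
    outside the mask are never passed to it, which is where the saving
    in calculation comes from.
    """
    return {
        (y, x): estimate(y, x)
        for y, row in enumerate(mask)
        for x, drawn in enumerate(row)
        if drawn
    }
```

With a 3x4 mask in which only three pixels are set, the estimator is called three times instead of twelve; depth is simply unknown outside the region, which is acceptable because only pixels covered by the virtual object need an occlusion decision.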
Furthermore, Kenichi Hayashi, Hirokazu Kato, and Shougo Nishida, “Depth Determination of Real Objects using Contour Based Stereo Matching”, Transactions of The Virtual Reality Society of Japan, Vol. 10, No. 3, pp. 371-380, 2005, discusses the following method. A moving object is detected from the difference between a background image, referred to as a key frame, and the current camera image, and points on the contours of the detected object are matched between the stereo images. Since matching is performed only for the points on the contours, processing can be performed at high speed.
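The key-frame differencing and contour extraction steps can be sketched as follows. This is a minimal sketch, assuming grayscale images of equal size, a fixed absolute-difference threshold, and a contour defined as foreground pixels with at least one 4-neighbour in the background; the threshold rule, the neighbourhood, and the function names are assumptions rather than details of the cited paper. Only the returned contour points would then be passed to stereo matching.

```python
from typing import List, Tuple

Image = List[List[int]]
Mask = List[List[bool]]


def foreground_mask(key_frame: Image, current: Image, threshold: int) -> Mask:
    """Mark pixels whose absolute difference from the key frame exceeds threshold."""
    return [[abs(c - k) > threshold for k, c in zip(krow, crow)]
            for krow, crow in zip(key_frame, current)]


def contour_points(mask: Mask) -> List[Tuple[int, int]]:
    """Return foreground pixels that touch the background (or the image
    border) through at least one 4-neighbour, in row-major order."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts
```

For a 3x3 object on an otherwise unchanged 5x5 background, the mask has nine pixels and eight of them are contour points; only the centre pixel is interior and is skipped.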
On the other hand, there is a time-of-flight (TOF) method for measuring a three-dimensional position and orientation, in which light emitted from a light source is reflected by a real object, and the distance to the object is measured from the time of flight (i.e., the delay time) taken by the light to reach a sensor and from the speed of light (refer to T. Oggier, B. Buttgen, and F. Lustenberger, “SwissRanger SR3000 and first experiences based on miniaturized 3D-TOF Cameras”, Swiss Center for Electronics and Microtechnology, CSEM, IEE, Fachhochschule Rapperswil Switzerland, Technical Report, 2005).
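The distance computation can be illustrated as follows, assuming the light source and the sensor are co-located so that the measured delay Δt covers the round trip to the object and back; the one-way distance is then d = cΔt/2. The function name `tof_distance` is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s (exact, by the SI definition of the metre)


def tof_distance(delay_s: float) -> float:
    """Distance in metres to the reflecting surface from a round-trip delay.

    The light travels from the source to the object and back to the sensor,
    so the one-way distance is half of c * delay.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    return SPEED_OF_LIGHT * delay_s / 2.0
```

For example, a delay of 10 ns corresponds to about 1.5 m. Many TOF cameras, including the SwissRanger series, do not time a single pulse directly but infer the delay from the phase shift of amplitude-modulated light; this sketch simply takes the delay as given.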
However, although the technique discussed in “Stereo Vision Based Video See-Through Mixed Reality” reduces the amount of processing by performing stereo matching only in the region where the virtual object is drawn, its result is limited by the measurement accuracy. Consequently, the boundary between the virtual object and the real object cannot be displayed correctly, which gives the user an MR experience with a feeling of strangeness.
Further, the technique discussed in “Depth Determination of Real Objects using Contour Based Stereo Matching” detects the object region using the difference from the key frame and performs stereo matching on the contours of the object region. As a result, the measurement accuracy at the boundary between the virtual object and the real object is improved, and the user feels less strangeness. However, the technique does not consider noise caused by changes in the background difference regions between frames, so the generated depth values vary considerably from frame to frame. Such variation may prevent the user from experiencing an immersive feeling.