Detection of the position and orientation of an image sensing unit, such as a camera, that senses images in real space (hereinafter for convenience simply called a “camera”) is necessary for a mixed reality system that combines real space and virtual space and displays the combined space (hereinafter “mixed reality space”).
As disclosed as conventional art in, for example, Japanese Laid-Open Patent Publication No. 11-084307 (publication no. 1), Japanese Laid-Open Patent Publication No. 2000-041173 (publication no. 2), Japanese Laid-Open Patent Application No. 2000-354230 (publication no. 3) and Japanese Laid-Open Patent Publication No. 2002-230586 (publication no. 4), one method of obtaining the position and orientation of such a camera detects them with a position and orientation sensor, such as a magnetic sensor, without using the image sensed by the camera (hereinafter called “method 1”).
The inventions described in the foregoing publication nos. 1, 2 and 3 focus on the problem of the inability to obtain adequate accuracy with method 1. In addition to method 1, they disclose a method of correcting the detection errors of the position and orientation sensor that detects the position and orientation of the camera, using markers disposed at known positions in real space or characteristic features whose positions in real space are known (hereinafter, markers and characteristic features are collectively referred to as “indicators”). The methods disclosed in publications 1-3, although employing different detection principles, means and processes, are alike insofar as they obtain the position and orientation of the camera based on data obtained from a 6-DOF (degree of freedom) position and orientation sensor that detects the position and orientation of the camera, data concerning points whose positions in real space are known, and data concerning those points acquired by the camera. Hereinafter, these methods are collectively referred to as method 2.
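The common idea shared by these corrections can be sketched as follows. This is a hypothetical minimal form, not the actual procedure of any of the cited publications (which differ in principle and means): a marker of known world position is projected through the sensor-reported pose, and the 2-D offset to its actually detected image position is taken as a correction. All function names and parameter values here are illustrative assumptions.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3-D world point X into the image with pose (R, t):
    camera coordinates Xc = R @ X + t, pixel = dehomogenized K @ Xc."""
    Xc = R @ X + t
    p = K @ Xc
    return p[:2] / p[2]

def screen_offset_correction(K, R_sensor, t_sensor, marker_world, marker_detected):
    """Simplest conceivable form of a method-2-style correction (hypothetical):
    project a marker of known world position through the sensor-reported pose,
    and return the 2-D offset to its detected image position.  The offset can
    then be applied to everything rendered with that pose."""
    predicted = project(K, R_sensor, t_sensor, marker_world)
    return marker_detected - predicted  # offset that cancels the sensor error on screen
```

In this sketch the 6-DOF sensor supplies the pose (R_sensor, t_sensor), the known point supplies marker_world, and the camera image supplies marker_detected, matching the three data sources the publications have in common.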
Moreover, as disclosed in, for example, W. A. Hoff and K. Nguyen, “Computer Vision-based Registration Techniques for Augmented Reality,” Proc. SPIE, vol. 2904, pp. 538-548, November 1996, U. Neumann and Y. Cho, “A Self-tracking Augmented Reality System,” Proc. VRST '96, pp. 109-115, July 1996, and Rekimoto Junichi, “Techniques for Constructing an Augmented Reality System Using a 2-dimensional Matrix Code,” Interactive Systems and Software IV, Kindaikagakusha, pp. 199-208, December 1996, many different methods have been implemented that obtain the position and orientation of the camera from just the information obtained by sensing, with the camera, indicators that exist in real space (the scene). These methods, which use no sensors other than the camera, are hereinafter collectively referred to as method 3.
Method 3 detects the position and orientation of the camera solely by image processing, and thus suffers from an inability to track in cases where the positions of the indicators change dramatically from one frame to the next, due, for example, to rapid movement of the head.
As a method of solving the foregoing problem with method 3, a method that uses the camera in combination with other sensors has been proposed in S. You, U. Neumann, and R. Azuma, “Hybrid Inertial and Vision Tracking for Augmented Reality Registration,” Proc. IEEE Virtual Reality '99, pp. 260-267, March 1999. This method improves intra-image indicator detection accuracy by predicting the coordinates of each indicator in the next frame from the coordinates detected in the present frame and the angular velocity obtained from a gyro sensor. Further, as an extension of the foregoing method, in Fujii, Kamihara, Iwasa and Takemura, “Positioning by Stereo Camera Using Gyro Sensors for Augmented Reality,” Shingaku Gihou, PRMU99-192, January 2000, a method is proposed that improves upon the foregoing by also adding data obtained from the image to the indicator coordinate prediction. These methods, which predict the coordinates of the indicators by using sensors other than the camera, are hereinafter collectively called method 4.
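The prediction step that this class of methods shares can be sketched as follows. The sketch is a minimal illustration, assuming the inter-frame camera motion is pure rotation as measured by the gyro and a pinhole camera with known intrinsics; the function names and this particular formulation are assumptions for illustration, not taken from the cited papers.

```python
import numpy as np

def rotation_from_gyro(omega, dt):
    """Rotation matrix for angular velocity omega (rad/s, 3-vector) applied
    over dt seconds, via Rodrigues' formula."""
    theta = np.linalg.norm(omega) * dt
    if theta < 1e-12:
        return np.eye(3)
    k = omega / np.linalg.norm(omega)        # unit rotation axis
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])      # cross-product (skew) matrix
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * Kx @ Kx

def predict_indicator(uv, K_cam, omega, dt):
    """Predict an indicator's image coordinates in the next frame, assuming
    the camera undergoes pure rotation measured by the gyro between frames."""
    p = np.array([uv[0], uv[1], 1.0])        # homogeneous pixel coordinates
    ray = np.linalg.inv(K_cam) @ p           # viewing ray in current camera frame
    R = rotation_from_gyro(omega, dt)        # inter-frame camera rotation
    ray_next = R.T @ ray                     # same ray expressed in next frame
    q = K_cam @ ray_next
    return q[:2] / q[2]                      # predicted pixel in next frame
```

The predicted coordinates would then define a small search window for indicator detection in the next frame, which is how the cited method improves detection stability.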
Besides the foregoing methods, as another method that uses inertial sensors such as gyro sensors, a method involving the use of an extended Kalman filter has been proposed. It is not a method that, like method 4, merely utilizes inertial sensors to predict the indicator coordinates in the next picture frame. Rather, it feeds the inertial sensor detection data directly, as input values, into the extended Kalman filter that performs the camera position and orientation estimation computation (prediction computation). At the same time, the indicator image coordinate values are also input to the extended Kalman filter.
In other words, the method takes the inertial sensor detection values and the indicator coordinates detected from the image as its input data, applies that data directly to the extended Kalman filter, and predictively estimates the present position and orientation of the camera (since the inertial sensor values and indicator positions are data obtained before the present moment, relative to the present camera position and orientation, an estimate of the present position and orientation must be obtained predictively). This method estimates the position and orientation of the camera by using sensor detection data obtained without a break, at precisely determined intervals, over a continuous period of time, together with the indicator data detected from the image, to predict the continuous movement of the camera. As one such method, Yokokouji, Sugawara and Yoshida, “Position Tracking Technique Using Vision and Angular Speed for a Head-Mounted Display,” The Virtual Reality Society of Japan, Journal of The 2nd Annual Conference, pp. 121-124, September 1997, propose combining six acceleration sensors to obtain translational acceleration along three axes and rotational angular acceleration about three axes, and using these values as inputs to the extended Kalman filter to estimate the position and orientation of the camera. Moreover, in L. Chai, K. Nguyen, B. Hoff, and T. Vincent, “An Adaptive Estimator for Registration in Augmented Reality,” ibid, pp. 23-32, October 1999, and S. You and U. Neumann, “Fusion of Vision and Gyro Tracking for Robust Augmented Reality Registration,” Proc. IEEE Virtual Reality 2001, pp. 71-78, March 2001, as well, techniques using gyro sensors and an extended Kalman filter are proposed. Such methods, involving the use of inertial sensors and an extended Kalman filter, are hereinafter collectively called method 5.
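The structure described above can be illustrated in drastically simplified form, reduced to a single rotational degree of freedom so that the state and covariance are scalars. In the sketch below, the gyro reading enters the prediction step directly and the indicator's image coordinate enters the update step; the measurement model, noise values and function names are assumptions for illustration only, not the formulation of any of the cited papers.

```python
import numpy as np

def ekf_step(x, P, gyro_omega, u_meas, dt, f_px=500.0, q=1e-4, r=4.0):
    """One extended-Kalman-filter cycle for a camera rotating about one axis.

    State x = [theta] (camera yaw, rad).  The gyro rate gyro_omega drives the
    prediction step directly; u_meas is the measured image u-coordinate of a
    single indicator assumed to lie straight ahead at theta = 0.  f_px, q and
    r are an assumed focal length (pixels), process noise and measurement
    noise."""
    # --- prediction: integrate the gyro reading into the state ---
    theta_pred = x[0] + gyro_omega * dt
    P_pred = P + q                           # covariance grows with process noise
    # --- update: measurement model u = f_px * tan(theta) ---
    u_pred = f_px * np.tan(theta_pred)
    H = f_px / np.cos(theta_pred) ** 2       # Jacobian du/dtheta, linearized at theta_pred
    S = H * P_pred * H + r                   # innovation covariance
    K = P_pred * H / S                       # Kalman gain
    theta_new = theta_pred + K * (u_meas - u_pred)
    P_new = (1.0 - K * H) * P_pred
    return np.array([theta_new]), P_new
```

Even this toy version shows the key property of method 5: the filter only works if both the gyro samples and the estimation cycles arrive at regular, known intervals dt, which is precisely the assumption that a mixed reality system's variable rendering time violates.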
As described above, broadly speaking, the conventional art comprises methods 1 through 5.
With method 1, as already noted in the discussion of method 2, using only a 6-DOF sensor gives rise to the problem that the accuracy of the position and orientation obtained is inadequate. Moreover, a 6-DOF sensor generally requires a separate apparatus, mounted in real space apart from the camera, for outputting signals for detection by the sensor unit mounted on the camera. For example, in the case of a magnetic 6-DOF sensor, this separate apparatus is the apparatus that generates the magnetic field detected by the sensor mounted on the detection target, and in the case of an optical sensor, it is a photoreceptor or light-emitting unit. That such a separate apparatus is required means that the detection range of the sensor is restricted to a limited range. In addition, depending on the site, it might be difficult to install the separate apparatus itself, or, even if installation is possible, inconvenient. Further, there is the problem that the sensor itself generally remains expensive.
Method 2 is a method proposed in light of the drawbacks of method 1, in particular the inability to obtain adequate detection accuracy, and solves the problem of poor detection accuracy. However, insofar as the method uses a 6-DOF sensor, the problems of detection range, installation and cost remain unresolved.
Because method 3 does not use a 6-DOF sensor, the problems of limited detection range, installation and cost do not in principle exist. In addition, if a sufficient number of indicators whose positions in real space are known can be observed by the camera, the position and orientation detection accuracy is also adequate. However, once the indicators can no longer be observed by the camera, method 3 is totally unable to obtain the position and orientation of the camera. As a result, it is necessary to keep a necessary and sufficient number of indicators in view at all times, from all positions and orientations through which the camera can move. In other words, a very large number of indicators must be disposed in real space, or the three-dimensional positions of the indicators must be input by some method, and therein lies the problem.
Further, when detecting the indicators from the image, in some instances there might be a problem with the detection stability of the indicators. Specifically, in order to stabilize detection, each indicator must be large enough to be identified and defined on its own, and moreover must occupy a sufficiently large area within the image acquired by the camera. However, such a large indicator cannot always be installed in real space. As a result, situations arise in which a small indicator, which cannot be identified in isolation, must be used instead. In such a case, if there is, for example, a sudden rapid movement of the camera, the possibility that the indicator will fail to be detected increases.
Method 4 was conceived in light of the indicator detection stability problem of method 3. However, with method 4, the gyro sensor is used solely to predict the positions of the indicators prior to their detection from the image, and in every respect other than improved indicator detection stability, method 4 retains the same problems that plague method 3. It should be noted that, in method 4, the data obtained by the gyro sensor is not used in the computations that estimate the position and orientation of the camera, so method 4 is the same as method 3 with respect to those computations. An inertial sensor such as a gyro sensor does not require a separate apparatus disposed within real space, and therefore method 4 escapes the problems of detection range and installation that affect methods 1 and 2. In addition, because such sensors are generally inexpensive, method 4 also solves the cost problem associated with methods 1 and 2.
Method 5, unlike methods 1 through 4 described above, estimates the position and orientation of the camera continuously, without interruption, at precise intervals over a continuous period of time. With method 5, data must be acquired continuously at a fixed interval, and the estimation must likewise be performed continuously at a fixed interval. However, when the acquired position and orientation are applied to a mixed reality system, the processing includes a step whose required time is uncertain, namely virtual space rendering. The position and orientation of the camera at the time a rendering is completed therefore cannot be predicted when the computations are performed with the extended Kalman filter, and the filter does not always function effectively. Moreover, for the same reason, it is sometimes difficult even to perform the estimation computations at precise intervals. As a result, it is difficult to apply method 5 to a mixed reality system in the first place, and even if it is applied, correct computation results cannot be expected.
In addition, the construction of the extended Kalman filter and the adjustment of its parameters to the usage situation are delicate operations, and it is therefore difficult to implement settings that provide the desired operation. When the desired operation cannot be achieved, a phenomenon arises that cannot occur with methods 1 through 3 (all of which use only data from one particular moment): the position and orientation detection results waver over time (with the result that the accuracy of the position and orientation detection is poor, insofar as, at any given moment, the detected results do not match up with the indicator positions in the image at that moment). It should be noted that this type of problem with method 5 does not arise with method 4, because method 4 uses the gyro sensor detection results only in a limited form, to estimate the region in which the indicators are to be detected, and thus method 4 is basically the same as methods 1 through 3.