In recent years, extensive research has been conducted on mixed reality, which aims to merge physical and virtual spaces seamlessly. An image display apparatus that presents mixed reality can be implemented as an apparatus that displays an image obtained by superimposing images of a virtual space (virtual objects, text information, and the like rendered by computer graphics), generated according to the position and orientation of an image capture device such as a video camera, onto an image of the physical space captured by that image capture device.
Essential to the implementation of such an image display apparatus is measurement of the relative position and orientation between a reference coordinate system defined on the physical space (a coordinate system on the physical space that serves as a reference for determining the position and orientation of a virtual object to be superimposed) and the coordinate system of the image capture device (camera coordinate system). This is because, in order to render the virtual object (virtual space image) so that it fits its position on the physical space, the image of the virtual object must be generated using the same camera parameters as the actual camera parameters of the image capture device with respect to the reference coordinate system. For example, when the image of a virtual object is to be superimposed at a certain position in a physical room, the reference coordinate system is defined on the room, and the position and orientation of the image capture device in that reference coordinate system are calculated. When an arbitrary virtual pattern or label is to be superimposed on a physical box held in the hands of an observer, the object coordinate system of the box itself is regarded as the reference coordinate system, and the position and orientation of the box (reference coordinate system) with respect to the image capture device are calculated.
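As a concrete illustration of the second case (the values and matrix layout are hypothetical, not taken from any of the references), the pose of the box's reference coordinate system as seen from the camera is simply the inverse of the camera's pose expressed in that reference coordinate system:

```python
import numpy as np

# Homogeneous transform of the image capture device in the reference
# coordinate system (illustrative values: identity rotation, a translation
# of 1.5 m up and 3 m back).
T_ref_cam = np.eye(4)
T_ref_cam[:3, 3] = [0.0, 1.5, 3.0]

# The position and orientation of the box (reference coordinate system)
# with respect to the image capture device is the inverse transform.
# Virtual geometry defined in the box's coordinate system is rendered
# using this matrix as the model-view transform.
T_cam_ref = np.linalg.inv(T_ref_cam)
```

Either direction of the transform carries the same six degrees of freedom; which one is "measured" is a matter of convention.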
As a method of measuring the position and orientation of the image capture device, it is common practice to lay out or set a plurality of indices (artificial markers, natural features, and the like) on the physical space, to detect the coordinates of the projected images of these indices in an image captured by the image capture device, and to calculate the position and orientation of the image capture device from the relationship between the detected coordinates and the known coordinate information of the indices (for example, non-patent reference 1). This approach, however, carries the restriction that the indices must always be captured in the image.
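The underlying calculation can be sketched as follows: given the known 3-D positions of indices in the reference coordinate system and their detected 2-D image coordinates, the camera pose is found by minimizing reprojection error. This is a minimal Gauss-Newton sketch with a pinhole camera and a numeric Jacobian, not the specific algorithm of any cited reference; the focal length, point layout, and initial guess are illustrative assumptions.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis-angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(pose, pts3d, f=800.0, c=(320.0, 240.0)):
    """Project 3-D index positions into the image for a pinhole camera.
    pose = (rx, ry, rz, tx, ty, tz): reference frame -> camera frame."""
    R, t = rodrigues(pose[:3]), pose[3:]
    cam = (R @ pts3d.T).T + t
    return f * cam[:, :2] / cam[:, 2:3] + np.array(c)

def estimate_pose(pts3d, obs2d, pose0, iters=30):
    """Gauss-Newton minimization of reprojection error (numeric Jacobian)."""
    pose = pose0.astype(float).copy()
    for _ in range(iters):
        r = (project(pose, pts3d) - obs2d).ravel()
        J = np.zeros((r.size, 6))
        eps = 1e-6
        for j in range(6):
            d = np.zeros(6); d[j] = eps
            J[:, j] = ((project(pose + d, pts3d) - obs2d).ravel() - r) / eps
        pose -= np.linalg.lstsq(J, r, rcond=None)[0]
    return pose

# Four indices laid out on a plane in the reference coordinate system.
pts3d = np.array([[-1., -1., 0.], [1., -1., 0.], [1., 1., 0.], [-1., 1., 0.]])
true_pose = np.array([0.05, -0.02, 0.01, 0.1, -0.2, 5.0])
obs2d = project(true_pose, pts3d)   # stand-in for "detected" coordinates

est = estimate_pose(pts3d, obs2d, np.array([0., 0., 0., 0., 0., 4.]))
```

With three or more non-collinear indices the six degrees of freedom are observable; with fewer, the system is underdetermined, which is exactly the situation the sensor-aided methods below address.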
On the other hand, attempts have been made to mount a six-degrees-of-freedom position/orientation sensor, such as a magnetic or ultrasonic sensor, on the image capture device, and to correct errors in the position and orientation measured by this sensor using information (image information) obtained from an image acquired by capturing indices (for example, patent references 1 and 2). With the method disclosed in patent reference 2, when indices are detected in the captured image, errors in the sensor measurement values are corrected based on that information; when no index is detected, the measurement values of the six-degrees-of-freedom position/orientation sensor are used, unchanged, as the position and orientation of the image capture device. Since the position and orientation of the image capture device can be obtained regardless of whether indices are detected, mixed reality can be presented stably.
With the method of patent reference 2, when the number of detected indices is three or more, all six degrees of freedom of the position and orientation of the image capture device are calculated from the image information. When the number of detected indices is two or one, processing is applied that corrects only one of the position and the orientation (two or three degrees of freedom) of the image capture device measured by the sensor. More specifically, the algorithms used to calculate the position and orientation of the image capture device are switched using the number of detected indices as the criterion. In this way, even when the position and orientation cannot be calculated from the image information alone (when the number of captured indices is less than three), a position and orientation in which errors in the sensor measurement values are canceled as much as possible can be acquired by referring to the sensor measurement values.
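The switching criterion can be summarized as a simple dispatch on the number of detected indices. The strategy labels below are illustrative descriptions, not terminology from the patent references:

```python
def select_pose_update(num_detected_indices):
    """Choose the pose-calculation strategy from the number of detected
    indices, mirroring the switching criterion described above."""
    if num_detected_indices >= 3:
        # Enough image information to solve for all six degrees of freedom.
        return "calculate all 6 DOF from image information"
    if num_detected_indices >= 1:
        # Underdetermined: correct only position or orientation (2-3 DOF)
        # of the sensor measurement.
        return "correct 2-3 DOF of the sensor measurement"
    # No indices visible: fall back to the raw sensor measurement.
    return "use sensor measurement values unchanged"
```

The point of the switch is graceful degradation: the method never attempts to estimate more degrees of freedom than the available image information can constrain.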
However, with the method of patent reference 1, processing that corrects only one of the position and the orientation of the image capture device measured by the sensor, based on the image information, is applied irrespective of the number of detected indices. With this correction method, upon correcting the orientation, rotation correction values that cancel the errors on the indices are calculated individually for the respective detected indices and averaged to obtain a correction value for the orientation measurement value. Upon correcting the position, translation correction values that cancel the errors on the indices are likewise calculated individually for the respective detected indices and averaged to obtain a correction value for the position measurement value. Since the degrees of freedom in the correction are limited to two or three irrespective of the number of indices, stable solutions can be obtained even when the amount of image information is insufficient.

Non-patent reference 1: Kato, et al.: "An Augmented Reality System and its Calibration based on Marker Tracking", TVRSJ, vol. 4, no. 4, pp. 607-616, 1999.
Non-patent reference 2: J. Park, B. Jiang, and U. Neumann: "Vision-based pose computation: robust and accurate augmented reality tracking," Proc. 2nd International Workshop on Augmented Reality (IWAR'99), pp. 3-12, 1999.
Non-patent reference 3: D. G. Lowe: "Fitting parameterized three-dimensional models to images," IEEE Transactions on PAMI, vol. 13, no. 5, pp. 441-450, 1991.
Non-patent reference 4: Satoh, Uchiyama, and Yamamoto: "UG+B: A Registration Framework Using User's View, Gyroscope, and Bird's-Eye View", TVRSJ, vol. 10, no. 3, pp. 391-400, 2005.
Non-patent reference 5: I. Skrypnyk and D. Lowe: "Scene modeling, recognition and tracking with invariant image features," Proc. 3rd International Symposium on Mixed and Augmented Reality (ISMAR'04), pp. 110-119, 2004.
Non-patent reference 6: D. Kotake, K. Satoh, S. Uchiyama, and H. Yamamoto: "A hybrid and linear registration method utilizing inclination constraint," Proc. 4th International Symposium on Mixed and Augmented Reality (ISMAR'05), pp. 140-149, 2005.
Patent reference 1: Japanese Patent Laid-Open No. 2003-222509
Patent reference 2: Japanese Patent Laid-Open No. 2003-279310
Patent reference 3: Japanese Patent Laid-Open No. 2003-344018
Patent reference 4: Japanese Patent Laid-Open No. 2004-233334