In recent years, Mixed Reality (to be abbreviated as MR hereinafter), which aims at seamlessly merging real and virtual spaces, has been studied extensively. Among MR techniques, Augmented Reality (AR), which superimposes a virtual space on a physical space, has received particular attention.
Display devices that allow the user to experience an MR space in accordance with the motion of the user's head can be classified into two systems according to their implementation. One is a video see-through system, in which an image sensing device such as a video camera senses an image of the physical space, and an image of a virtual space (a virtual object, text information, and the like) rendered by computer graphics (to be abbreviated as CG hereinafter) according to the pose (i.e., the position and orientation) of the image sensing device is superimposed on the sensed image. The other is an optical see-through system, which displays an image of a virtual space generated according to the pose of the viewer's viewpoint on an optical see-through display mounted on the viewer's head.
The AR technique is expected to be applied to various fields, such as surgical assistance that superimposes the conditions inside a patient's body onto the body surface, architectural simulation that superimposes a virtual building on a vacant lot, and assembly assistance that superimposes the work sequence and wiring state during assembly.
One of the most important problems in the AR technique is how accurately the physical space and the virtual space are registered, and many efforts have been made to solve it. In the video see-through system, the registration problem amounts to obtaining the pose of the image sensing device in the scene (i.e., on a reference coordinate system). Likewise, in the optical see-through system, it amounts to obtaining the pose of the viewer's viewpoint or of the display in the scene.
The following method is known for solving the registration problem in the video see-through system. A plurality of feature points (markers) whose coordinate values in a three-dimensional (3D) space (world coordinates) are known are arranged and sensed by a camera, and the pose of the camera that satisfies the relationship between the world coordinate values of the markers and their sensed image coordinate values is calculated (see patent reference 1). To solve the registration problem in the optical see-through system, it is common practice to mount an image sensing device on the object to be measured (i.e., the viewer's head or the display), to calculate the pose of this image sensing device by the same method as in the video see-through system, and to calculate the pose of the object to be measured from the calculated pose of the image sensing device.
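To make the relationship between world coordinate values and sensed image coordinate values concrete, the following minimal Python sketch projects a marker with known world coordinates into the image. It assumes an ideal pinhole camera without lens distortion, square pixels with focal length f, and principal point (cx, cy); the function name and parameters are illustrative and are not taken from patent reference 1. Registration then amounts to finding the R and t for which such projections coincide with the marker positions detected in the sensed image.

```python
def project(point_world, R, t, f, cx, cy):
    """Project a world point into the image of an ideal pinhole camera.

    Assumption (illustrative): R (3x3 nested lists) and t (length 3) map world
    coordinates to camera coordinates, X_c = R X_w + t, and pixels are square.
    Returns (u, v) in pixels, or None if the point lies behind the camera.
    """
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None  # not visible: behind or on the camera plane
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Camera placed so that the world origin is 5 units in front of it.
print(project([1.0, 0.0, 0.0], IDENTITY, [0.0, 0.0, 5.0], 800.0, 320.0, 240.0))  # → (480.0, 240.0)
```

A real system would also model lens distortion and obtain f, cx, and cy from a separate camera calibration.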
In general, if the image coordinates on a sensed image are obtained for a plurality of points whose 3D positions are known (theoretically three or more points, or six or more points for a stable solution), the pose of the camera viewpoint can be calculated from the correspondences between the 3D positions and the image coordinates.
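One common way to account for the figures three and six is the following equation count, assuming an ideal pinhole camera with intrinsic matrix K (the symbols are introduced here for illustration). Each correspondence between a known 3D point and its image coordinates yields two equations. A linear (direct linear transformation) solution for the 3x4 projection matrix P, which has 11 degrees of freedom up to scale, therefore needs at least six points, whereas the six-degree-of-freedom pose alone, with K known, is determined by three points up to a finite number of solutions (the three-point problem reviewed in non-patent reference 1, which admits up to four solutions).

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = \underbrace{K \, [\, R \mid \mathbf{t} \,]}_{P}
    \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
P = \begin{bmatrix} \mathbf{p}_1^{\top} \\ \mathbf{p}_2^{\top} \\ \mathbf{p}_3^{\top} \end{bmatrix},
\qquad
\tilde{\mathbf{X}} = \begin{bmatrix} X & Y & Z & 1 \end{bmatrix}^{\top}
\\[4pt]
u \, \mathbf{p}_3^{\top} \tilde{\mathbf{X}} - \mathbf{p}_1^{\top} \tilde{\mathbf{X}} = 0,
\qquad
v \, \mathbf{p}_3^{\top} \tilde{\mathbf{X}} - \mathbf{p}_2^{\top} \tilde{\mathbf{X}} = 0
\\[4pt]
2n \ge 11 \;\Rightarrow\; n \ge 6 \quad (\text{linear solution for } P),
\qquad
2n \ge 6 \;\Rightarrow\; n \ge 3 \quad (\text{pose } R, \mathbf{t} \text{ only, with } K \text{ known})
```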
A method of calculating the pose of an image sensing device based on sets of the 3D coordinate values and image coordinate values of indices has been proposed in the field of photogrammetry, as described in non-patent references 1 and 2.
Furthermore, a method that uses a square-shaped index (to be referred to as a square index hereinafter) having a known size as an index has also been proposed (see non-patent references 3 and 4).
Moreover, a method that uses a combination of a square index and dot index as indices has been proposed (see non-patent reference 5).
The dot index has the merit that it can be set in a narrow place. The square index has the merits that it is easy to identify and that, since a single index carries a large amount of information, the pose of an image sensing device can be calculated from only one index. Hence, the dot index and the square index can be used complementarily.
Conventionally, the pose of an image sensing device is acquired by applying the aforementioned methods to an image sensed by that image sensing device.
Alternatively, a six-degree-of-freedom position and orientation sensor, such as a magnetic sensor or an ultrasonic sensor, is attached to the image sensing device to be measured, and its pose is measured by using the sensor measurement result together with the aforementioned detection of indices by image processing. Since the sensor output can be obtained stably, although its precision varies with the measurement range, the method combining the sensor with image processing is more robust than the method using image processing alone (see patent reference 2 and non-patent reference 6).
Since the conventional registration method using indices acquires the pose of the image sensing device (the object to be measured) on a reference coordinate system, the position of each dot index and the pose of each square index on the reference coordinate system must be known. When a square index is used, the square index itself normally serves as the reference for the coordinate system, and no separate reference coordinate system is defined. However, when a plurality of square indices is used, the relative relationship among their poses (to be referred to as the layout relationship hereinafter) must be known. For this reason, a reference coordinate system is required to define the pose relationship among the plurality of indices.
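The layout relationship can be illustrated with 4x4 homogeneous transforms. In the hypothetical sketch below, T_cam_mi and T_cam_mj denote the poses of two square indices Mi and Mj relative to a camera that senses both in the same image (for example, as obtained with a square-index method such as those of non-patent references 3 and 4). The pose of Mj in the frame of Mi is then inv(T_cam_mi) · T_cam_mj, in which the camera pose cancels out. Helper names are illustrative, and the rotation parts are assumed to be orthonormal.

```python
def rigid_inverse(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] as [R^T | -R^T t; 0 0 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of the 3x3 rotation
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [ti[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def layout_relationship(T_cam_mi, T_cam_mj):
    """Pose of marker Mj expressed in the frame of marker Mi.

    Both arguments map marker coordinates to camera coordinates for two markers
    detected in the same image; the (unknown) camera pose cancels out.
    """
    return matmul4(rigid_inverse(T_cam_mi), T_cam_mj)


def translation(x, y, z):
    """Pure translation as a 4x4 homogeneous transform (no rotation)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Mi is 5 units in front of the camera; Mj is 2 units to its right.
T = layout_relationship(translation(0.0, 0.0, 5.0), translation(2.0, 0.0, 5.0))
print([row[3] for row in T[:3]])  # → [2.0, 0.0, 0.0]
```

Because the camera pose cancels, the same layout relationship is obtained from any image in which both markers are visible; this is exactly why markers that never appear together in an image cannot be related directly.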
The pose of each index on the reference coordinate system can be measured manually using a measuring tape and a protractor, or using a surveying instrument. In view of precision and labor, however, measurement using images is generally adopted. The positions of dot indices can be measured by a method called bundle adjustment, which is executed as follows. Many images of the dot indices are sensed by an image sensing device. Then, the poses of the image sensing device at which the images were sensed and the positions of the dot indices are calculated by iterative calculation so as to minimize the errors (projection errors) between the positions at which the indices are actually observed on the images and the projection positions calculated from the pose of the image sensing device and the positions of the indices. The calculation is subject to the constraint that three points, namely the position of a dot index in the physical space, the projection point of that dot index on an image, and the viewpoint of the image sensing device, lie on a straight line.
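The quantity that bundle adjustment minimizes can be sketched as follows. This minimal Python example only evaluates the total squared projection error for given camera poses and dot-index positions; it assumes an ideal pinhole camera with square pixels, and all names are hypothetical. The pinhole projection encodes the collinearity constraint described above, since a point, its projection, and the viewpoint lie on one ray. A full bundle adjustment would minimize this value over all camera poses and index positions with an iterative nonlinear least-squares method such as Levenberg-Marquardt, which is beyond this sketch.

```python
def project(X, R, t, f, cx, cy):
    """Pinhole projection of world point X for a camera with X_c = R X + t."""
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def total_projection_error(cameras, points, observations, f, cx, cy):
    """Sum of squared projection errors over all observations.

    cameras:      list of (R, t) poses, one per sensed image
    points:       list of [X, Y, Z] dot-index positions
    observations: list of (camera_index, point_index, (u, v)) detected on the images
    """
    total = 0.0
    for ci, pi, (u, v) in observations:
        R, t = cameras[ci]
        pu, pv = project(points[pi], R, t, f, cx, cy)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(IDENTITY, [0.0, 0.0, 5.0]), (IDENTITY, [-1.0, 0.0, 5.0])]
points = [[1.0, 0.0, 0.0]]
# The first detection is off by one pixel horizontally; the second is exact.
observations = [(0, 0, (481.0, 240.0)), (1, 0, (320.0, 240.0))]
print(total_projection_error(cameras, points, observations, 800.0, 320.0, 240.0))  # → 1.0
```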
A method of measuring the poses of a large number of square markers laid out in a 3D space has been proposed (see non-patent reference 7).
Also, a method has been proposed that senses images of a large number of square markers laid out in a 3D space and calculates the poses of the image sensing device that sensed the images and the poses of the square markers by iterative calculation so as to minimize projection errors (see non-patent reference 8).
In a measurement method using images (to be referred to as vision-based marker calibration hereinafter), calculating the layout relationship among N indices (M1 to MN) requires that every index Mi be sensed simultaneously with at least one other index Mj (i ≠ j), and that, when markers sensed at the same time are regarded as forming a group (groups that share a marker being joined into one), all the markers form a single group. A practical example will be described below with reference to FIG. 1.
FIG. 1 illustrates the problem that arises in calculating the layout relationship between markers when the markers form two groups of two markers each and each group is sensed by a different camera.
Referring to FIG. 1, reference numerals 101 and 102 denote cameras. The camera 101 senses an image of markers 101a and 101b, and the camera 102 senses an image of markers 102a and 102b. In FIG. 1, reference numeral 103 denotes an image which is sensed by the camera 101, and includes the markers 101a and 101b. On the other hand, reference numeral 104 denotes an image which is sensed by the camera 102 and includes the markers 102a and 102b. 
According to the prior art, when the image 103 is used, the layout relationship between the markers 101a and 101b can be calculated. When the image 104 is used, the layout relationship between the markers 102a and 102b can be calculated. However, the layout relationship between the markers included in the different images, i.e., the layout relationship between the marker 101a in the image 103 and the marker 102a in the image 104, cannot be calculated.
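The condition that all markers must form one group amounts to a connectivity test on a graph whose nodes are markers and whose edges join markers detected in the same image. The following sketch uses a union-find structure, a technique chosen here for illustration rather than one taken from the source; the function names are hypothetical. It reproduces the FIG. 1 situation, in which the images 103 and 104 leave two separate groups.

```python
def marker_groups(images):
    """Group markers that are connected through simultaneous observation.

    images: iterable of collections of marker IDs detected in each image.
    Returns a list of sets; markers in different sets have no computable
    layout relationship.
    """
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for markers in images:
        markers = list(markers)
        for m in markers:
            find(m)  # register markers even if seen alone
        for other in markers[1:]:
            union(markers[0], other)  # markers in one image form one group

    groups = {}
    for m in parent:
        groups.setdefault(find(m), set()).add(m)
    return list(groups.values())


def can_calibrate(images):
    """True if every observed marker ends up in a single group."""
    return len(marker_groups(images)) == 1


fig1 = [{"101a", "101b"}, {"102a", "102b"}]  # images 103 and 104
print(can_calibrate(fig1))  # → False
print(can_calibrate(fig1 + [{"101b", "102a"}]))  # → True
```

An additional image that contains one marker from each group, as in the last line, would be enough to join the two groups.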
One way to solve this problem is to use a camera with a wide field of view so that a broad range can be sensed. However, such a camera is strongly affected by image distortion. Alternatively, the images may be sensed from a distance, but then high-resolution images cannot be obtained. In either case, it is difficult to calculate an accurate layout relationship.
Moreover, physical factors, for example, an obstacle between markers or markers facing different directions, can make it difficult to sense the markers simultaneously at all.
For these reasons, a large number of markers must be laid out so that the layout relationship among all the markers can be calculated. However, laying out many markers may impair the scenery, and for an MR experience it is desirable to lay out only the minimum number of markers required for registration (see non-patent reference 9).
Patent reference 1: Japanese Patent Laid-Open No. 2000-041173
Patent reference 2: Japanese Patent Laid-Open No. 2002-228442
Non-patent reference 1: R. M. Haralick, C. Lee, K. Ottenberg, and M. Nölle: “Review and analysis of solutions of the three point perspective pose estimation problem”, Int'l. J. Computer Vision, vol. 13, no. 3, pp. 331-356, 1994.
Non-patent reference 2: M. A. Fischler and R. C. Bolles: “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Comm. ACM, vol. 24, no. 6, pp. 381-395, 1981.
Non-patent reference 3: J. Rekimoto: “Augmented Reality using the 2D matrix code”, Interactive Systems and Software IV, Kindai Kagaku Sha, 1996.
Non-patent reference 4: Kato, M. Billinghurst, Asano, and Tachibana: “Augmented reality system and its calibration based on marker tracking”, Transactions of the Virtual Reality Society of Japan, vol. 4, no. 4, pp. 607-616, 1999.
Non-patent reference 5: H. Kato, M. Billinghurst, I. Poupyrev, K. Imamoto and K. Tachibana: “Virtual object manipulation on a table-top AR environment”, Proc. ISAR2000, pp. 111-119, 2000.
Non-patent reference 6: A. State, G. Hirota, D. T. Chen, W. F. Garrett and M. A. Livingston: “Superior augmented reality registration by integrating landmark tracking and magnetic tracking”, Proc. SIGGRAPH'96, pp. 429-438, 1996.
Non-patent reference 7: G. Baratoff, A. Neubeck and H. Regenbrecht: “Interactive multi-marker calibration for augmented reality applications”, Proc. ISMAR2002, pp. 107-116, 2002.
Non-patent reference 8: G. Baratoff, A. Neubeck and H. Regenbrecht: “Interactive multi-marker calibration for augmented reality applications”, Proc. ISMAR2002, pp. 107-116, 2002.
Non-patent reference 9: G. Baratoff, A. Neubeck and H. Regenbrecht: “Interactive multi-marker calibration for augmented reality applications”, Proc. ISMAR2002, pp. 107-116, 2002.