1. Field of the Invention
The present invention relates to a technique of obtaining the position and orientation of an image sensing device.
2. Description of the Related Art
Extensive research has been conducted on mixed reality, which presents text or CG images superimposed on physical space. An image display device that presents mixed reality superimposes an image generated based on the position and orientation of a camera on an image captured by, e.g., a video camera, and displays the superimposed image.
To implement such an image display device, it is necessary to measure the relative position and orientation between a camera coordinate system and a reference coordinate system defined in physical space. An example will be considered here, in which a virtual object is superimposed at a predetermined position, e.g., in a room or on a table in the physical environment. In this case, the reference coordinate system is defined at an appropriate point (e.g., the floor surface of the room or the table surface) of the environment, and the position and orientation of the camera in the reference coordinate system are obtained.
To implement this measurement, it is common practice to estimate the position and orientation of a camera by using an image captured by it (Sato, Uchiyama, and Yamamoto, “UG+B method: A Registration method using a subjective viewpoint camera, an objective viewpoint camera, and an orientation sensor”, Transaction of The Virtual Reality Society of Japan, Vol. 10, No. 3, pp. 391-400, 2005, I. Skrypnyk and D. Lowe, “Scene modeling, recognition and tracking with invariant image features”, Proc. 3rd International Symposium on Mixed and Augmented Reality (ISMAR'04), pp. 110-119, 2004, and J. Park, B. Jiang, and U. Neumann, “Vision-based pose computation: robust and accurate augmented reality tracking”, Proc. 2nd International Workshop on Augmented Reality (IWAR'99), pp. 3-12, 1999). The position and orientation of a camera in the reference coordinate system can be measured by, e.g., the following method.
(1) A plurality of indices with known positions (reference coordinates) in the reference coordinate system are arranged or set on a floor, wall, or table surface in a room. The indices may be artificial markers intentionally provided for measurement or natural features that exist inherently in the environment.
(2) The coordinates of the projected images of the indices in an image captured by a camera are detected.
(3) The position and orientation of the camera are obtained based on the correspondence between the detected image coordinates of the indices and their reference coordinates.
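The correspondence exploited in steps (2) and (3) above rests on the pinhole projection of the known reference coordinates through the camera pose. The following is a minimal sketch of that projection; the intrinsic parameters (focal length, principal point) and the index coordinates are illustrative values, not taken from any cited work:

```python
import numpy as np

def project(points_ref, R, t, f=800.0, cx=320.0, cy=240.0):
    """Project 3D reference-coordinate points into the image of a pinhole
    camera whose pose is the transform (R, t) from reference to camera
    coordinates.  Intrinsics f, cx, cy are illustrative values."""
    pts_cam = (R @ points_ref.T).T + t           # reference -> camera frame
    u = f * pts_cam[:, 0] / pts_cam[:, 2] + cx   # perspective division
    v = f * pts_cam[:, 1] / pts_cam[:, 2] + cy
    return np.stack([u, v], axis=1)

# Three indices at known reference coordinates (e.g., on a table surface).
indices = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
R = np.eye(3)                   # camera axes aligned with the reference frame
t = np.array([0.0, 0.0, 1.0])   # camera 1 m in front of the indices
uv = project(indices, R, t)     # theoretical image coordinates of the indices
```

Step (3) then amounts to inverting this relation: finding the (R, t) for which these theoretical coordinates match the coordinates actually detected in the captured image.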
(Prior Art 1)
A typical method of implementing the above-described step (3) is a nonlinear optimization method which minimizes the reprojection error of the indices by iterative calculation (Sato, Uchiyama, and Yamamoto, “UG+B method: A Registration method using a subjective viewpoint camera, an objective viewpoint camera, and an orientation sensor”, Transaction of The Virtual Reality Society of Japan, Vol. 10, No. 3, pp. 391-400, 2005). This method calculates the theoretical values of the image coordinates (theoretical coordinates) of the indices based on the estimated values of the position and orientation of the camera and the reference coordinates of the indices. The estimated values of the position and orientation of the camera are then corrected to make the theoretical coordinates as close as possible to the actually detected coordinates of the indices. The optimum position and orientation are calculated by repeatedly calculating the theoretical coordinates and correcting the estimated values. To obtain the position and orientation of the camera with respect to time-series images captured continuously, the above-described optimization process is executed for each input frame. The optimum position and orientation in each frame are thus calculated.
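The iterative correction described above can be sketched as a Gauss-Newton update that repeatedly relinearizes the reprojection error. In this minimal illustration, only the camera translation is refined (the orientation is held fixed at identity for brevity), and the intrinsics, index coordinates, and iteration count are illustrative assumptions, not taken from the cited work:

```python
import numpy as np

def project(pts, t, f=800.0, c=np.array([320.0, 240.0])):
    p = pts + t                              # orientation fixed to identity
    return f * p[:, :2] / p[:, 2:3] + c

def refine_pose(pts_ref, uv_obs, t0, iters=10):
    """Gauss-Newton refinement of the camera translation t so that the
    theoretical coordinates project(pts_ref, t) approach the detected
    coordinates uv_obs."""
    t = t0.astype(float)
    for _ in range(iters):
        r = (project(pts_ref, t) - uv_obs).ravel()   # reprojection error
        # Numerical Jacobian of the residual with respect to t.
        J = np.empty((r.size, 3))
        eps = 1e-6
        for j in range(3):
            d = np.zeros(3); d[j] = eps
            J[:, j] = ((project(pts_ref, t + d) - uv_obs).ravel() - r) / eps
        t -= np.linalg.solve(J.T @ J, J.T @ r)       # normal-equation step
    return t

pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                [0.0, 0.1, 0.0], [0.1, 0.1, 0.05]])
t_true = np.array([0.02, -0.01, 1.0])
uv = project(pts, t_true)                    # simulated detected coordinates
t_est = refine_pose(pts, uv, np.array([0.0, 0.0, 0.8]))
```

With the simulated detections above, the iteration converges from the rough initial estimate to the pose that minimizes the reprojection error.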
(Prior Art 2)
There are also methods of implementing the above-described step (3) in consideration of the continuity between frames in time-series images. In one such method, for example, a constraint “to inhibit any large change from the position and orientation in a preceding frame” is applied to the above-described nonlinear optimization method, thereby minimizing error while maintaining the continuity between frames as much as possible (I. Skrypnyk and D. Lowe, “Scene modeling, recognition and tracking with invariant image features”, Proc. 3rd International Symposium on Mixed and Augmented Reality (ISMAR'04), pp. 110-119, 2004).
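The effect of such a continuity constraint can be sketched on a linearized toy problem: the per-frame reprojection cost is augmented with a penalty that inhibits large changes from the preceding frame's solution. The matrix, poses, and penalty weight below are illustrative values, not taken from the cited work:

```python
import numpy as np

# Linearized toy example: per-frame reprojection cost ||A @ t - b||^2 plus a
# continuity penalty lam * ||t - t_prev||^2 that inhibits large changes from
# the preceding frame's estimate t_prev.  (A real system relinearizes the
# projection at every iteration; A, b, and lam here are illustrative.)
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
t_frame = np.array([0.05, -0.02, 1.0])   # pose best fitting this frame alone
b = A @ t_frame
t_prev = np.array([0.0, 0.0, 1.0])       # preceding frame's estimate

def solve(lam):
    # Normal equations of the penalized least-squares problem.
    return np.linalg.solve(A.T @ A + lam * np.eye(3),
                           A.T @ b + lam * t_prev)

t_free = solve(0.0)     # ignores continuity: fits the frame exactly
t_smooth = solve(10.0)  # pulled toward t_prev, so it changes less per frame
```

The penalized solution always lies closer to the preceding frame's pose than the unconstrained one, which is precisely the inter-frame smoothing effect described above.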
(Prior Art 3)
It is also common practice to continuously obtain the position and orientation of a camera by using an extended Kalman filter (J. Park, B. Jiang, and U. Neumann, “Vision-based pose computation: robust and accurate augmented reality tracking”, Proc. 2nd International Workshop on Augmented Reality (IWAR'99), pp. 3-12, 1999). In the method using an extended Kalman filter, a motion model of the camera is generated by using an assumption such as a uniform velocity/uniform angular velocity motion or a uniform acceleration/uniform angular acceleration motion. The camera position, orientation, velocity, acceleration, angular velocity, and the like are set as the constituent elements of the state variables of the extended Kalman filter. In addition, sets of image coordinates and reference coordinates of the indices in each frame are input as observed values. In the method using the extended Kalman filter, the position and orientation output in a frame are influenced not only by the observed values of that frame but also by the past state, based on the assumed motion model. If the actual motion of the camera does not deviate largely from the assumed motion model, the estimated values of the position and orientation vary while maintaining the consistency between frames. Hence, smooth motion can be obtained over the time series.
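The filtering idea can be sketched in one dimension with a uniform-velocity motion model, where the detected position of the camera serves as the observation. A real system would use an extended Kalman filter over the full 6-DOF pose with the nonlinear index-projection function as the measurement model; the frame rate and noise covariances below are illustrative assumptions:

```python
import numpy as np

dt = 1.0 / 30.0                          # assumed frame interval
F = np.array([[1.0, dt], [0.0, 1.0]])    # uniform-velocity model: x += v*dt
H = np.array([[1.0, 0.0]])               # only the position is observed
Q = np.diag([1e-5, 1e-3])                # process noise (model uncertainty)
R = np.array([[1e-2]])                   # measurement noise

x = np.array([0.0, 0.0])                 # state: [position, velocity]
P = np.eye(2)                            # state covariance

def step(x, P, z):
    # Predict with the assumed motion model, then correct with the observation.
    x = F @ x
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + (K @ (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Camera moving at a constant velocity of 0.3 units/s; since the motion
# matches the assumed model, the estimate converges and varies smoothly.
for k in range(1, 120):
    x, P = step(x, P, np.array([0.3 * k * dt]))
```

Because the simulated motion matches the uniform-velocity assumption, the velocity component of the state converges to the true value; an abrupt change in velocity would instead produce the lag or overshoot discussed below.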
(Prior Art 4)
In another approach, the position and orientation of a camera are directly measured by a position and orientation sensor attached to the camera, and the measurement errors are corrected by using captured indices. To avoid the discontinuity that occurs when different correction values are obtained between frames, Japanese Patent Laid-Open No. 2005-107247 discloses a technique of averaging the correction values obtained in a plurality of frames and applying the averaged correction value.
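The averaging of correction values over a plurality of frames can be sketched as a moving average over a fixed window. The window size and the correction values below are illustrative assumptions, not taken from the cited publication:

```python
import numpy as np
from collections import deque

# Each frame yields a correction value for the sensor's pose measurement from
# the captured indices; the corrections of the last N frames are averaged
# before being applied, so the output changes smoothly even when the
# per-frame correction jumps.  (N and the values are illustrative.)
N = 4
window = deque(maxlen=N)

def corrected_position(sensor_pos, correction):
    window.append(correction)
    avg = np.mean(window, axis=0)        # average over the recent frames
    return sensor_pos + avg

# Per-frame corrections jump when the set of captured indices changes;
# the averaged correction transitions gradually instead.
out = [corrected_position(np.zeros(3), c) for c in
       [np.array([0.01, 0.0, 0.0])] * 4 + [np.array([0.05, 0.0, 0.0])] * 4]
```

When the per-frame correction jumps from 0.01 to 0.05, the applied correction moves through intermediate values over the next N frames rather than jumping in a single frame.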
The approach (Prior Art 1) for calculating a position and orientation to minimize the reprojection error of an index in each frame still has room for improvement in terms of consistency between frames. Particularly if the combination of captured indices changes between frames, positions and orientations output from two consecutive frames may be discontinuous. This is because the position and orientation obtained in each frame contain error whose behavior changes if the combination of input indices changes.
The optimization method considering the continuity between frames (Prior Art 2) may be unable to follow an abrupt movement of the camera because of the inertia imposed by the positions and orientations obtained in the past. Even in the method using an extended Kalman filter (Prior Art 3), it may be difficult to follow an abrupt change in the velocity or acceleration of the camera, or overshoot may occur. This is because the actual motion of the camera deviates largely from the assumed motion model. In such intervals, the position and orientation measurement accuracy decreases.
Prior Art 4 aims at “correcting” the measurement error of a position and orientation sensor and cannot be applied to position and orientation measurement using only indices.