1. Field of the Invention
The present invention relates to a technique for calculating the position and orientation of an image sensing device.
2. Description of the Related Art
In recent years, Augmented Reality (to be abbreviated as AR hereinafter) techniques, which superimpose information of a virtual space on a physical space and present a composite image to the user, have been studied extensively. As an information presentation apparatus in the AR technique, a video see-through head mounted display (to be abbreviated as HMD hereinafter) is typically used. The video see-through HMD incorporates a camera which senses an image of the physical space. Virtual objects are rendered by computer graphics (to be abbreviated as CG hereinafter) in accordance with the position and orientation of this camera in the physical space. A composite image is generated by superimposing the rendered virtual objects on the image of the physical space, and is displayed on a display device of the HMD such as a liquid crystal panel. With this information presentation apparatus, the user can feel as if the virtual objects exist in the physical space.
Registration is known as one of the major problems to be solved in implementing the AR technique. In order to make the user feel as if virtual objects exist in the physical space, geometrical matching between the virtual objects and the physical space needs to be assured. That is, the virtual objects must appear to the user to be located at the positions where they should exist in the physical space.
In the AR technique using the video see-through HMD, every time an image of the physical space is acquired from the camera incorporated in the HMD, the position and orientation of the camera in the physical space at the time of image sensing are measured. A CG image is rendered based on the position and orientation of the camera and on camera specific parameters such as the focal length, and is superimposed on the image of the physical space. Such a series of processes is generally executed in the AR technique. For this reason, in the case of the AR technique using the video see-through HMD, the registration problem is reduced to the problem of measuring the position and orientation, in the physical space, of the camera incorporated in the HMD. The position and orientation of the camera can be measured by a physical sensor such as a magnetic sensor, ultrasonic sensor, or optical sensor, which measures the six degrees of freedom of the position and orientation of the camera.
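As an illustration of why the rendering depends on both the position and orientation of the camera and the camera specific parameters, the following is a minimal sketch of a pinhole projection. The function name project_point, the world-to-camera convention for R and t, and the principal point (cx, cy) are assumptions introduced for this example only; lens distortion is ignored.

```python
def project_point(R, t, f, cx, cy, X):
    """Project a 3D point X (world coordinates) onto the image plane.

    Assumptions for this sketch: R (3x3 nested lists) and t (length 3)
    map world coordinates to camera coordinates, f is the focal length in
    pixels, and (cx, cy) is the principal point in pixels.
    """
    # Transform the point from world coordinates to camera coordinates.
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        raise ValueError("point is not in front of the camera")
    # Perspective division followed by scaling with the focal length.
    u = f * xc[0] / xc[2] + cx
    v = f * xc[1] / xc[2] + cy
    return (u, v)
```

An error in R or t shifts every projected point, which is why a registration error appears directly as a misalignment between the virtual objects and the sensed image.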
When the video see-through HMD is used, image information from the camera incorporated in the HMD can be used for registration. A registration method using image information is popularly used since it is simpler and less expensive than the method using a physical sensor. In the registration method using image information, indices whose three-dimensional positions in the physical space are known are sensed by the camera, and the position and orientation of the camera are calculated based on the correspondence between the positions of the indices on the sensed image and their three-dimensional positions.
As such indices, markers which are artificially placed in the physical space, or natural features such as corner points and edges which originally exist in the physical space, are used. In practice, artificial markers which can be easily detected and identified from an image are widely used in terms of stability and calculation load.
Non-patent reference 1 discloses a registration method which uses, as indices, square markers in each of which a unique two-dimensional (2D) pattern is drawn. With this method, the region of a square marker is extracted from an image, and the marker is identified using its inner 2D pattern. Furthermore, the position and orientation of the camera are calculated based on the correspondence between the positions of the respective vertices of the square marker on an image sensed by the camera, and the positions of the vertices on a marker coordinate system of the square marker. Artificial markers such as these square markers are popularly used since they are easy to use. However, such markers cannot be used when it is physically impossible to place them, or when placing them is undesirable because, for example, they would disfigure the scene.
On the other hand, registration methods using natural features which originally exist in the physical space have been studied extensively owing to recent improvements in the capability of computers. As natural features used for registration, point-shaped features (to be referred to as point features hereinafter) such as corner points, and line features such as edges, are mainly used.
Non-patent references 2, 3, 4, 5, 6, and 7 disclose registration methods using edges. Since edges do not change with respect to scale or observation direction, registration using edges characteristically assures high precision. The registration using edges is based on the premise that three-dimensional (3D) model data of a physical space or physical object, described by a set of line segments, is available. The registration using edges disclosed in non-patent references 2, 3, and 4 is implemented by the following processes 1 to 3.
1. The aforementioned 3D model data (line segment model) is projected onto an image based on the position and orientation of the camera in a previous frame and camera specific parameters which have already been calibrated.
2. Line segments which constitute the projected line segment model are divided at given intervals on the image, thus setting division points. An edge search is conducted on a line segment (search line) which passes through each division point and has a direction normal to the projected line segment, to detect, as a corresponding edge, a point which has a local maximum gradient of a luminance value on the search line and is closest to the division point.
3. Correction values of the position and orientation of the camera, which minimize the sum total of distances on the image between the corresponding edges detected for respective division points and projected line segments, are calculated, thereby correcting the position and orientation of the camera.
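Process 2 above can be sketched as follows. This is a minimal illustration and not the implementation of non-patent references 2, 3, and 4: the function names division_points and search_edge, the representation of the image as nested lists of luminance values, the nearest-pixel sampling, the central-difference gradient, and the threshold min_grad are all assumptions made for this example.

```python
import math

def division_points(p0, p1, spacing):
    """Set division points at a fixed spacing (pixels) on the projected
    segment p0-p1, and return them with the unit normal of the segment."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0.0 or spacing <= 0.0:
        return [], (0.0, 0.0)
    normal = (-dy / length, dx / length)
    points = []
    k = 1
    while k * spacing < length:  # the segment end points are excluded
        t = k * spacing / length
        points.append((p0[0] + t * dx, p0[1] + t * dy))
        k += 1
    return points, normal

def search_edge(image, point, normal, half_range, min_grad=1.0):
    """Search along the normal through a division point and return the signed
    offset (pixels) of the local gradient maximum closest to the point,
    or None when no edge is found on the search line."""
    def lum(s):
        x = int(round(point[0] + s * normal[0]))
        y = int(round(point[1] + s * normal[1]))
        if 0 <= y < len(image) and 0 <= x < len(image[0]):
            return image[y][x]
        return None  # sample falls outside the image

    grad = {}
    for s in range(-half_range, half_range + 1):
        before, after = lum(s - 1), lum(s + 1)
        if before is not None and after is not None:
            grad[s] = abs(after - before) / 2.0  # central difference
    candidates = [s for s, g in grad.items()
                  if g > min_grad
                  and g >= grad.get(s - 1, 0.0)
                  and g >= grad.get(s + 1, 0.0)]
    if not candidates:
        return None
    return min(candidates, key=abs)  # closest to the division point
```

The signed offsets returned for the respective division points correspond to the image distances whose sum total is minimized in process 3.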
The aforementioned registration method using edges performs edge detection based on the position and orientation of the camera calculated in the previous frame. Then, correction values for the position and orientation of the previous frame are calculated based on the information of the edges detected on the image, and the position and orientation of the camera in the previous frame are corrected using the correction values, thereby calculating the position and orientation of the camera in the current frame. For this reason, for example, when the position and orientation calculations in the previous frame have failed, the position and orientation of the camera can no longer be correctly calculated in the subsequent frames, and the registration collapses. Such a situation often occurs, for example, when the camera moves at high speed or when a moving object cuts across in front of the camera. In order to prevent the registration from collapsing, a method has been proposed which outputs a plurality of positions and orientations of the camera in each frame in place of a single position and orientation, and uses them in the next frame as a plurality of hypotheses.
Non-patent reference 5 prevents the registration using edges from collapsing by additionally using point feature information and calculating a plurality of positions and orientations of the camera per frame. Non-patent reference 5 calculates the position and orientation of the camera in the current frame by iterative operations based on the correspondence of point features between frames, together with the registration using edges. In this case, the position and orientation of the camera obtained from edge information in the previous frame, and those obtained from the point feature information, are each used as initial values, thus calculating two positions and orientations of the camera. Of the two positions and orientations calculated in this way, the one having the larger likelihood is determined as the position and orientation of the camera obtained from the point feature information. Then, the determined position and orientation of the camera are output to the next frame as one hypothesis, and are also set as initial values of the position and orientation of the camera in the registration using edges. Furthermore, the aforementioned registration using edges is executed, and the position and orientation of the camera obtained as a result of the registration are output to the next frame as another hypothesis. In this way, non-patent reference 5 always outputs two positions and orientations of the camera to the next frame, and the position and orientation with the higher validity are always selected, thus preventing the registration from collapsing.
Non-patent references 6 and 7 prevent the registration using edges from collapsing by holding a plurality of positions and orientations of the camera by means of a particle filter. The particle filter holds the plurality of positions and orientations of the camera as a set of discrete particles in a six-dimensional space. Each particle has, as data, a weight representing the reliability of the position and orientation of the camera, together with the position and orientation themselves. In each frame, a new set of particles is generated from the set of particles obtained in the previous frame, based on the weights of the respective particles. The positions and orientations of the newly generated particles are changed based on a motion model. Furthermore, likelihoods are calculated for the respective particles, and a set of particles weighted according to the calculated likelihoods is output to the next frame as the plurality of positions and orientations of the camera. As the position and orientation of the camera in the current frame, the weighted averages of the positions and orientations of the respective particles are generally used. In this manner, non-patent references 6 and 7 avoid the collapse of the registration using edges by holding the plurality of positions and orientations of the camera as particles.
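One frame of such a particle filter can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of non-patent references 6 and 7: the function name particle_filter_step, the random-walk motion model with standard deviation noise_sigma, and the caller-supplied likelihood function are hypothetical. Each particle is assumed to be a tuple of six values (three for position, three for orientation), and the component-wise weighted average of the orientation values is assumed to be meaningful only when the particles are close to one another.

```python
import math
import random

def particle_filter_step(particles, weights, likelihood, noise_sigma, rng):
    """Process one frame: resample, apply the motion model, reweight, average."""
    n = len(particles)
    # Generate a new set of particles according to the previous weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Motion model (assumption: independent Gaussian random walk per component).
    moved = [tuple(v + rng.gauss(0.0, noise_sigma) for v in p) for p in resampled]
    # Calculate a likelihood for each particle and normalize into weights.
    raw = [likelihood(p) for p in moved]
    total = sum(raw)
    if total > 0.0:
        new_weights = [r / total for r in raw]
    else:
        new_weights = [1.0 / n] * n  # fall back to uniform weights
    # Weighted average used as the position and orientation of the current frame.
    dims = len(moved[0])
    estimate = tuple(sum(w * p[i] for p, w in zip(moved, new_weights))
                     for i in range(dims))
    return moved, new_weights, estimate
```

The likelihood function is evaluated once per particle per frame, so the cost of one frame grows in proportion to the number of particles.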
[Non-Patent Reference 1]
Kato, M. Billinghurst, Asano, and Tachibana, “Augmented Reality System and its Calibration based on Marker Tracking”, the Journal of the Virtual Reality Society of Japan, vol. 4, no. 4, pp. 607-617, 1999.
[Non-Patent Reference 2]
T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002.
[Non-Patent Reference 3]
A. I. Comport, E. Marchand, and F. Chaumette, “A real-time tracker for markerless augmented reality,” Proc. The 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR03), pp. 36-45, 2003.
[Non-Patent Reference 4]
L. Vacchetti, V. Lepetit, and P. Fua, “Combining edge and texture information for real-time accurate 3D camera tracking,” Proc. The 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR04), pp. 48-57, 2004.
[Non-Patent Reference 5]
E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” Proc. The 10th IEEE International Conference on Computer Vision (ICCV'05), pp. 1508-1515, 2005.
[Non-Patent Reference 6]
M. Pupilli and A. Calway, “Real-time camera tracking using known 3D models and a particle filter,” Proc. The 18th International Conference on Pattern Recognition (ICPR'06), pp. 199-203, 2006.
[Non-Patent Reference 7]
G. Klein and D. Murray, “Full-3D edge tracking with a particle filter,” Proc. British Machine Vision Conference 2006, 2006.
[Non-Patent Reference 8]
I. Skrypnyk and D. G. Lowe, “Scene modelling, recognition and tracking with invariant image features,” Proc. The 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR04), pp. 110-119, 2004.
[Non-Patent Reference 9]
W. Wuest, F. Vial, and D. Stricker, “Adaptive line tracking with multiple hypotheses for augmented reality,” Proc. The Fourth Int'l Symp. on Mixed and Augmented Reality (ISMAR05), pp. 62-69, 2005.
[Non-Patent Reference 10]
K. Satoh, S. Uchiyama, H. Yamamoto, and H. Tamura, “Robust vision-based registration utilizing bird's-eye view with user's view,” Proc. The Second Int'l Symp. on Mixed and Augmented Reality (ISMAR03), pp. 46-55, 2003.
The scheme disclosed in non-patent reference 5 generates a plurality of positions and orientations of the camera using point feature information so as to prevent the registration from collapsing. If there are no or only a few point features such as corner points in the physical space, or if there are many point features having similar appearances, the scheme disclosed in non-patent reference 5 cannot be applied. Also, which of the two positions and orientations of the camera is to be used is determined only in the next frame. Hence, upon rendering a virtual object in the AR technique, the correct position and orientation of the camera cannot always be used.
Since the scheme disclosed in non-patent references 6 and 7 requires likelihood calculations for several hundred particles, the calculation load is heavy. For this reason, this scheme is not suitable for an AR application which requires real-time performance. Furthermore, with the scheme using the particle filter, since the position and orientation of the camera are expressed as a set of particles, the obtained position and orientation of the camera are often inaccurate, and jitter is produced between frames.