1. Field of the Invention
The present invention relates to a technique for calculating position and orientation information of a viewpoint of an image using the image.
2. Description of the Related Art
In recent years, information presentation techniques called Mixed Reality techniques (to be referred to as MR techniques hereinafter) have been extensively studied. Among the MR techniques, an Augmented Reality technique (to be referred to as an AR technique hereinafter), which superimposes and displays information of a virtual space on a physical space, has been studied especially extensively. As a typical information presentation apparatus based on the AR technique, a video see-through Head Mounted Display (to be abbreviated as HMD hereinafter) is known. The video see-through HMD incorporates a camera for sensing an image of the physical space. A virtual object, which is generated by computer graphics (to be abbreviated as CG hereinafter) in accordance with the position and orientation of the camera, is superimposed and rendered on the image of the physical space sensed by this camera, and the result is displayed on a display device such as a liquid crystal panel of the HMD. With the AR technique, the user can feel as if the virtual object actually existed on the physical space.
A registration problem is known as one serious problem to be solved in implementing the AR technique. In order to make the user feel as if a virtual object actually existed on the physical space, geometrical consistency has to be ensured between the virtual object and the physical space. That is, the virtual object must always appear to the user at the position where it should exist on the physical space.
In AR using the video see-through HMD, every time an image is input from the camera incorporated in the HMD, the position and orientation of the camera on the physical space at the time of sensing that image are measured. Then, processing is generally executed that renders a CG based on this position and orientation of the camera and on specific parameters of the camera such as a focal length, and superimposes that CG on the image of the physical space. For this reason, in the case of AR using the video see-through HMD, the registration problem reduces to the problem of measuring the position and orientation, on the physical space, of the camera incorporated in the HMD. The position and orientation of the camera can be measured by a six-degrees-of-freedom physical sensor such as a magnetic sensor, ultrasonic sensor, or optical sensor.
When the video see-through HMD is used, image information from the camera incorporated in the video see-through HMD can be used for registration. The registration method using image information is popularly used since it requires simpler processes and lower cost than the method using the physical sensor. In the registration method using image information, it is a common practice to sense, using the camera, an image of an index which has a given three-dimensional (3D) position on the physical space, and to calculate the position and orientation of the camera based on correspondence between the position of the index on the sensed image and its 3D position. As the index, markers which are artificially laid out on the physical space or natural features such as corner points and edges which originally exist on the physical space may be used. In practice, artificial markers which are easily detected and identified from an image are prevalently used since they require a light calculation load and are suited to real-time processing.
Non-patent reference 1 discloses a registration method using a marker having a square shape (square marker) including a unique two-dimensional (2D) pattern as an index. With this method, a region of the square marker is extracted from an image, and the square marker is identified using the inner 2D pattern. Furthermore, the position and orientation of the camera are calculated based on correspondence between the positions of vertices of the square marker on an image sensed by the camera, and those of the vertices of the square marker on a marker coordinate system.
An artificial marker such as the square marker is widely used because it is easy to use. However, such an artificial marker cannot be used when it is physically impossible to lay out the marker, or when laying out the marker is undesirable because it would spoil the appearance of the scene.
Along with the recent improvement of computer performance, registration methods using natural features have been extensively studied. As natural features used in registration, a point feature such as a corner point and a line feature such as an edge are mainly used.
Non-patent reference 2 discloses a method which sequentially estimates the position and orientation of the camera by detecting point features from an image by a Harris detector, and tracking, between images, the point features which exist on a single plane on the physical space. Non-patent reference 3 uses point features called SIFT (Scale Invariant Feature Transform) features, which are rotation invariant and scale invariant, and have high identifiability. That is, a database of the 3D positions of SIFT features on the physical space is generated in advance, and point features are identified by matching SIFT features detected from an image sensed by the camera against those in the database, thus calculating the position and orientation of the camera.
In general, a point feature has an advantage of high identifiability since it is expressed by information of a feature position and its surrounding pixels. However, the point feature has low detection precision and poor registration precision, since its appearance changes largely depending on the observation direction. Since the calculation load of image processing for feature detection (for example, template matching or detection of SIFT features) is heavy, some ingenuity is required to attain real-time processing.
Many studies have been made on registration using edges (to be referred to as edge-based registration hereinafter) as a registration method that uses natural features and allows real-time processing (for example, see non-patent references 4, 5, and 6). Unlike a point feature, an edge is invariant to scale and observation direction, so edge-based registration offers high registration precision. The edge-based registration method described in each of non-patent references 4, 5, and 6 is premised on having 3D model data of the physical space and physical objects. Each set of 3D model data is described by a set of line segments. Registration using edges is generally implemented by (process 1) to (process 3) as follows.
(Process 1) The aforementioned 3D model data (line segment models) are projected onto an image based on the predicted values of the position and orientation of the camera (for example, the position and orientation of the camera in a previous frame) and calibrated specific parameters of the camera.
(Process 2) Each of the projected line segments is divided at given intervals on the image. An edge search is conducted for respective divided points in the normal direction to each line segment. Normally, a point with a maximal luminance value gradient on a search line is detected as an edge.
(Process 3) The position and orientation of the camera are iteratively corrected by a nonlinear optimization calculation to minimize the sum total of distances on the image between the edges detected for respective divided points and the projected line segments.
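As an illustration of (process 1) and (process 2), the following is a minimal sketch, not the implementation of any of the cited references. It assumes a simple pinhole camera with no lens distortion, a grayscale image stored as a list of rows, and a central-difference gradient; all function and parameter names (to_camera, project, divide_segment, search_edge, edge_offsets, half_range) are hypothetical. The signed offsets it returns are the per-point distances that the optimization of (process 3) would then minimize.

```python
import math

def to_camera(R, t, X):
    """Transform world point X into camera coordinates (rotation R, translation t)."""
    return tuple(sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))

def project(Xc, focal, cx, cy):
    """Pinhole projection (assumed model): camera coordinates -> (column, row)."""
    x, y, z = Xc
    return (focal * x / z + cx, focal * y / z + cy)

def divide_segment(p0, p1, n):
    """Return n evenly spaced divided points strictly inside the segment p0-p1."""
    return [(p0[0] + (p1[0] - p0[0]) * k / (n + 1),
             p0[1] + (p1[1] - p0[1]) * k / (n + 1)) for k in range(1, n + 1)]

def unit_normal(p0, p1):
    """Unit vector perpendicular to the projected segment."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    return (-dy / length, dx / length)

def search_edge(image, point, normal, half_range):
    """Search along the normal; return the signed offset (in pixels) at which
    the luminance gradient magnitude is maximal."""
    def lum(step):
        col = int(round(point[0] + step * normal[0]))
        row = int(round(point[1] + step * normal[1]))
        return image[row][col]
    best_step, best_grad = 0, -1.0
    for step in range(-half_range, half_range + 1):
        grad = abs(lum(step + 1) - lum(step - 1)) / 2.0  # central difference
        if grad > best_grad:
            best_step, best_grad = step, grad
    return best_step

def edge_offsets(segment_world, R, t, focal, cx, cy, image, n=3, half_range=5):
    """Process 1: project the model segment with the predicted pose (R, t).
    Process 2: divide it and search for an edge at each divided point."""
    p0, p1 = (project(to_camera(R, t, X), focal, cx, cy) for X in segment_world)
    normal = unit_normal(p0, p1)
    return [search_edge(image, p, normal, half_range)
            for p in divide_segment(p0, p1, n)]
```

The sketch performs no bounds checking, so the search range must stay inside the image. The sign of each offset depends on the direction in which the segment endpoints are listed, since that fixes the orientation of the normal.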
Unlike a point feature, an edge has low identifiability on an image. Since the edge search uses only the information that the luminance value gradient is maximal on a search line, a wrong edge is often detected. Non-patent references 4 and 5 use a method called M-estimation to prevent erroneously detected edges from adversely influencing the optimization calculations: small weights are set in the optimization calculations for edge data which are estimated to be erroneously detected edges.
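The weighting idea behind M-estimation can be sketched as follows. The text above does not state which weight function the cited references use, so a Huber weight is assumed here purely for illustration. The problem is also reduced to estimating a single scalar shift from per-point edge distances, rather than a full six-degrees-of-freedom pose. The names huber_weight, robust_shift, and the threshold k are hypothetical.

```python
def huber_weight(residual, k):
    """Huber weight (assumed choice): 1 for small residuals, k/|r| for large ones."""
    a = abs(residual)
    return 1.0 if a <= k else k / a

def robust_shift(distances, k=1.0, iterations=10):
    """Iteratively reweighted estimate of one scalar shift from edge distances.

    Each pass recomputes the weights from the current residuals, so data that
    disagree with the current estimate contribute less to the next one.
    """
    shift = sum(distances) / len(distances)  # start from the plain mean
    for _ in range(iterations):
        weights = [huber_weight(d - shift, k) for d in distances]
        shift = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
    return shift

# Five consistent edge distances and one erroneously detected edge (9.0).
distances = [1.0, 1.2, 0.8, 1.1, 0.9, 9.0]
plain_mean = sum(distances) / len(distances)
robust = robust_shift(distances)
```

In this example the plain mean is pulled toward the erroneous value, while the reweighted estimate stays close to the cluster of consistent distances.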
In non-patent reference 5, upon conducting the edge search in the normal direction of each projected line segment, pixel values are convolved with a kernel that responds strongly to an edge having the same direction as the projected line segment, thereby preventing detection of an edge having a different direction. Furthermore, the results of the convolution calculations are used as weights in the optimization calculations, and a large weight is assigned to data of an edge having a direction similar to that of the projected line segment, thus reducing the influence of erroneously detected edges.
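The effect of a direction-selective response can be sketched as below. The form of the kernel used in non-patent reference 5 is not given in the text above, so this example substitutes a simpler stand-in: the image gradient, computed by central differences, is projected onto the normal of the projected line segment. The names gradient and oriented_response are hypothetical.

```python
def gradient(image, row, col):
    """Central-difference luminance gradient (gx along columns, gy along rows)."""
    gx = (image[row][col + 1] - image[row][col - 1]) / 2.0
    gy = (image[row + 1][col] - image[row - 1][col]) / 2.0
    return gx, gy

def oriented_response(image, row, col, normal):
    """Magnitude of the gradient component along the segment normal.

    An edge parallel to the projected segment has its gradient along the
    normal and gives a large response; an edge perpendicular to the segment
    gives a response near zero. The value can double as a weight.
    """
    gx, gy = gradient(image, row, col)
    return abs(gx * normal[0] + gy * normal[1])

# A vertical step edge: luminance changes with the column index only.
image = [[0 if c < 5 else 100 for c in range(10)] for _ in range(10)]
same_direction = oriented_response(image, 5, 5, (1.0, 0.0))   # vertical segment
other_direction = oriented_response(image, 5, 5, (0.0, 1.0))  # horizontal segment
```

Here the response is large when the projected segment is vertical, like the image edge, and zero when it is horizontal, so an edge of the wrong direction would neither be selected nor receive a significant weight.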
Furthermore, in non-patent reference 6, upon conducting the edge search in the normal direction of each projected line segment, a plurality of edges are detected instead of one edge, and are held as candidates. In the optimization calculations, the candidate edge closest to the projected line segment is used in each iteration, thus reducing the influence of detection errors.
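Holding several candidates per search line can be sketched as follows, under simplifying assumptions: the search line is given as a one-dimensional luminance profile, candidates are local maxima of the gradient magnitude above a threshold, and positions are profile indices. The names candidate_edges, closest_candidate, and threshold are hypothetical and are not taken from non-patent reference 6.

```python
def candidate_edges(profile, threshold):
    """Return the profile indices of all local gradient maxima above threshold."""
    grads = [abs(profile[i + 1] - profile[i - 1]) / 2.0
             for i in range(1, len(profile) - 1)]  # grads[j] belongs to index j + 1
    candidates = []
    for j, g in enumerate(grads):
        left = grads[j - 1] if j > 0 else 0.0
        right = grads[j + 1] if j + 1 < len(grads) else 0.0
        if g >= threshold and g > left and g >= right:
            candidates.append(j + 1)
    return candidates

def closest_candidate(candidates, projected_index):
    """Pick the candidate nearest to where the projected segment currently lies."""
    return min(candidates, key=lambda i: abs(i - projected_index))

# Two edges on one search line: a weaker one near index 3, a stronger one near 9.
profile = [0, 0, 0, 30, 60, 60, 60, 60, 60, 110, 160, 160, 160]
candidates = candidate_edges(profile, threshold=10.0)
chosen_early = closest_candidate(candidates, projected_index=4)
chosen_later = closest_candidate(candidates, projected_index=7)
```

A search that keeps only the maximal gradient would always return the stronger edge. With candidates retained, the selection follows the projected segment, so it can change from one iteration to the next as the pose estimate is corrected.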
[Non-Patent Reference 1]
Kato, M. Billinghurst, Asano, and Tachibana, “An Augmented Reality System and its Calibration based on Marker Tracking”, The Transactions of VRSJ, vol. 4, no. 4, pp. 607-617, 1999.
[Non-Patent Reference 2]
G. Simon, A. W. Fitzgibbon, and A. Zisserman, “Markerless tracking using planar structures in the scene,” Proc. The 1st IEEE/ACM International Symposium on Augmented Reality (ISAR2000), pp. 120-128, 2000.
[Non-Patent Reference 3]
I. Skrypnyk and D. G. Lowe, “Scene modeling, recognition and tracking with invariant image features,” Proc. The 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR04), pp. 110-119, 2004.
[Non-Patent Reference 4]
T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002.
[Non-Patent Reference 5]
A. I. Comport, E. Marchand, and F. Chaumette, “A real-time tracker for markerless augmented reality,” Proc. The 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR03), pp. 36-45, 2003.
[Non-Patent Reference 6]
L. Vacchetti, V. Lepetit, and P. Fua, “Combining edge and texture information for real-time accurate 3D camera tracking,” Proc. The 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR04), pp. 48-57, 2004.
[Non-Patent Reference 7]
H. Wuest, F. Vial, and D. Stricker, “Adaptive line tracking with multiple hypotheses for augmented reality,” Proc. The Fourth Int'l Symp. on Mixed and Augmented Reality (ISMAR05), pp. 62-69, 2005.
[Non-Patent Reference 8]
K. Satoh, S. Uchiyama, H. Yamamoto, and H. Tamura, “Robust vision-based registration utilizing bird's-eye view with user's view,” Proc. The Second Int'l Symp. on Mixed and Augmented Reality (ISMAR03), pp. 46-55, 2003.
[Non-Patent Reference 9]
M. Isard and A. Blake, “CONDENSATION—conditional density propagation for visual tracking,” International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[Non-Patent Reference 10]
M. Pupilli and A. Calway, “Real-time camera tracking using a particle filter,” Proc. The 16th British Machine Vision Conference 2005 (BMVC2005), pp. 519-528, 2005.
In the conventional edge-based registration method, detection errors of edges are dealt with using the M-estimation: in the nonlinear optimization, weights are given to the data for the respective line segment divided points in accordance with the distances between the edge detection positions and the projected line segments, thereby suppressing the influence of detection errors.
However, the aforementioned method poses a problem when, for example, an object having a repetitive pattern (a shelf, window, or the like) exists on the physical space, a 3D model of this object is used in edge-based registration, and the repetitive pattern part occupies a large portion of the field of view. In such a case, since the line segments that match the repetitive pattern are in the majority, edges corresponding to the repetitive pattern are not determined as detection errors even when they are erroneously detected. For this reason, even when the aforementioned M-estimation is used, the obtained position and orientation of the camera are local minimal solutions but not optimal solutions, thus lowering registration precision.
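This failure can be reproduced in a one-dimensional toy model, sketched below under strong simplifying assumptions. The camera pose is reduced to a single scalar offset, edges and model lines are positions on a line, each model line is matched to its nearest image edge, and a Huber weight stands in for the M-estimation. All names and numbers (track_offset, the period of 10, the isolated edge at 200) are hypothetical. The true offset is 0.

```python
def huber_weight(residual, k):
    """Huber weight (assumed choice): 1 for small residuals, k/|r| for large ones."""
    a = abs(residual)
    return 1.0 if a <= k else k / a

def track_offset(model_lines, image_edges, predicted_offset, k=2.0, iterations=20):
    """Iteratively correct a scalar offset by weighted nearest-edge matching.

    Returns the final offset and the weights computed in the last iteration.
    """
    offset = predicted_offset
    weights = []
    for _ in range(iterations):
        residuals = []
        for m in model_lines:
            projected = m + offset
            nearest = min(image_edges, key=lambda e: abs(e - projected))
            residuals.append(nearest - projected)
        weights = [huber_weight(r, k) for r in residuals]
        offset += sum(w * r for w, r in zip(weights, residuals)) / sum(weights)
    return offset, weights

# Repetitive pattern with period 10 (e.g. shelf boards) plus one isolated edge.
image_edges = [10.0 * i for i in range(10)] + [200.0]
# Four model lines lie on the repetitive pattern, one on the isolated edge.
model_lines = [20.0, 30.0, 40.0, 50.0, 200.0]

good_offset, good_weights = track_offset(model_lines, image_edges, 2.0)
bad_offset, bad_weights = track_offset(model_lines, image_edges, 8.0)
```

Starting from a predicted offset of 2, the estimate returns to the true offset of 0. Starting from 8, the four repetitive lines lock onto the neighboring period, and the estimate settles near 9.5. The weights show why the M-estimation does not help: the wrongly matched repetitive lines are in the majority and keep full weight, while the one correctly matched isolated line is the only datum treated as a detection error.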