There is a technology for calculating a position and attitude of a camera with respect to a captured image, based on the captured image from the camera mounted on a personal computer (PC), a mobile terminal, or the like. Further, there is an augmented reality (AR) technology that supports a user's operation by using the position and attitude of the camera to superimpose and display additional information such as computer graphics (CG) on a captured image displayed on a screen of a PC, a mobile terminal, or the like.
FIG. 11 is a diagram illustrating an example of the AR technology. As illustrated in FIG. 11, for example, when a user uses a camera incorporated in a mobile terminal 10 to capture a marker 11 and an inspecting object 12, object information 13 of the marker 11 is displayed on a display screen 10a of the mobile terminal 10.
As a technology for calculating a position and attitude of a camera, there is a first conventional art that uses feature points contained in a captured image. A feature point is detected based on the fact that the intensity variation is larger near the target point and that the position of the target point on an image is uniquely defined by that intensity variation. In the first conventional art, a three-dimensional coordinate set of feature points that has been generated in advance is used. In the following description, the three-dimensional coordinates of a feature point are referred to as a map point, and a set of map points is referred to as an initial map. For example, in the first conventional art, a position and attitude of a camera is estimated by associating feature points present in a current captured image with map points projected on the captured image.
FIG. 12 is a diagram for illustrating the first conventional art that calculates a position and attitude of a camera. In the example of FIG. 12, there are map points S1 to S6. A particular map point Si is expressed by Formula (1) in the global coordinate system. There are assumed to be feature points x1 to x6 on a captured image 20. A particular feature point xi is expressed by Formula (2) in a camera coordinate system. Map points projected on the captured image 20 are designated as projection points p1′ to p6′. A particular projection point pi′ is expressed by Formula (3) in the camera coordinate system.

Si = (x, y, z)  (1)

xi = (u, v)  (2)

pi′ = (u′, v′)  (3)
For example, in the first conventional art, a position and attitude of a camera is found by calculating the camera position and attitude matrix M that minimizes the sum of squares E given by Formula (4).
E = Σi ‖pi′ − xi‖²  (4)
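The projection of map points and the reprojection error of Formula (4) can be sketched as follows. This is a minimal illustration assuming a pinhole camera model; the intrinsic matrix K and all numerical values are hypothetical and not part of the conventional art.

```python
import numpy as np

def project(K, R, t, S):
    """Project map points S (N x 3, global coordinates) into the image
    as projection points pi' = K [R | t] Si (pinhole model, assumed)."""
    P = R @ S.T + t.reshape(3, 1)     # camera-frame coordinates
    p = K @ P                         # homogeneous image coordinates
    return (p[:2] / p[2]).T           # (N, 2) pixel coordinates

def reprojection_error(K, R, t, S, x):
    """Sum of squared distances between projection points pi' and the
    observed feature points xi, i.e., E of Formula (4)."""
    p = project(K, R, t, S)
    return float(np.sum((p - x) ** 2))

# Hypothetical values: with a consistent pose, the error is zero.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
S = np.array([[0.1, -0.2, 2.0], [0.3, 0.1, 3.0]])
x = project(K, R, t, S)
print(reprojection_error(K, R, t, S, x))  # 0.0
```

In practice, the matrix M, i.e., the pair (R, t), is found by iteratively minimizing this error over the pose parameters.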
Next, the first conventional art of generating an initial map will be described. FIG. 13 is a diagram for illustrating the first conventional art of generating an initial map. For example, in the first conventional art, the principle of stereo capturing is used. The first conventional art successively associates the same feature points with each other between two captured images taken from different capturing positions. Based on the positional relationship of the plurality of associated corresponding points in each captured image, the first conventional art generates an initial map in which the corresponding points form map points.
In the example illustrated in FIG. 13, a recovered map point is represented as Si, and the point at which the line segment between an initial capturing position Ca of a camera and the map point Si intersects with a first captured image 20a is represented as a feature point xai. The point at which the line segment between a second capturing position Cb of the camera and the map point Si intersects with a second captured image 20b is represented as a feature point xbi. The feature points xai and xbi thus form a pair of corresponding points.
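One common way to recover the map point Si from the corresponding feature points xai and xbi is linear (DLT) triangulation; the source does not specify the method, so this sketch is an assumption, and the projection matrices and intrinsics below are hypothetical values.

```python
import numpy as np

def triangulate(P_a, P_b, x_a, x_b):
    """Recover map point Si from corresponding feature points x_a (= xai)
    and x_b (= xbi) observed in two captured images with 3x4 projection
    matrices P_a and P_b, via linear (DLT) triangulation."""
    A = np.vstack([
        x_a[0] * P_a[2] - P_a[0],
        x_a[1] * P_a[2] - P_a[1],
        x_b[0] * P_b[2] - P_b[0],
        x_b[1] * P_b[2] - P_b[1],
    ])
    _, _, Vt = np.linalg.svd(A)       # null vector of A gives Si (homogeneous)
    S = Vt[-1]
    return S[:3] / S[3]               # inhomogeneous map point (x, y, z)

# Hypothetical setup: capture Ca at the origin, capture Cb shifted along x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P_a = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_b = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

S_true = np.array([0.2, -0.1, 3.0])
x_a = P_a @ np.append(S_true, 1.0); x_a = x_a[:2] / x_a[2]
x_b = P_b @ np.append(S_true, 1.0); x_b = x_b[:2] / x_b[2]
print(triangulate(P_a, P_b, x_a, x_b))  # ≈ [0.2, -0.1, 3.0]
```

With noiseless correspondences the map point is recovered exactly; with real image noise, the recovered coordinates degrade, which is why the captured image pair is often retaken, as noted below.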
A camera position and a capturing orientation of the first captured image are generally used as the origin of three-dimensional coordinates for an initial map. FIG. 14 is a diagram illustrating an example of a definition of a capturing orientation of a camera. As illustrated in FIG. 14, for example, the origin of a three-dimensional coordinate system of an initial map is defined based on a position (Tx, Ty, Tz) and a capturing orientation (Rx, Ry, Rz) of a camera C10 as a reference.
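The reference pose of FIG. 14 can be written as a 4x4 transform built from the position (Tx, Ty, Tz) and the capturing orientation (Rx, Ry, Rz). The Euler-angle convention is not specified in the source, so the Z-Y-X order used here is an assumption for illustration.

```python
import numpy as np

def pose_matrix(rx, ry, rz, tx, ty, tz):
    """4x4 camera pose from a capturing orientation (Rx, Ry, Rz), here
    interpreted as Z-Y-X Euler angles in radians (assumed convention),
    and a position (Tx, Ty, Tz)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rz_ = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry_ = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx_ = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    M = np.eye(4)
    M[:3, :3] = Rz_ @ Ry_ @ Rx_
    M[:3, 3] = [tx, ty, tz]
    return M

# Taking the first capture as the reference makes its pose the identity,
# so the initial map's origin coincides with that camera position.
print(pose_matrix(0, 0, 0, 0, 0, 0))  # 4x4 identity
```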
Here, in the first conventional art that uses an initial map to calculate a camera position and attitude, first to fourth optimum conditions described below apply, in principle. The first optimum condition provides that a higher accuracy of the three-dimensional coordinates of map points improves the estimation accuracy of the camera position and attitude. Thus, the two captured images used for stereo capturing are often retaken many times.
The second optimum condition provides that the closer a feature point that matches a map point is to the camera, the more the estimation accuracy of the camera position is improved. This is because an object closer to the camera is captured at a higher relative resolution, which improves the accuracy of the feature point positions in the captured image.
The third optimum condition provides that a larger number of map points improves the estimation accuracy of the camera position and attitude. The fourth optimum condition provides that a wider distribution of map points improves the estimation accuracy of the camera position and attitude.
In principle, a camera position and attitude can be estimated from as few as five or eight map points that match feature points. Nevertheless, the conventional art uses as many map points as possible, prioritizing the third and fourth optimum conditions without taking the second optimum condition into account. For example, an art related to attitude estimation of a camera is disclosed in Japanese Laid-open Patent Publication No. 2012-68772.
On the other hand, 3D distance sensors that can acquire three-dimensional coordinates of an object in real time have become prevalent. In particular, by mounting a 3D distance sensor and a camera together and calibrating their positional relationship, the three-dimensional coordinates of each feature point in a captured image of the camera can be calculated in real time. With the use of the 3D distance sensor, the three-dimensional coordinates of a map point can be determined at the time a feature point is detected in the first captured image when generating an initial map and, furthermore, the position accuracy of such a map point is higher than that of a map point obtained by stereo capturing. For example, the related art is disclosed in Japanese Laid-open Patent Publication No. 2000-293687.
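Given a depth image from the 3D distance sensor already registered to the camera, a feature point's map point follows directly by back-projection. This is a minimal sketch assuming a pinhole camera; the sensor-to-camera calibration is taken as given, and the intrinsics and depth values are hypothetical.

```python
import numpy as np

def backproject(K, depth, u, v):
    """Return the map point (x, y, z) for a feature point at pixel (u, v),
    using the depth d(u, v) measured by the 3D distance sensor:
    S = d * K^-1 [u, v, 1]^T (pinhole model, assumed)."""
    d = depth[v, u]                   # sensor depth at the feature point
    return d * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

# Hypothetical values: a flat scene 2 m from the camera; the principal
# point back-projects onto the optical axis.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
depth = np.full((480, 640), 2.0)
print(backproject(K, depth, 320, 240))  # [0. 0. 2.]
```

Unlike stereo capturing, no second captured image is needed: each detected feature point yields its map point immediately, at the sensor's measurement accuracy.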