There are methods of obtaining a position and pose of a camera mounted on a personal computer (PC), a mobile terminal, and other devices. Furthermore, techniques are known that use the obtained position and pose of the camera to cause additional information to be superimposed on a captured image displayed on a screen of the PC or the mobile terminal in order to realize user work support.
As a conventional technique for estimating the position and pose of a camera from a captured image from moment to moment, a method that uses feature points included in the image is available, for example. With the conventional technique using feature points, a three-dimensional coordinate map of an object is created in advance, and the feature points present in the current image are associated with a group of points in the map every frame, whereby the position and pose of the camera are estimated.
FIG. 13 is a diagram explaining a conventional technique for obtaining a position and pose of a camera. In the example illustrated in FIG. 13, map points S1 to S6 are present. A map point Si is represented by expression (1). On an image 20, feature points x1 to x6 are present. A feature point xi is represented by expression (2) in a camera coordinate system. The map points projected onto the captured image 20 are projected points x1′ to x6′. A projected point xi′ is represented by expression (3) in the camera coordinate system.
$$S_i = (x, y, z) \qquad (1)$$
$$x_i = (u, v) \qquad (2)$$
$$x_i' = (u', v') \qquad (3)$$
For example, with a conventional technique, a camera position and pose matrix M is calculated such that the sum of squares E calculated by expression (4) is minimized, whereby the position and pose of the camera are obtained.
$$E = \sum_{p} \left\lVert x_p' - x_p \right\rVert^{2} \qquad (4)$$
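As an illustration only, the sum of squares E of expression (4) can be evaluated for one candidate pose as sketched below. The sketch assumes a simple pinhole projection, and it assumes that the camera position and pose matrix M is split into a rotation R and a translation t. The function names and the focal length parameter f are hypothetical and do not appear in the conventional technique described here. A pose estimator would search for the R and t that make the returned value smallest.

```python
def project(point, R, t, f=1.0):
    """Project a 3D map point Si = (x, y, z) to a projected point xi' = (u', v').

    Assumes a pinhole camera with focal length f and the principal point
    at the image origin (an assumption made for this sketch).
    """
    x, y, z = point
    # Transform the map point into the camera frame: Xc = R * X + t
    xc = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    yc = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    zc = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    if zc <= 0:
        raise ValueError("map point is not in front of the camera")
    return (f * xc / zc, f * yc / zc)


def reprojection_error(map_points, feature_points, R, t, f=1.0):
    """Return E, the sum over p of |xp' - xp|^2, for one candidate pose."""
    total = 0.0
    for map_point, (u, v) in zip(map_points, feature_points):
        u_proj, v_proj = project(map_point, R, t, f)
        total += (u_proj - u) ** 2 + (v_proj - v) ** 2
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

When the projected points coincide with the detected feature points, E is zero. Any mismatch between a projected point and its feature point increases E.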
At this time, when the user is performing an operation, the position and pose of the camera are frequently changed, and estimation of the position and pose of the camera is temporarily lost in some cases. For example, when the user turns the mobile terminal downward, the camera mounted on the mobile terminal also faces down. With this, feature points included in the object are not detected in the captured image of the camera, whereby detection of the position and pose of the camera is temporarily disabled.
When the camera is directed to the object again from the state in which the position and pose of the camera are not detected, processing for restarting the camera position and pose estimation processing is performed. This processing is referred to as relocalization processing. For the relocalization processing, a plurality of techniques are available. For example, the relocalization processing includes a technique using an image-to-image method and a technique using an image-to-map method. Furthermore, a technique that determines a pose change of an imaging device is also available (see Japanese Laid-open Patent Publication No. 2011-130180).
The image-to-image method will be described. The image-to-image method uses a keyframe in the relocalization processing. A keyframe is a piece of information in which a camera position and pose value is associated with the image captured by the camera at that time. By using a three-dimensional map acquired in advance, the user accumulates keyframes during the camera position and pose estimation. When the position and pose of the camera are lost, the image-to-image method searches for the keyframe most similar to the current captured image of the camera, estimates a relative position and pose between the retrieved keyframe and the current camera, and thereby obtains the current position and pose of the camera. With the image-to-image method, the relocalization processing is performed using two images, as described above.
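The keyframe search of the image-to-image method can be sketched as follows. The sketch assumes that each keyframe stores a small, flattened grayscale thumbnail of its captured image and that similarity is measured by a sum of squared differences. Both choices, as well as the names Keyframe, ssd, and find_most_similar_keyframe, are hypothetical and are not specified by the method described above. Estimation of the relative position and pose between the retrieved keyframe and the current camera is outside the sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Keyframe:
    """A camera position and pose value paired with the image captured at that time."""
    pose: Sequence[float]       # assumed layout, e.g. (x, y, z, roll, pitch, yaw)
    thumbnail: Sequence[float]  # assumed: flattened low-resolution grayscale image


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equally sized thumbnails."""
    if len(a) != len(b):
        raise ValueError("thumbnails must have the same size")
    return sum((p - q) ** 2 for p, q in zip(a, b))


def find_most_similar_keyframe(keyframes: List[Keyframe],
                               current: Sequence[float]) -> Keyframe:
    """Return the accumulated keyframe whose thumbnail is closest to the current image."""
    if not keyframes:
        raise ValueError("no keyframes have been accumulated")
    return min(keyframes, key=lambda kf: ssd(kf.thumbnail, current))
```

The pose stored in the returned keyframe would serve as the starting point for estimating the current position and pose of the camera.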
An advantage of the image-to-image method will be described. The image-to-image method enables high-speed relocalization processing.
The image-to-map method will be described. With the image-to-map method, a local feature descriptor for each feature point is used for the relocalization processing. Feature points within the current captured image of the camera are associated with map points through matching of the local feature descriptors. If three or more corresponding pairs of feature points and map points are found, the current position and pose of the camera may be estimated by a perspective-n-point (PnP) algorithm. The image-to-map method performs the relocalization processing by associating feature points within the captured image with map points as described above.
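The association step of the image-to-map method can be sketched as follows. The sketch assumes that descriptors are plain numeric vectors compared by squared Euclidean distance and that ambiguous matches are rejected with a nearest-neighbor ratio test. These choices, the ratio value 0.8, and all function names are assumptions made for illustration. The PnP computation itself is left out. Only the check for three or more corresponding pairs is shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def match_features_to_map(feature_descriptors, map_descriptors, ratio=0.8):
    """Associate feature points with map points by descriptor matching.

    Returns a list of (feature_index, map_index) pairs. A match is kept only
    if the best distance is clearly smaller than the second-best distance
    (ratio test, an assumption of this sketch).
    """
    pairs = []
    for fi, fd in enumerate(feature_descriptors):
        ranked = sorted(
            (squared_distance(fd, md), mi) for mi, md in enumerate(map_descriptors)
        )
        if not ranked:
            continue
        best_dist, best_index = ranked[0]
        # Distances are squared, so the ratio is squared as well.
        if len(ranked) > 1 and best_dist >= (ratio ** 2) * ranked[1][0]:
            continue  # ambiguous match, discard
        pairs.append((fi, best_index))
    return pairs


def enough_pairs_for_pnp(pairs, minimum=3):
    """Check that three or more corresponding pairs are available."""
    return len(pairs) >= minimum
```

If enough_pairs_for_pnp returns True, the image coordinates of the matched feature points and the three-dimensional coordinates of the matched map points would be passed to a PnP solver to estimate the current position and pose of the camera.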
An advantage of the image-to-map method will be described. The image-to-map method enables estimation of the position and pose with fewer keyframes compared with the image-to-image method.