In the fields of computer vision and robotics, techniques for estimating the pose of a camera on the basis of images captured by that camera are widely used. Such techniques are applied, for example, to the localization of an autonomous mobile robot, to a navigation system, and to AR (augmented reality) technology.
More specifically, SLAM (Simultaneous Localization and Mapping) and SfM (Structure from Motion) have been studied as techniques for simultaneously estimating the camera pose and the three-dimensional structure of the surrounding objects being photographed.
A monocular camera or a stereo camera may be used for the pose estimation. In particular, if SLAM is performed using a stereo camera, the absolute scale of the three-dimensional structure of the surroundings of the stereo camera can be estimated, because the distance (baseline) between the two cameras of the stereo camera is known.
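The relationship between the known baseline and the absolute scale can be illustrated with a minimal sketch. It assumes a rectified stereo pair with a pinhole model; the function name and the calibration values (focal length in pixels, baseline in metres) are hypothetical and are not taken from any particular system.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth along the optical axis, in metres, for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Z = f * B / d: the metric baseline B is what gives Z a metric unit.
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration: 700 px focal length, 0.12 m baseline.
depth = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=21.0)

# The same disparity with twice the baseline yields twice the depth, which is
# why a monocular camera (no known baseline) cannot fix the absolute scale.
depth_wide = depth_from_disparity(focal_px=700.0, baseline_m=0.24, disparity_px=21.0)
```

In this sketch the depth scales linearly with the baseline, so the reconstructed structure is expressed in the same unit as the baseline.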
In SLAM using a stereo camera, the three-dimensional point corresponding to a feature point is restored based on the stereo image captured at a certain point in time (t), and the pose of the stereo camera at another point in time (t+1) is estimated so as to minimize the re-projection error that arises when the three-dimensional point is projected into the stereo camera at the point in time (t+1).
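The two steps described above, restoring three-dimensional points at time (t) and scoring a candidate pose at time (t+1) by its re-projection error, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the stereo pair is assumed rectified, the calibration constants and pixel values are made up, the pose is assumed to be a rotation R and a translation t that map points from the camera frame at time (t) to the frame at time (t+1), and a coarse search over a few candidate translations stands in for the nonlinear least-squares optimization that a practical system would use.

```python
FOCAL_PX, CX, CY, BASELINE_M = 700.0, 320.0, 240.0, 0.12  # hypothetical calibration


def triangulate(u_left, v, u_right):
    """Restore the 3-D point (left-camera frame) of a matched feature point."""
    z = FOCAL_PX * BASELINE_M / (u_left - u_right)
    return ((u_left - CX) * z / FOCAL_PX, (v - CY) * z / FOCAL_PX, z)


def transform(R, t, p):
    """Move a point from the camera frame at time t to the frame at time t+1."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p):
    """Project a 3-D point into both images: (u_left, v, u_right)."""
    x, y, z = p
    return (FOCAL_PX * x / z + CX, FOCAL_PX * y / z + CY,
            FOCAL_PX * (x - BASELINE_M) / z + CX)


def reprojection_error(R, t, points_3d, observations):
    """Sum of squared pixel differences between predicted and observed features."""
    total = 0.0
    for p, obs in zip(points_3d, observations):
        predicted = project(transform(R, t, p))
        total += sum((a - b) ** 2 for a, b in zip(predicted, obs))
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Feature points matched in the stereo image at time t (made-up pixel values).
features_t = [(355.0, 240.0, 334.0), (300.0, 260.0, 283.2), (400.0, 200.0, 386.0)]
points_3d = [triangulate(*f) for f in features_t]

# Synthetic observations at time t+1: the camera has moved 0.2 m forward,
# so points shift by -0.2 m along z in the new camera frame.
true_t = (0.0, 0.0, -0.2)
features_t1 = [project(transform(IDENTITY, true_t, p)) for p in points_3d]

# Coarse search over candidate z-translations; rotation is held at identity.
candidates = [0.0, -0.1, -0.2, -0.3]
best_tz = min(
    candidates,
    key=lambda tz: reprojection_error(IDENTITY, (0.0, 0.0, tz), points_3d, features_t1),
)
```

In this sketch the candidate that reproduces the observed feature positions has the smallest re-projection error and is selected as the estimated translation; the rotation would be estimated in the same way in a full six-degree-of-freedom optimization.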