3D reconstruction is one of the most sought-after topics in 3D computer vision, with a wide variety of applications in mapping, robotics, virtual reality, augmented reality, architecture, gaming, filmmaking, and so on. A 3D reconstruction system can take images in RGB (red-green-blue), RGBD (red-green-blue-depth), or depth-only format as input and generate a 3D representation, e.g., 3D meshes, of the images. Among the processing procedures of a 3D reconstruction system, one of the critical components is pose estimation: recovering the camera pose associated with each input image. The camera pose may include a focal length, a position, and/or a rotation direction and angle of the camera.
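To make the components of a camera pose concrete, the following is a minimal sketch, not a description of any particular system. It assumes a pinhole camera whose principal point is at the image origin, a focal length expressed in pixels, and a camera-to-world rotation given as an axis and an angle. The class name `CameraPose` and all of its fields and methods are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical container for the pose components named in the text."""
    focal_length: float  # assumed to be in pixels
    position: tuple      # camera center in world coordinates (x, y, z)
    axis: tuple          # rotation direction (axis); need not be unit length
    angle: float         # rotation angle in radians

    def rotation_matrix(self):
        """Camera-to-world rotation built with Rodrigues' formula."""
        x, y, z = self.axis
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            # No axis given: treat as no rotation.
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        x, y, z = x / norm, y / norm, z / norm
        c, s = math.cos(self.angle), math.sin(self.angle)
        t = 1.0 - c
        return [
            [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
            [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
            [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
        ]

    def world_to_camera(self, point):
        """Apply p_cam = R^T (p_world - position)."""
        R = self.rotation_matrix()
        d = [point[i] - self.position[i] for i in range(3)]
        return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))

    def project(self, point):
        """Pinhole projection of a world point to image coordinates."""
        x, y, z = self.world_to_camera(point)
        if z <= 0.0:
            raise ValueError("point is not in front of the camera")
        return (self.focal_length * x / z, self.focal_length * y / z)


# Usage: an unrotated camera at the origin looking along +z.
pose = CameraPose(focal_length=500.0, position=(0.0, 0.0, 0.0),
                  axis=(0.0, 0.0, 1.0), angle=0.0)
u, v = pose.project((1.0, 2.0, 10.0))
```

Under these assumptions, pose estimation amounts to recovering the values stored in such a structure for every input image, so that the same 3D point projects consistently into all images that observe it.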
More recently, with the availability of low-cost RGBD sensors such as Kinect, Google Tango, and Intel RealSense, RGBD images can be readily captured and used for 3D reconstruction.
Reconstructing high-quality 3D meshes, however, places an extremely high requirement on accuracy: the camera poses should be both globally and locally consistent. Present technologies are not able to provide a robust and accurate end-to-end framework for pose estimation of RGBD images for large-scale scenes.