There are many 3D reconstruction methods that use red, green, and blue (RGB) images and conventional structure-from-motion procedures to reconstruct 3D scenes. Generally, RGB image-based 3D reconstruction techniques have difficulty with textureless regions, e.g., walls and ceilings. For reconstructing textureless regions, active sensors can be used. For example, the Kinect™ sensor for the Microsoft Xbox™ uses an infrared (IR) pattern for acquiring 3D data as a depth map (point cloud). The Kinect™ sensor is also equipped with a 2D camera to capture an RGB image. The depth map can be registered to the RGB image to generate a red, green, blue, and depth (RGBD) image.
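The registration step mentioned above can be illustrated with a short sketch: each depth pixel is back-projected to a 3D point using the depth camera's intrinsics, transformed into the RGB camera's frame by an extrinsic rotation and translation, and re-projected with the RGB camera's intrinsics. The function name, the nearest-pixel scatter, and the absence of a z-buffer or hole filling are simplifying assumptions for illustration, not the procedure of any particular sensor SDK.

```python
import numpy as np

def register_depth_to_rgb(depth, K_d, K_rgb, R, t):
    """Reproject a depth map from the depth camera's frame into the RGB
    camera's image plane, yielding the "D" channel of an RGBD image.

    depth  : (H, W) array of depth values (0 marks invalid pixels).
    K_d    : 3x3 intrinsic matrix of the depth camera.
    K_rgb  : 3x3 intrinsic matrix of the RGB camera.
    R, t   : rotation (3x3) and translation (3,) from depth to RGB frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0
    # Back-project valid depth pixels to 3D points in the depth frame.
    pix = np.stack([u.ravel()[valid], v.ravel()[valid],
                    np.ones(int(valid.sum()))])
    pts_d = (np.linalg.inv(K_d) @ pix) * z[valid]
    # Transform into the RGB camera frame.
    pts_rgb = R @ pts_d + t[:, None]
    in_front = pts_rgb[2] > 0
    # Project with the RGB intrinsics and round to the nearest pixel.
    proj = K_rgb @ pts_rgb[:, in_front]
    u2 = np.round(proj[0] / proj[2]).astype(int)
    v2 = np.round(proj[1] / proj[2]).astype(int)
    # Scatter depths into the registered map (no z-buffering or hole
    # filling, which a production pipeline would add).
    registered = np.zeros_like(depth, dtype=float)
    ok = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    registered[v2[ok], u2[ok]] = pts_rgb[2, in_front][ok]
    return registered
```

With identical intrinsics and an identity extrinsic transform, the registered map reproduces the input depth map, which is a convenient sanity check.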
In general, 3D sensors like Kinect™ can be used to reconstruct large scenes, see e.g., Izadi et al., “KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera,” ACM Symposium on User Interface Software and Technology, October 2011.
However, the limited field of view (FOV) of such sensors poses challenges, especially for scenes with corners and turns, where there may be too few feature correspondences to obtain a good registration.
Some methods use previously reconstructed RGB image-based models to estimate the pose of RGB cameras, as in Snavely et al., “Photo Tourism: Exploring Photo Collections in 3D,” ACM Transactions on Graphics, 2006.
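As a minimal illustration of pose estimation against an existing model, the camera pose can be recovered from correspondences between 3D model points and their 2D image projections, i.e., the Perspective-n-Point (PnP) problem. The sketch below uses a basic direct linear transform (DLT); the function name is illustrative, and robust pipelines such as the cited work would add feature matching, RANSAC, and nonlinear refinement.

```python
import numpy as np

def pnp_dlt(pts3d, pts2d, K):
    """Estimate camera pose (R, t) from n >= 6 correspondences between
    3D model points (n, 3) and 2D image points (n, 2), given the
    intrinsic matrix K, via a direct linear transform (DLT)."""
    n = len(pts3d)
    # Normalize image points to factor out the intrinsics.
    xy = (np.linalg.inv(K) @ np.column_stack([pts2d, np.ones(n)]).T).T[:, :2]
    # Each correspondence contributes two linear constraints on the
    # 12 entries of the 3x4 projection matrix P.
    A = []
    for (X, Y, Z), (x, y) in zip(pts3d, xy):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    # P is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    P = Vt[-1].reshape(3, 4)
    # Resolve the overall sign ambiguity: points must lie in front
    # of the camera (positive projective depth).
    if (P @ np.append(pts3d[0], 1.0))[2] < 0:
        P = -P
    # Factor P = [sR | st]: project the left 3x3 block onto a rotation
    # and recover the scale s from its singular values.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    if np.linalg.det(R) < 0:  # guard against a reflection
        R = -R
    t = P[:, 3] / S.mean()
    return R, t
```

With noise-free synthetic correspondences from a non-coplanar point set, this recovers the ground-truth rotation and translation to numerical precision.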