3D Reconstruction
Interactive, real-time 3D reconstruction of a scene is used in a number of applications, e.g., robotics, augmented reality, medical imaging, and computer vision. Real-time sparse and dense 3D reconstruction can use passive sensors such as cameras. However, passive sensors have difficulty reconstructing textureless regions.
For reconstructing textureless regions, active 3D sensors can be used. For example, the Kinect sensor for the Microsoft Xbox uses an IR pattern for acquiring 3D data as a depth map from a viewpoint of the sensor in real time.
Other issues relate to processing time, memory requirements, and accuracy. Because of their limited field of view and resolution, 3D sensors usually produce only a partial reconstruction of a scene. It is desired to provide an accurate and fast registration method that can combine successive partial depth maps with a model of the scene. Inertial sensors are prone to drift, so accurate registration needs to rely on the features in an RGB (texture) image or a depth map. In addition, depth maps are usually noisy and carry no higher-level spatial constraints. Furthermore, a point cloud requires a very large amount of memory and is difficult to compress.
3D-to-3D Registration
Local
Alignment, or registration, of 3D data is a fundamental problem in computer vision applications, and it can be solved using several methods. The registration methods can be local or global. Local methods need to start from a good initialization, and register two 3D point clouds using relatively small iterative moves. This is similar to a non-linear minimization method, which converges to a global solution when it is given a good initial solution. The most common local method is the iterative closest point (ICP) method, which alternates between determining corresponding 3D points and computing the move that aligns them using a closed-form solution.
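The alternation described above can be sketched as follows. This is a minimal illustration and not the method of any particular system: it assumes the well-known SVD-based closed-form solution for the rigid move, and a brute-force nearest-neighbor search for the correspondences. The function names best_rigid_transform and icp and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rigid move (R, t) such that R @ src[i] + t ~ dst[i].

    src, dst: (N, 3) arrays of corresponding points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, iterations=20):
    """Hypothetical basic ICP loop: match closest points, then solve in closed form."""
    R_total = np.eye(3)
    t_total = np.zeros(3)
    cur = src.copy()
    for _ in range(iterations):
        # Step 1: correspondences by brute-force closest point (O(N*M) memory).
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        # Step 2: closed-form move for the current correspondences.
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Accumulate the small moves into one overall transformation.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total
```

Because the correspondences are chosen by proximity, the loop is only expected to succeed when the initial misalignment is small relative to the point spacing, which is the dependence on initialization noted above.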
Global
Global methods typically consider the entire 3D point cloud, identify some key geometric features (primitives), match the features across point clouds, and generate an optimal hypothesis from a minimal set of correspondences using a RANdom SAmple Consensus (RANSAC) procedure. The coarse registration obtained by global methods is usually followed by local non-linear refinement. Global methods, unlike local methods, do not require initialization. However, global methods can suffer from incorrect and insufficient correspondences. The geometric primitives typically used in global methods are points, lines, or planes.
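As an illustration of hypothesis generation from a minimal set, the sketch below runs a RANSAC procedure over putative point-to-point matches, where three correspondences are the minimal set for a rigid transformation. It assumes the matches are already given as row-aligned arrays. The names rigid_from_points and ransac_registration, the trial count, and the inlier tolerance are hypothetical choices for the example.

```python
import numpy as np

def rigid_from_points(src, dst):
    """Closed-form rigid transform (R, t) from corresponding points via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - src_c).T @ (dst - dst_c))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def ransac_registration(src, dst, trials=200, inlier_tol=0.05, seed=0):
    """src[i] is putatively matched to dst[i]; some matches may be wrong."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(trials):
        # Minimal set: three point-to-point correspondences.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_from_points(src[idx], dst[idx])
        # Score the hypothesis by how many matches it explains.
        residual = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residual < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on every inlier of the best hypothesis.
    R, t = rigid_from_points(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

No initial pose is needed, but the result depends on the putative matches containing enough correct correspondences for at least one sampled triple to be outlier-free.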
Several global registration methods using homogeneous and heterogeneous correspondences are known. For example, it is possible to determine a closed-form solution for the registration given point-to-point, line-to-line, plane-to-plane, point-to-line, point-to-plane, or line-to-plane correspondences. One method obtains a globally optimal solution from point-to-point, point-to-line, and point-to-plane correspondences using branch-and-bound. Another method uses branch-and-bound to obtain the optimal correspondences as well as the transformation for the point-to-point registration problem.
SLAM Using 3D Sensors
In mobile robotics, some 3D-sensor-based methods use a simultaneous localization and mapping (SLAM) system for determining a motion of the sensor as well as reconstructing a scene structure. Those methods typically use geometric features such as point, line, or plane primitives. 3D sensors that provide a planar slice of 3D data, such as 2D laser scanners or ultrasonic sensors, can be used for determining planar, three degrees-of-freedom (DOF) motion. 3D sensors that provide full 3D point clouds, such as structured light scanners, 2D laser scanners attached to moving stages, and the Kinect sensor, can be used for determining six DOF motion.
RGB-D mapping extracts keypoints from RGB images, back-projects the points into 3D using depth maps, and uses three point-to-point correspondences to determine an initial estimate of the pose using the RANSAC procedure, which is further refined using the ICP method.
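The back-projection step can be sketched as below, assuming a standard pinhole camera model with a depth map registered to the RGB image. The intrinsic parameters fx, fy, cx, and cy, the integer pixel keypoints, and the convention that a depth of zero marks a missing measurement are assumptions of the example and are not taken from the method described above.

```python
import numpy as np

def back_project(keypoints_uv, depth_map, fx, fy, cx, cy):
    """Lift 2D keypoints to 3D points in the sensor coordinate system.

    keypoints_uv: (N, 2) integer array of (u, v) = (column, row) pixel coordinates.
    depth_map:    (H, W) array of depths along the optical axis; 0 means missing.
    Returns the 3D points with valid depth and a boolean mask over the input keypoints.
    """
    u = keypoints_uv[:, 0]
    v = keypoints_uv[:, 1]
    z = depth_map[v, u]              # rows are indexed by v, columns by u
    valid = z > 0                    # drop keypoints with no depth measurement
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)[valid], valid
```

Keypoints that land on missing depth values are discarded, so the number of usable 3D correspondences can be smaller than the number of 2D matches.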
Another method uses three plane-to-plane correspondences in a SLAM system with 3D sensors. That method addresses the correspondence problem using geometric constraints between planes.
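To make the role of three plane-to-plane correspondences concrete, the following sketch recovers a rigid transformation from matched planes. It is a generic closed-form construction and not necessarily the one used by the method above. It assumes each plane is given by a unit normal n and an offset d with n . x = d, and that the planes are already matched. Under x' = R x + t, a plane maps to n' = R n and d' = d + n' . t, so the rotation follows from the normals and the translation from the offsets. The function name register_planes is hypothetical.

```python
import numpy as np

def register_planes(normals_src, d_src, normals_dst, d_dst):
    """Rigid transform (R, t) from matched planes n . x = d.

    normals_src, normals_dst: (N, 3) unit normals, row i matched to row i.
    d_src, d_dst:             (N,) plane offsets.
    """
    # Rotation: least-squares alignment of the source normals to the destination normals.
    H = normals_src.T @ normals_dst
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    # Translation: each plane contributes one linear equation n' . t = d' - d.
    t, *_ = np.linalg.lstsq(normals_dst, d_dst - d_src, rcond=None)
    return R, t
```

Each plane constrains the translation only along its own normal, so the translation is fully determined only when the normals span all three directions.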
Another method uses a combination of a smaller field-of-view (FOV) 3D sensor and a larger FOV 2D laser scanner for the SLAM system, using both planes and line segments as primitives. That method is designed for a sequential SLAM system that solves a local registration problem, and cannot solve global registration.
KinectFusion registers a current depth map with a virtual depth map generated from a global truncated signed distance function (TSDF) representation by using a coarse-to-fine ICP method. The TSDF representation integrates all previous depth maps registered into a global coordinate system, and enables higher-quality depth map generation than using a single image.
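The way a TSDF integrates successive depth measurements can be illustrated with a simplified one-dimensional sketch along a single camera ray. It assumes the commonly described weighted running-average update with a unit weight per measurement. The function name integrate, the truncation distance, and the 1D setting are assumptions of the example, which is not KinectFusion's actual volumetric GPU implementation.

```python
import numpy as np

def integrate(tsdf, weight, voxel_depth, measured_depth, trunc=0.1):
    """Fuse one depth measurement into a 1D TSDF sampled along a ray.

    tsdf, weight:   (N,) current TSDF values and accumulated weights.
    voxel_depth:    (N,) depth of each voxel along the ray.
    measured_depth: depth of the observed surface along the same ray.
    """
    sdf = measured_depth - voxel_depth       # positive in front of the surface
    update = sdf > -trunc                    # skip voxels far behind the surface
    new = np.clip(sdf / trunc, -1.0, 1.0)    # truncate and normalize to [-1, 1]
    w_new = 1.0                              # assumed constant measurement weight
    tsdf, weight = tsdf.copy(), weight.copy()
    # Weighted running average of all measurements seen so far.
    tsdf[update] = (weight[update] * tsdf[update] + w_new * new[update]) / (weight[update] + w_new)
    weight[update] += w_new
    return tsdf, weight
```

Starting from a TSDF of ones and zero weights, fusing several noisy measurements of the same surface moves the zero crossing toward their average, which is how the integrated representation can yield a less noisy depth map than a single measurement.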
Several other variants of the ICP method are known, but the variants still suffer from local minima when the two 3D point clouds differ substantially. Registration methods or SLAM systems that depend solely on points suffer from insufficient or incorrect correspondences in textureless regions or regions with repeated patterns. Plane-based techniques suffer from degeneracy in scenes that contain an insufficient number of non-parallel planes.
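The plane degeneracy can be made concrete with a small diagnostic, sketched here under the assumption that each detected plane is summarized by its unit normal. The rank of the stacked normals indicates how many directions of motion the planes constrain. The function name plane_constraint_rank and the relative tolerance are hypothetical.

```python
import numpy as np

def plane_constraint_rank(normals, tol=1e-3):
    """Number of independent directions spanned by a set of plane normals.

    normals: (N, 3) array of unit normals. A result below 3 means the planes
    leave at least one translation direction unconstrained.
    """
    s = np.linalg.svd(normals, compute_uv=False)
    # Count singular values that are significant relative to the largest one.
    return int((s > tol * s[0]).sum())
```

For example, a corridor seen as a floor and two parallel walls yields a rank of 2, because nothing constrains motion along the corridor, whereas adding an end wall raises the rank to 3.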
With 3D sensors such as the Kinect sensor, line correspondences are difficult to obtain because of noisy or missing depth values around depth discontinuities.