Simultaneous localization and mapping (SLAM) is used for many applications in computer vision, robotics, and augmented reality. An essential part of SLAM system is sensor localization. This involves registration of primitives in a sensor coordinate system with a SLAM map coordinate system. SLAM using monocular sensors is well known. With the prevalence of low-cost 3D sensors, many SLAM systems make use of both color and depth data using red, green, blue, and depth (RGB-D) sensors, such as a Microsoft Kinect® sensor.
Although RGB-D sensors can increase registration accuracy, such sensors typically provide accurate depths only in a limited range, e.g., 0.5 m to 4 m for Kinect®, due to hardware limitations and noise. Most RGB-D SLAM systems only use pixels with valid depths, while ignoring pixels associated with scene points that are too close or too far from the sensor. This is an ineffective use of information provided by the sensors and might introduce registration inaccuracy, especially for scenes with large depth variations.
Most conventional SLAM systems use a single type of primitive, e.g., either 2D points or 3D points, as measurement. For example, feature-based monocular SLAM systems extract 2D point features, use 2D-to-2D point correspondences to initialize 3D point landmarks by triangulation, and then use 2D-to-3D correspondences between the 2D point measurements and the 3D point landmarks to estimate the sensor pose in consecutive images, see Davison et al., “MonoSLAM: Real-time single camera SLAM,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052-1067, June 2007, and Klein and Murray, “Parallel tracking and mapping for small AR workspaces,” Proc. IEEE Int'l Symp. Mixed and Augmented Reality (ISMAR), pp. 1-10, November 2007.
Feature-based RGB-D SLAM systems extract 3D point features and estimate the sensor pose using 3D-to-3D point correspondences. Plane features can also be used as measurements in some SLAM systems. Recent dense SLAM systems, both monocular and RGB-D, do not rely on feature extraction but rather exploit all the 2D or 3D points in images and minimize photometric errors or iterative closest point (ICP) costs for direct registration. However, those systems still only use a single type of primitive, either 2D points or 3D points.
Some SLAM systems use a hybrid of 3D measurements. One system uses both plane-to-plane and line-to-plane correspondences, see Trevor et al., “Planar surface SLAM with 3D and 2D sensors,” Proc. IEEE Int'l Conf. Robotics and Automation (ICRA), pp. 3041-3048, May 2012. Another system uses both point-to-point and plane-to-plane correspondences, see Taguchi et al., “Point-plane SLAM for hand-held 3D sensors,” Proc. IEEE Int'l Conf. Robotics and Automation (ICRA), pp. 5182-5189, May 2013. However, all the measurements used in their systems are 3D primitives.
Some systems address the problem of the lack of depth measurements in some parts of RGB-D images and used both 2D and 3D measurements, see Hu et al., “A robust RGB-D SLAM algorithm,” Proc. IEEE/RSJ Int'l Conf. Intelligent Robots and Systems (IROS), October 2012, and Zhang et al., “Real-time depth enhanced monocular odometry,” Proc. IEEE/RSJ Int'l Conf. Intelligent Robots and Systems (IROS), September 2014. Hu et al. heuristically switches between 2D-to-2D and 3D-to-3D correspondences according to available depths, and thus they do not use both correspondences in a single registration. Zhang et al. use both 2D and 3D measurements to register two images for visual odometry, but the 3D measurements are assumed only in one of the two images, and thus 2D-to-3D correspondences are used.