RGB-D SLAM (Simultaneous Localization and Mapping) systems are employed to build a map of an unknown environment, from within that environment and without a priori knowledge of it, using a camera (sensor) that provides not only image (RGB colour) information but also depth (D) information about the environment. Such techniques are, of course, not limited to the RGB colour format and are equally applicable to other colour formats, including YUV, YCC and LAB, or indeed to monochrome or intensity-only formats.
In the present specification, a map is any three-dimensional representation of the environment, including but not limited to a dense 3D point cloud or a mesh-based representation of the environment.
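By way of illustration only, the following Python sketch shows how a depth image from an RGB-D sensor can be turned into the kind of dense 3D point cloud referred to above. It assumes an ideal pinhole camera with intrinsic parameters `fx`, `fy`, `cx` and `cy` (names chosen here for illustration), ignores lens distortion and does not attach colour to the points.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Convert a depth image (metres, 0 = no reading) into a 3D point cloud.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels; lens distortion is ignored.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z > 0:                        # skip pixels with no depth reading
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A toy 2x2 depth image with two valid readings.
    print(backproject([[0.0, 2.0], [1.0, 0.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0))
    # prints [(2.0, 0.0, 2.0), (0.0, 1.0, 1.0)]
```

Pixels with no depth reading are simply dropped, which is one reason such point clouds are typically irregular and benefit from later fusion into a map.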
The term pose is used in the present specification to refer to the position and orientation (i.e. viewing direction) of the sensor at a given point in time. During the mapping process, the sensor moves through the environment thereby generating a sequence of poses over time. This sequence of poses is referred to as a pose graph, where vertices in the graph represent the poses; and edges in the graph represent the adjacency relationships (and other constraints) between poses.
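A pose graph of the kind described above can be held in a very small data structure. The Python sketch below is illustrative only: it assumes planar (2D) poses for brevity, whereas an RGB-D sensor moves in 3D, and the class and method names (`PoseGraph`, `add_odometry`, `add_loop_closure`) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Pose2D:
    """Planar pose: position (x, y) and heading theta (radians)."""
    x: float
    y: float
    theta: float


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Apply relative motion b, expressed in the frame of pose a."""
    c, s = math.cos(a.theta), math.sin(a.theta)
    return Pose2D(a.x + c * b.x - s * b.y, a.y + s * b.x + c * b.y, a.theta + b.theta)


@dataclass
class PoseGraph:
    # Vertices: estimated sensor poses over time, starting at the origin.
    vertices: List[Pose2D] = field(default_factory=lambda: [Pose2D(0.0, 0.0, 0.0)])
    # Edges: (from_index, to_index, measured relative pose, kind of constraint).
    edges: List[Tuple[int, int, Pose2D, str]] = field(default_factory=list)

    def add_odometry(self, rel: Pose2D) -> int:
        """Append a new pose reached from the latest one by relative motion rel."""
        i = len(self.vertices) - 1
        self.vertices.append(compose(self.vertices[i], rel))
        self.edges.append((i, i + 1, rel, "odometry"))
        return i + 1

    def add_loop_closure(self, i: int, j: int, rel: Pose2D) -> None:
        """Add a non-sequential constraint, e.g. when a place is revisited."""
        self.edges.append((i, j, rel, "loop"))


if __name__ == "__main__":
    g = PoseGraph()
    for _ in range(4):                       # drive around a unit square
        g.add_odometry(Pose2D(1.0, 0.0, math.pi / 2))
    g.add_loop_closure(4, 0, Pose2D(0.0, 0.0, 0.0))
    print(len(g.vertices), len(g.edges))     # prints 5 5
```

Sequential edges come from frame-to-frame tracking, while loop-closure edges are the "other constraints" that later allow the graph to be optimised for global consistency.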
Locally consistent mapping refers to mapping in which the output geometry is locally accurate (e.g. at a scale of ~10 metres) but, at a global scale, can exhibit significant error, referred to as drift, resulting from the accumulation of small-scale errors in the local mapping process.
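The way small local errors compound into drift can be seen with a toy dead-reckoning example. The sketch below is an illustration built on assumptions, not part of any cited system: the true motion is a straight line, and each step is locally almost correct (a heading error of 0.01 rad, an illustrative value), yet the error in the end point grows rapidly with trajectory length.

```python
import math


def dead_reckon(n_steps: int, step: float = 1.0, heading_bias: float = 0.01):
    """Chain n_steps relative motions, each with a small heading error.

    The true motion is assumed to be a straight line along x; at every step
    the estimate picks up heading_bias radians of rotation error.
    """
    x = y = theta = 0.0
    for _ in range(n_steps):
        theta += heading_bias                # local error accumulates in the heading
        x += step * math.cos(theta)
        y += step * math.sin(theta)
    return x, y


def drift(n_steps: int, step: float = 1.0, heading_bias: float = 0.01) -> float:
    """Distance between the estimated end point and the true end point (n_steps*step, 0)."""
    x, y = dead_reckon(n_steps, step, heading_bias)
    return math.hypot(x - n_steps * step, y)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(drift(n), 3))
```

Each individual step is accurate to roughly a centimetre, so any local patch of the trajectory looks correct, while the accumulated heading error bends the global trajectory away from the truth.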
It is an object of the present invention to overcome problems in producing globally consistent maps; that is to say maps which are globally consistent in terms of all measurements made of the area being mapped.
P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments,” Int. Journal of Robotics Research, 2012 disclose an approach to RGB-D SLAM that uses visual feature matching in conjunction with Generalised Iterative Closest Point (GICP) to build a pose graph and subsequently an optimised surfel map of the environment. Working offline, they compare pose graph optimisation with sparse bundle adjustment (SBA), which minimises feature reprojection errors, within a strictly rigid transformation framework.
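To illustrate what pose graph optimisation does when a loop is closed, the following sketch solves a deliberately simplified one-dimensional pose graph by linear least squares. It is not the method of Henry et al.: practical systems optimise 6-DoF poses with nonlinear solvers, and the function names, the equal edge weights and the choice to anchor pose 0 at the origin are assumptions made here for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]   # (i, j, z): measurement that x_j - x_i is about z


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def optimise_pose_graph_1d(n_poses: int, edges: List[Edge]) -> List[float]:
    """Least-squares poses x_0..x_{n-1}, with x_0 anchored at 0 (gauge freedom)."""
    m = n_poses - 1                                    # unknowns x_1..x_{n-1}
    H = [[0.0] * m for _ in range(m)]                  # normal matrix J^T J
    g = [0.0] * m                                      # right-hand side J^T z
    for i, j, z in edges:
        # Residual r = x_j - x_i - z has Jacobian +1 w.r.t. x_j and -1 w.r.t. x_i;
        # x_0 is fixed, so its column is dropped.
        jac = {}
        if j > 0:
            jac[j - 1] = jac.get(j - 1, 0.0) + 1.0
        if i > 0:
            jac[i - 1] = jac.get(i - 1, 0.0) - 1.0
        for a, ja in jac.items():
            g[a] += ja * z
            for c, jc in jac.items():
                H[a][c] += ja * jc
    return [0.0] + solve_linear(H, g)


if __name__ == "__main__":
    odometry = [(0, 1, 1.05), (1, 2, 1.05), (2, 3, -0.95), (3, 4, -0.95)]
    loop = [(0, 4, 0.0)]                               # pose 4 revisits pose 0
    print(optimise_pose_graph_1d(5, odometry + loop))
```

Dead reckoning alone places pose 4 at 0.2 although it should coincide with pose 0; with the loop-closure edge, the 0.2 discrepancy is spread evenly over the five edges, giving approximately [0, 1.01, 2.02, 1.03, 0.04].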
Most SLAM systems use a rigid transformation, in which points are only rotated and translated (and, where required, projected into the camera). A non-rigid transformation additionally permits points to be stretched, sheared, contracted or twisted. Non-rigid transformations are typically more expensive to compute than rigid ones and can be more difficult to parameterise.
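The defining property of a rigid transformation, namely that it preserves the distance between any two points, can be checked numerically. In the illustrative sketch below (function names are hypothetical), a rotation about the z-axis followed by a translation leaves all pairwise distances unchanged, whereas a simple shear, one example of a non-rigid transformation, does not.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def rigid(p: Point, theta: float, t: Sequence[float]) -> Point:
    """Rotate p about the z-axis by theta (radians), then translate by t."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + t[0], s * x + c * y + t[1], z + t[2])


def shear(p: Point, k: float) -> Point:
    """A simple non-rigid map: x is displaced in proportion to y."""
    x, y, z = p
    return (x + k * y, y, z)


def pairwise_distances(pts: Sequence[Point]) -> List[float]:
    return [math.dist(a, b) for a, b in combinations(pts, 2)]


def preserves_distances(pts: Sequence[Point], f: Callable[[Point], Point],
                        tol: float = 1e-9) -> bool:
    """True if f leaves every pairwise distance unchanged (within tol)."""
    before = pairwise_distances(pts)
    after = pairwise_distances([f(p) for p in pts])
    return all(abs(a - b) < tol for a, b in zip(before, after))


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 2.0, 3.0)]
    print(preserves_distances(pts, lambda p: rigid(p, 0.7, (1.0, 2.0, 3.0))))  # prints True
    print(preserves_distances(pts, lambda p: shear(p, 0.5)))                   # prints False
```

A general 3D rigid transformation has only six degrees of freedom, whereas a non-rigid deformation of a whole map typically needs many more parameters, which is one source of the extra cost noted above.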
A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, “Visual odometry and mapping for autonomous flight using an RGB-D camera,” in Int. Symposium on Robotics Research (ISRR), (Flagstaff, Ariz., USA), August 2011 compute a map by SBA as a post-processing step, minimising rigid reprojection errors.
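Bundle adjustment of the kind referred to above minimises reprojection error: the pixel distance between where a feature was observed and where the current estimates of the camera pose and 3D point predict it should appear. The following sketch computes that cost for rigid camera poses (rotation `R`, translation `t`) and an ideal pinhole camera; the intrinsics and function names are illustrative assumptions, and the optimisation step itself (for example Levenberg-Marquardt over all poses and points) is omitted.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vector3 = Sequence[float]
Intrinsics = Tuple[float, float, float, float]    # (fx, fy, cx, cy), in pixels


def project(R: Matrix3, t: Vector3, X: Vector3, K: Intrinsics) -> Tuple[float, float]:
    """Rigidly transform world point X into the camera frame, then project it."""
    fx, fy, cx, cy = K
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy


def total_reprojection_error(observations: List[Tuple[int, int, Tuple[float, float]]],
                             poses: List[Tuple[Matrix3, Vector3]],
                             points: List[Vector3], K: Intrinsics) -> float:
    """Sum of squared pixel residuals; observations are (pose_idx, point_idx, (u, v))."""
    total = 0.0
    for pose_idx, point_idx, (u_obs, v_obs) in observations:
        R, t = poses[pose_idx]
        u, v = project(R, t, points[point_idx], K)
        total += (u_obs - u) ** 2 + (v_obs - v) ** 2
    return total


if __name__ == "__main__":
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = (500.0, 500.0, 320.0, 240.0)
    # One camera at the origin sees one point 2 m ahead, observed 1 pixel off-centre.
    print(total_reprojection_error([(0, 0, (321.0, 240.0))], [(I, [0.0, 0.0, 0.0])],
                                   [[0.0, 0.0, 2.0]], K))   # prints 1.0
```

Because every pose enters only through `R` and `t`, minimising this cost can move cameras and points but can never bend the map non-rigidly.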
G. Hu, S. Huang, L. Zhao, A. Alempijevic, and G. Dissanayake, “A robust RGB-D SLAM algorithm,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 1714-1719, October 2012; and D. Lee, H. Kim, and H. Myung, “GPU-based real-time RGB-D 3D SLAM,” in Ubiquitous Robots and Ambient Intelligence (URAI), 2012 9th International Conference on, pp. 46-48, November 2012 disclose attempts to minimise rigid reprojection error for map correction after optimisation.
F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard, “An evaluation of the RGB-D SLAM system,” in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), (St. Paul, Minn., USA), May 2012 disclose using visual features for camera pose estimation, together with pose graph optimisation to achieve global consistency. The map is represented by probabilistically reprojecting all point measurements into an octree-based volumetric map provided by the OctoMap framework, disclosed in A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: An efficient probabilistic 3D mapping framework based on octrees,” Autonomous Robots, 2013. OctoMap has the advantages of taking measurement uncertainty into account, being space efficient and implicitly representing free and occupied space. However, as with most voxel representations, integrating measurements (by raycasting) and applying non-rigid transformations are computationally expensive.
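For illustration, the sketch below shows the basic probabilistic update used by occupancy-grid maps of this kind: voxels that a measurement ray passes through have their occupancy log-odds decreased, and the voxel containing the measured point has its log-odds increased, with clamping at both ends. It is a simplification under stated assumptions: a flat dictionary stands in for the octree, the ray is traversed by dense sampling rather than an exact voxel traversal, and the probabilities and clamping bounds are illustrative values rather than values taken from OctoMap.

```python
import math
from typing import Dict, Sequence, Tuple

Key = Tuple[int, int, int]

L_HIT = math.log(0.7 / 0.3)     # illustrative log-odds increment for an endpoint hit
L_MISS = math.log(0.4 / 0.6)    # illustrative log-odds decrement for a pass-through
L_MIN, L_MAX = -2.0, 3.5        # illustrative clamping bounds


def voxel_key(p: Sequence[float], size: float) -> Key:
    """Integer index of the voxel containing point p."""
    return tuple(math.floor(c / size) for c in p)


def integrate_ray(grid: Dict[Key, float], origin: Sequence[float],
                  hit: Sequence[float], size: float) -> None:
    """Update log-odds along one sensor ray (sampled every half voxel)."""
    end_key = voxel_key(hit, size)
    steps = max(1, int(math.dist(origin, hit) / (0.5 * size)))
    free = set()
    for s in range(steps):
        f = s / steps
        k = voxel_key([o + f * (h - o) for o, h in zip(origin, hit)], size)
        if k != end_key:
            free.add(k)                 # each traversed voxel is updated once per ray
    for k in free:
        grid[k] = max(L_MIN, grid.get(k, 0.0) + L_MISS)
    grid[end_key] = min(L_MAX, grid.get(end_key, 0.0) + L_HIT)


def occupancy_probability(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))
```

Every depth pixel produces one such ray, so the inner loop runs once per traversed voxel per pixel per frame; this per-ray work is why raycast integration is comparatively costly.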
K. Pirker, M. Ruether, G. Schweighofer, and H. Bischof, “GPSlam: Marrying sparse geometric and dense probabilistic visual mapping,” in Proc. of the British Machine Vision Conf., pp. 115.1-115.12, 2011 use sparse visual features in combination with a dense volumetric occupancy grid for the modelling of large environments. Sliding-window bundle adjustment is used with visual place recognition in a pose graph optimisation framework. Upon loop closure, the occupancy grid is “morphed” into a globally consistent grid using a weighted average of the log-odds perceptions of each camera for each voxel.
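The "morphing" step described above can be illustrated with a heavily simplified sketch. It assumes, hypothetically, that each camera keeps its own per-voxel log-odds, that the loop-closure correction for a camera can be expressed as a whole-voxel translation of its keys, and that each camera has a scalar weight; the corrected contributions are then fused per voxel by a weighted average. This is not Pirker et al.'s implementation, only an instance of the weighted-average idea.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[int, int, int]


def shift_keys(voxels: Dict[Key, float], offset: Key) -> Dict[Key, float]:
    """Re-index one camera's voxels after a (translation-only) pose correction."""
    return {(k[0] + offset[0], k[1] + offset[1], k[2] + offset[2]): l
            for k, l in voxels.items()}


def fuse_log_odds(per_camera: Dict[str, Dict[Key, float]],
                  weights: Dict[str, float]) -> Dict[Key, float]:
    """Weighted average of each camera's log-odds, computed voxel by voxel."""
    num: Dict[Key, float] = defaultdict(float)
    den: Dict[Key, float] = defaultdict(float)
    for cam, voxels in per_camera.items():
        w = weights[cam]
        for key, log_odds in voxels.items():
            num[key] += w * log_odds
            den[key] += w
    return {key: num[key] / den[key] for key in num if den[key] > 0}


if __name__ == "__main__":
    cam_a = {(0, 0, 0): 2.0, (1, 0, 0): -1.0}
    cam_b = shift_keys({(-1, 0, 0): 0.0}, (1, 0, 0))   # correction moves b by one voxel
    print(fuse_log_odds({"a": cam_a, "b": cam_b}, {"a": 1.0, "b": 3.0}))
    # prints {(0, 0, 0): 0.5, (1, 0, 0): -1.0}
```

Keeping per-camera contributions separate is what allows the grid to be rebuilt after loop closure, at the cost of storing and re-fusing every camera's data.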
C. Audras, A. I. Comport, M. Meilland, and P. Rives, “Real-time dense RGB-D localisation and mapping,” in Australian Conf. on Robotics and Automation, (Monash University, Australia), December 2011 disclose estimating a warping function using both geometric and photometric information for pose estimation, but do not make use of a pose graph. Audras et al. also rely on rigid reprojection to produce a 3D map reconstruction.
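The benefit of using photometric and geometric information together can be shown with a toy one-dimensional alignment problem. In the sketch below, which is a hypothetical illustration and not Audras et al.'s method (they estimate a dense warp by direct minimisation rather than by exhaustive search), a cost combining squared intensity residuals and weighted squared depth residuals is evaluated for each candidate integer shift, and the cheapest shift is returned. The weight `lam` balancing the two terms is an assumed parameter.

```python
from typing import Sequence


def combined_cost(i_ref: Sequence[float], d_ref: Sequence[float],
                  i_cur: Sequence[float], d_cur: Sequence[float],
                  shift: int, lam: float) -> float:
    """Mean of photometric + lam * geometric squared residuals over the overlap."""
    total, n = 0.0, 0
    for x in range(len(i_ref)):
        xs = x + shift                       # where pixel x of the reference lands
        if 0 <= xs < len(i_cur):
            total += (i_cur[xs] - i_ref[x]) ** 2 + lam * (d_cur[xs] - d_ref[x]) ** 2
            n += 1
    return total / n if n else float("inf")  # normalise so small overlaps are not favoured


def estimate_shift(i_ref, d_ref, i_cur, d_cur, max_shift: int, lam: float = 1.0) -> int:
    """Exhaustively search integer shifts in [-max_shift, max_shift]."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: combined_cost(i_ref, d_ref, i_cur, d_cur, s, lam))


if __name__ == "__main__":
    i_ref, d_ref = [0, 0, 1, 5, 1, 0, 0, 0], [2, 2, 2, 1, 2, 2, 2, 2]
    i_cur, d_cur = [0, 0, 0, 0, 1, 5, 1, 0], [2, 2, 2, 2, 2, 1, 2, 2]
    print(estimate_shift(i_ref, d_ref, i_cur, d_cur, max_shift=3))   # prints 2
```

If both intensity profiles were flat (a textureless surface), the photometric term alone could not distinguish between shifts, but the depth term would still recover the correct one; conversely, intensity texture helps where the geometry is featureless.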
J. Stueckler and S. Behnke, “Integrating depth and color cues for dense multi-resolution scene mapping using RGB-D Cameras,” in Proc. of the IEEE Int. Conf. on Multisensor Fusion and Information Integration (MFI), (Hamburg, Germany), September 2012 disclose an octree-based multi-resolution surfel map representation which registers surfel maps for pose estimation and relies on pose graph optimisation for global consistency. A globally consistent map is computed by fusing key views after graph optimisation has completed.
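Surfel maps of this kind summarise the points falling in each voxel by their mean and covariance, which can be accumulated from running sums; this also makes fusing several key views cheap, since the sums simply add. The sketch below illustrates that idea at a single resolution. It assumes the key views have already been transformed into a common frame by the optimised poses, omits the octree and multi-resolution aspects, and uses hypothetical class and function names; it is not Stueckler and Behnke's implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, int, int]


@dataclass
class Surfel:
    """Sufficient statistics of the 3D points assigned to one voxel."""
    n: int = 0
    s: List[float] = field(default_factory=lambda: [0.0] * 3)        # sum of points
    ss: List[List[float]] = field(                                   # sum of outer products
        default_factory=lambda: [[0.0] * 3 for _ in range(3)])

    def add_point(self, p: Sequence[float]) -> None:
        self.n += 1
        for a in range(3):
            self.s[a] += p[a]
            for b in range(3):
                self.ss[a][b] += p[a] * p[b]

    def merge(self, other: "Surfel") -> None:
        """Fuse another surfel: the sufficient statistics simply add."""
        self.n += other.n
        for a in range(3):
            self.s[a] += other.s[a]
            for b in range(3):
                self.ss[a][b] += other.ss[a][b]

    def mean(self) -> List[float]:
        return [v / self.n for v in self.s]

    def covariance(self) -> List[List[float]]:
        m = self.mean()
        return [[self.ss[a][b] / self.n - m[a] * m[b] for b in range(3)] for a in range(3)]


def build_surfel_map(points: Sequence[Sequence[float]], size: float) -> Dict[Key, Surfel]:
    """Assign each point (already in the common frame) to a voxel surfel."""
    surfels: Dict[Key, Surfel] = {}
    for p in points:
        key = tuple(math.floor(c / size) for c in p)
        surfels.setdefault(key, Surfel()).add_point(p)
    return surfels


def fuse_maps(a: Dict[Key, Surfel], b: Dict[Key, Surfel]) -> Dict[Key, Surfel]:
    """Fuse two key views' surfel maps voxel by voxel (inputs are not modified)."""
    fused: Dict[Key, Surfel] = {}
    for surfel_map in (a, b):
        for key, surf in surfel_map.items():
            fused.setdefault(key, Surfel()).merge(surf)
    return fused
```

Computing the covariance as E[ppᵀ] − mmᵀ can lose precision for points far from the origin; a production implementation might prefer a numerically stabler update, whereas this sketch favours brevity.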
Many of the above techniques can produce globally consistent maps; however, each suffers from at least one of the following limitations: it cannot operate in real time or near real time, it cannot efficiently incorporate large non-rigid updates to the map, or it cannot provide an up-to-date optimised representation of the map at runtime, i.e. while map data is being acquired.