1. Field of the Invention
The present invention relates generally to point cloud scans, and in particular, to a method, apparatus, and article of manufacture for automatically and efficiently conducting a global registration of large point cloud scans.
2. Description of the Related Art
(Note: This application references a number of different publications as indicated throughout the specification by reference numbers enclosed in brackets, e.g., [x]. A list of these different publications ordered according to these reference numbers can be found below in the section entitled “References.” Each of these publications is incorporated by reference herein.)
For several years, the world of three-dimensional (3D) data has been undergoing a revolution, with many new devices readily available for both capturing and visualizing data, whether pure virtual 3D data for virtual reality applications or combined 3D and visual data for augmented reality. This new context, with new applications based on 3D information appearing every day, more 3D data of many different types, and new sensors, presents many new challenges. Among these challenges, the first processing step (which almost always precedes any other processing of 3D data) is the registration of data acquired by a sensor from different locations. Indeed, whatever kind of sensor is considered (laser scanner, KINECT, etc.), no existing technology can acquire all of the 3D information of a scene at once.
Thus, a 3D acquisition usually results in a set of 3D points (a 3D point cloud), with or without associated RGB information depending on the sensor, that partially covers the spatial area of the full scene. One problem in the data acquisition process is the registration of multiple scans. Single scans, from different sensor positions, are acquired within a local coordinate frame defined by the instrument. For visualization and further data processing of the point cloud, the single scans must be transformed into a common coordinate frame. This process is termed “registration.” Given two or more 3D point clouds that have a subset of points in common, the goal of 3D registration is to compute the rigid transformation that aligns these point clouds, providing an estimate of the relative pose between them. This registration of partially overlapping 3D point clouds taken from different views, usually considered pair by pair, is thus an essential task in 3D computer vision and the basis of many algorithms in 3D vision applications.
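As an illustration of the transformation into a common coordinate frame described above, the following minimal sketch applies a rigid transformation (a rotation R followed by a translation t) to the points of a scan. The function names, the choice of a rotation about the z axis, and the sample values are assumptions made for illustration only and are not taken from any cited publication.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_rigid(points, R, t):
    """Map each point p to R * p + t (scan frame to common frame)."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def invert_rigid(R, t):
    """Return the inverse transformation: R^T and -R^T * t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = tuple(-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3))
    return Rt, t_inv

# Hypothetical scan expressed in the local frame of the instrument.
scan_local = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
R = rotation_z(math.pi / 2)
t = (1.0, 1.0, 0.0)
scan_common = apply_rigid(scan_local, R, t)
```

Registration amounts to estimating R and t when they are unknown; the sketch only shows how a known estimate is applied and inverted.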
There are two well-identified challenges in this task of 3D registration:
    Global registration: refers to the registration of a pair of 3D point clouds without initial guesses on the pose of one point cloud relative to the other; the pose is thus arbitrary; and
    Local registration: refers to the registration of a pair of 3D point clouds from a valid initial estimate of the pose between the two clouds.
Many algorithms for registration have addressed these challenges in recent decades. It is commonly accepted in the community to classify the algorithms into two (2) distinct classes [1] based on the challenge they address:
    Coarse registration: consists of roughly aligning two scans without any clue about the initial alignment; and
    Fine registration: from a given coarse alignment, a fine registration refines the result, generally by minimizing error iteratively.
The global registration usually requires both coarse and fine registration steps, while for the local registration challenge, it is often sufficient to consider a fine registration process only.
A popular type of approach involves iterative processing referred to as Iterative Closest Point (ICP) [2] and variants [3; 4; 5]. However, in practice, the original ICP methods tend to converge poorly when subjected to severe noise and large pose displacements without a good initial guess of the scan alignment. Thus, ICP and its variants are often used in a fine registration stage to improve a previous coarse registration, but they are not suitable as direct solutions for coarse registration.
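The iterative principle behind ICP can be sketched as follows. This is a simplified two-dimensional, point-to-point illustration with brute-force nearest-neighbor matching, not the implementation of [2] or of any variant; the function names and the iteration count are assumptions. Each iteration pairs every source point with its closest target point and then solves for the least-squares rigid motion in closed form.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n
    cyd = sum(p[1] for p in dst) / n
    s_dot = s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys
        bx, by = xd - cxd, yd - cyd
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, (cxd - (c * cxs - s * cys), cyd - (s * cxs + c * cys))

def transform(points, theta, t):
    """Rotate points by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def icp_2d(source, target, iterations=20):
    """Alternate closest-point matching and rigid alignment."""
    current = list(source)
    for _ in range(iterations):
        matched = [
            min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in current
        ]
        theta, t = best_rigid_2d(current, matched)
        current = transform(current, theta, t)
    return current

def rms_error(a, b):
    """Root-mean-square distance between index-paired point lists."""
    return math.sqrt(
        sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(a, b)) / len(a)
    )

# Hypothetical data: the source is the target under a small displacement.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
source = transform(target, 0.1, (0.1, -0.05))
aligned = icp_2d(source, target)
```

Because the matching step relies on closest points, the sketch recovers the alignment only when the initial displacement is small enough for most matches to be correct, which is consistent with the use of ICP as a fine registration stage.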
Some other approaches try to get rid of any requirement for a good initial estimate by extracting invariant local features and finding correspondences. However, the main limitation of such methods is a lack of robustness to large baselines (large sensor viewpoint changes) and little overlap.
With the availability of several new consumer-grade 3D sensors, many solutions have been proposed to solve this problem, and they work well for reasonably sized point clouds (a few thousand points) even with little overlap, noise, and a fairly large baseline.
One of the main advantages of LiDAR (light imaging, detection and ranging) scans over ordinary point clouds is that they are acquired in a global 3D coordinate system. As a result, Euclidean distance can be used to describe patterns in the scene, which considerably reduces the complexity of the problem by locking the scale. Furthermore, LiDAR scans are almost noiseless.
Many algorithms try to solve the registration problem using only geometrical information or only visual information (e.g., RGB values attached to the 3D points), and some use both. The main drawback is that most current solutions cannot efficiently solve the global registration problem when all of the following constraints are present: large scale data, large baseline, little overlap, and no initial guess of the relative alignment.
To better understand these problems, a description of the related art may be useful.
A survey of the most significant contributions in the domain can be found in [1]. As described in this survey, all of the approaches of coarse registration can be compared considering the following main steps:
(1) Salient information detectors: The survey [1] presents Normal Space Sampling [3], Maximally Stable Volumes [6], Heat Kernel-based Features [7], MeshDoG [8], Intrinsic Shape Signatures [9], Key-Point Quality [10], and Harris3D [11];
(2) Salient information descriptors: The survey [1] presents Principal Curvature [12], Point Signature [13], Spin Image (Intrinsic Shape Signatures [9] and SHOT [14]), Improved Spin Image [15], Scale-Invariant Spin Image [16], Principal Component Analysis [17; 11; 16], Line-based algorithm [18], 3D Shape Contexts [19], Dynamical Systems [20], Integral Invariants [17], Curve Skeleton [21; 22], Point Feature Histograms [23], MeshHOG [8], Intrinsic Shape Signatures [9], Heat Kernel Signature [7; 24], and Rotational Projection Statistics [25];
(3) Searching strategies: The survey [1] presents Algebraic Surface Model [26], RANSAC-based Methods [27; 28; 29], Robust Global Registration [30], 4-points Congruent Sets [31], and Evolutionary Methods [32]; and
(4) Refinement: This step is mainly based on ICP [2] and its improvements [3; 4; 5].
The survey [1] not only reports the current state-of-the-art on coarse registration but it proposes a pipeline to classify these existing methods with respect to the aforementioned main steps. The survey [1] also defines a standard formal notation to provide a global point of view of the state-of-the-art.
One classic approach to describing the scans around salient detectors is based on the distribution of surface normals. This idea has already been taken into account in several approaches. For instance, an approach based on the distribution of normals has been proposed by Makadia et al. [4]. Normals are accumulated in an orientation histogram. Peaks are identified and form constellations that are then aligned to directly give the rotation between two scans. The translation can be deduced immediately afterwards. The limitations identified in this approach are that the overlap has to be large, and that the normals from the overlap area must be discriminative with respect to the distribution of normals throughout the whole point clouds while remaining consistent with this distribution.
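The accumulation of normals into an orientation histogram can be sketched as follows. This is a minimal illustration under assumed choices (an azimuth/elevation binning and hypothetical bin counts of 36 and 18); it is not the method of [4], and it stops at peak detection without aligning constellations.

```python
import math
from collections import Counter

def orientation_histogram(normals, az_bins=36, el_bins=18):
    """Count normals per (azimuth, elevation) bin on the sphere of directions."""
    hist = Counter()
    for nx, ny, nz in normals:
        norm = math.sqrt(nx * nx + ny * ny + nz * nz)
        if norm == 0.0:
            continue  # skip degenerate normals
        azimuth = math.atan2(ny, nx)  # in [-pi, pi]
        elevation = math.acos(max(-1.0, min(1.0, nz / norm)))  # in [0, pi]
        i = min(int((azimuth + math.pi) / (2.0 * math.pi) * az_bins), az_bins - 1)
        j = min(int(elevation / math.pi * el_bins), el_bins - 1)
        hist[(i, j)] += 1
    return hist

def dominant_bins(hist, count=3):
    """Return the most populated bins, i.e. the peaks of the histogram."""
    return [bin_index for bin_index, _ in hist.most_common(count)]

# Hypothetical normals: a floor-like plane, a wall-like plane, and one outlier.
normals = [(0.1, 0.2, 1.0)] * 5 + [(1.0, 0.1, 0.05)] * 3 + [(0.0, -1.0, 0.3)]
hist = orientation_histogram(normals)
peaks = dominant_bins(hist, count=2)
```

In this toy input the two planar surfaces produce the two strongest peaks, which illustrates why the approach depends on the scene containing discriminative planar areas.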
One solution to address these limitations, when RGB information is available, is to combine 3D and RGB detection, description, and matching. For instance, Zeisl et al. [33] addressed these limitations by combining 3D information with standard vision matching using 2D SIFT. The idea in their paper is to project the scan into perspective views oriented towards “salient directions,” which are identified as peaks in the distribution of normals as in the previous work. However, this algorithm is challenged by point clouds with few planar areas, or by surfaces that present a large tilt when viewed from the scan location.
In cases such as scanner acquisitions, there may be a lot of self-occlusion, leaving large parts of the scene hidden from the scanner. Since the occlusions are likely to differ from one viewpoint to another, one may utilize a smaller neighborhood size. However, a smaller neighborhood size might not be suitable, since the information held by such a neighborhood could vary considerably.
Another approach is set forth in Aiger et al. [31]. Aiger [31] proposed a 4-Points Congruent Sets (4PCS) algorithm that does not use local description. Instead, [31] considers groups of four quasi-coplanar points in the point cloud. A shape descriptor is used that is based on the distances between these points and the intersection of the line segments they define. The selected points must be spread far enough apart to guarantee algorithm stability, yet close enough together to ensure that the points are in the overlap area between the two viewpoints. The geometric configuration of four coplanar points is discriminative enough to create strong matches between the two point clouds. Aiger's work on the 4PCS algorithm inspired several other approaches that retain the 4-Points Congruent Sets principle: Super4PCS [34] is based on a smart indexing data organization that reduces the algorithm complexity from quadratic to linear in the number of data points; Mohamad et al. [35] generalizes 4PCS by allowing the two pairs to fall on two different planes that have an arbitrary distance between them; increasing this distance exponentially reduces the search space of matching hypotheses, leading to a maximum run-time improvement of 83.10%.
However, Super4PCS is more complex to implement, and its run-time improvement almost always amounts to halving the 4PCS run-time. Similarly, the Generalized 4PCS reaches more than 80% run-time improvement, but this has only been evaluated on datasets with fewer than 100K points.
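The four-point shape descriptor on which 4PCS relies, described above, can be illustrated with the following sketch. It assumes four (quasi-)coplanar points a, b, c, d whose segments ab and cd are not parallel, and it returns the two segment lengths together with the ratios at which the (near-)intersection divides each segment. The function name and the tuple layout are assumptions for illustration and are not taken from [31].

```python
import math

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def four_point_descriptor(a, b, c, d):
    """Return (|ab|, |cd|, r1, r2) for segments ab and cd.

    r1 and r2 are the parameters of the closest points between the two
    supporting lines; for coplanar, non-parallel segments they locate the
    intersection e, with r1 = |a - e| / |a - b| and r2 = |c - e| / |c - d|
    when e lies inside both segments.
    """
    u, v, w = sub(b, a), sub(d, c), sub(a, c)
    uu, uv, vv = dot(u, u), dot(u, v), dot(v, v)
    uw, vw = dot(u, w), dot(v, w)
    denom = uu * vv - uv * uv
    if abs(denom) < 1e-12:
        raise ValueError("segments are parallel or degenerate")
    r1 = (uv * vw - vv * uw) / denom
    r2 = (uu * vw - uv * uw) / denom
    return (math.sqrt(uu), math.sqrt(vv), r1, r2)

# Hypothetical coplanar base: the segments cross at (1, 0, 0).
descriptor = four_point_descriptor(
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 3.0, 0.0)
)
```

Because lengths and ratios along a segment are preserved by rigid motions, the same four points observed from another viewpoint yield the same descriptor, which is what allows congruent sets to be matched between two scans.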
Thus, another idea for reducing the complexity of 4PCS could be to rely on two points rather than on 4-Points Congruent Sets.
Indeed, the idea of a four-component descriptor based on the angles and the distance between two points was first proposed by Drost et al. [36].
Drost [36] applied this descriptor, called Point Pair Feature (PPF), for efficient and robust 3D object recognition.
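A four-component descriptor of this kind can be sketched as follows for two oriented points (p1, n1) and (p2, n2), where n1 and n2 are surface normals. The sketch uses the commonly cited form of the Point Pair Feature, namely the distance between the points, the angle between each normal and the connecting vector, and the angle between the two normals. The function names are assumptions, the two points are assumed to be distinct with non-zero normals, and the sketch omits the quantization and hashing used for recognition.

```python
import math

def _angle(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    norm_u = math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)
    norm_v = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def point_pair_feature(p1, n1, p2, n2):
    """Return (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    distance = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    return (distance, _angle(n1, d), _angle(n2, d), _angle(n1, n2))

# Hypothetical oriented points one unit apart along the x axis.
feature = point_pair_feature(
    (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
)
```

All four components are unchanged by a rigid motion applied to both points and their normals, so only two points (rather than four) are needed to form a pose-invariant description.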
Another prior art approach is that of T. Birdal and S. Ilic [37] which proposes a full processing chain based on their PPF descriptor to address the problem of large scene scan registration.
In view of the above, what is needed is a global registration method that can be efficiently used with large scale data, large baseline, little overlap, and no/little pose information/relative assignment information.