It is desirable to be able to recover geometric information about a scene from images taken from several viewpoints. In the prior art, the methods for recovering the geometric information of a scene require a device to record several hundred viewpoints in sequence, as described in Reference [3] below, which is incorporated herein by reference. Another prior art method requires a precisely calibrated environment in which the motion of the recording device or object is structured, as described in References [4] and [7] below, which are incorporated herein by reference. Another prior art method relies on a coarse global model, as described in Reference [11] below, which is incorporated herein by reference.
The prior art method described by Reference [4] uses an early point-to-plane variant of the Iterative Closest Point algorithm in which one viewpoint is selected as the global frame of reference, and for all other viewpoints, points are transformed from a local reference frame into the frames of reference of neighboring views, correspondences are calculated, and a correction to the global location is calculated. This prior art, however, registers range data only from well-controlled and calibrated scene objects placed on a turntable, with the range data taken at known and precise angles. What is needed is the ability to gather range data in less controlled and more flexible environments.
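As an illustration of the point-to-plane idea only, the following is a minimal sketch of a single linearized update, reduced to two dimensions (point-to-line) to keep it short. It is not the method of Reference [4]: the 2D reduction, the assumption that correspondences and surface normals are already given, and the names `solve3` and `point_to_line_step` are all hypothetical choices made for this example.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no degeneracy check in this sketch)."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def point_to_line_step(src, dst, normals):
    """One small-angle least-squares update for matched 2D pairs.

    Minimizes sum(((R p + t - q) . n)^2) with R linearized about zero rotation.
    src, dst: lists of (x, y) points, already paired index by index.
    normals: unit normals at the dst points.
    Returns (theta, tx, ty), the correction to apply to src.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for (px, py), (qx, qy), (nx, ny) in zip(src, dst, normals):
        # Derivative of the residual with respect to (theta, tx, ty).
        row = [px * ny - py * nx, nx, ny]
        # Signed distance from p to the tangent line at q, along the normal.
        rhs = (qx - px) * nx + (qy - py) * ny
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atb)
```

In a full registration loop, a step of this kind would alternate with re-computing correspondences until the correction becomes negligible; the sketch covers only the correction step.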
Pair-wise multi-view methods attempt to use constraints computed via traditional Iterative Closest Point algorithms to inform a global registration. The method described in Reference [3], which is incorporated herein by reference, iteratively registers a new view to a point cloud and then inserts the transformed points into the cloud; however, this method suffers from accumulated error, because each new view is registered against a cloud that already contains the errors of earlier registrations.
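The accumulation of error in sequential registration can be illustrated with a toy model. The sketch below is not taken from Reference [3]; it assumes 2D rigid transforms written as (theta, tx, ty) and a hypothetical constant bias in every pair-wise estimate, and the names `compose` and `chain_drift` are invented for this example.

```python
import math


def compose(a, b):
    """Compose 2D rigid transforms (theta, tx, ty): apply b first, then a."""
    ta, xa, ya = a
    tb, xb, yb = b
    c, s = math.cos(ta), math.sin(ta)
    return (ta + tb, xa + c * xb - s * yb, ya + s * xb + c * yb)


def chain_drift(n_views, step, bias):
    """Position error after each view when biased pair-wise estimates are chained.

    step: true relative motion between consecutive views.
    bias: error added to every pair-wise estimate (an assumption of this toy model).
    Returns a list with the position error after view 1, 2, ..., n_views.
    """
    true_pose = est_pose = (0.0, 0.0, 0.0)
    estimate = tuple(s + b for s, b in zip(step, bias))
    drift = []
    for _ in range(n_views):
        true_pose = compose(true_pose, step)
        est_pose = compose(est_pose, estimate)
        drift.append(math.hypot(est_pose[1] - true_pose[1],
                                est_pose[2] - true_pose[2]))
    return drift
```

For example, `chain_drift(50, (0.0, 1.0, 0.0), (0.001, 0.0, 0.0))` models fifty unit steps along a straight line with a 0.001 radian error per registration; the position error is zero after the first view and grows with every later one, even though each individual error is small.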
Similarly, KinectFusion, as described in Reference [6] below, which is incorporated herein by reference, constructs a global voxel representation of the scene and iteratively integrates new sensor readings. This requires a high frame rate of sensor information (15-30 Hz), slow movement of the sensor, and a priori restrictions on the resolution and scene volume. Another method, as described in Reference [9] below, which is incorporated herein by reference, assumes that pair-wise Iterative Closest Point provides the best transform between two views. Pair-wise Iterative Closest Point requires substantial overlap between the two views, and this method cannot handle large holes in point clouds. Furthermore, the calculated relative transform between pair-wise registered views can be biased towards large sections of points outside the viewing cone of one of the views.
Reference [11] below, which is incorporated herein by reference, describes triangulating the range data by connecting adjacent points. This method requires that all points be ordered in a two dimensional (2D) depth array; thus, holes in the 3D point cloud result in spurious boundary points.
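The effect of holes on a grid-based triangulation can be sketched as follows. This is an illustration only, not the procedure of Reference [11]: it assumes the depth array is a list of rows with `None` marking a missing reading, and the names `triangulate_depth_grid` and `boundary_vertices` are hypothetical.

```python
def triangulate_depth_grid(depth):
    """Connect adjacent valid samples of a 2D depth array into triangles.

    depth: list of rows; None marks a missing reading (a hole).
    Returns triangles as triples of (row, col) grid indices.
    """
    rows, cols = len(depth), len(depth[0])
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a, b = (r, c), (r, c + 1)
            d, e = (r + 1, c), (r + 1, c + 1)
            # Split each grid cell into two triangles along the b-d diagonal.
            for tri in ((a, b, d), (b, e, d)):
                if all(depth[i][j] is not None for i, j in tri):
                    triangles.append(tri)
    return triangles


def boundary_vertices(triangles):
    """Return the vertices that lie on an edge used by exactly one triangle."""
    count = {}
    for tri in triangles:
        for k in range(3):
            edge = frozenset((tri[k], tri[(k + 1) % 3]))
            count[edge] = count.get(edge, 0) + 1
    return {v for edge, n in count.items() if n == 1 for v in edge}
```

On a complete 4x4 grid only the twelve outer samples are boundary vertices. Removing the single interior sample at (1, 1) turns its interior neighbors, such as (1, 2), into boundary vertices as well, which is the kind of spurious boundary point described above.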
The survey paper of Reference [10] below, which is incorporated herein by reference, describes no multi-view registration method that utilizes k-d trees to speed up the nearest neighbor search, and prior art software libraries concerned with point cloud registration, namely the Point Cloud Library, do not contain any implementation of multi-viewpoint registration.
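For context, the way a k-d tree speeds up a nearest neighbor search can be sketched in a few lines. This is a generic textbook construction given as an assumption-laden illustration, not code from any cited reference or library; the names `build_kdtree`, `sq_dist`, and `nearest` are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the
    # best match found so far; otherwise that whole subtree is skipped.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best
```

The pruning test in `nearest` is what avoids comparing the query against every stored point, which is the dominant cost of the correspondence step in Iterative Closest Point style methods.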