The data collected by LIDAR, time-of-flight imagers, laser scanners, stereo imagers, or other related sensors contains millions of data points that store the spatial coordinates of each data point along with any other information, such as RGB color information. Advances in sensor technology have enabled such colorized point cloud data to be routinely collected for large urban scenes using both ground-based and airborne LIDAR sensor platforms.
LIDAR (Light Detection and Ranging) is an optical remote sensing technology that measures properties of scattered light to find the range and/or other information of a distant target. The prevalent method of determining the distance to an object or surface is to use laser pulses. The result of scanning an urban scene with a LIDAR, for example, is millions of data points p_n, each having x, y and z spatial coordinates, p_n = (x, y, z).
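As a minimal illustration of how a laser-pulse measurement becomes a point p_n = (x, y, z), the sketch below converts a measured range and beam angles to Cartesian coordinates. The function name and the angle conventions (azimuth in the x-y plane, elevation from that plane) are assumptions for illustration, not part of the disclosure.

```python
import math

def pulse_to_point(r, azimuth, elevation):
    """Convert a range r (meters) and beam angles (radians) from a single
    laser pulse into a Cartesian point (x, y, z) in the sensor frame."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)
```

Repeating this conversion over every pulse in a scan yields the millions of points described above.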
Once the millions of points have been collected, the problem is to recognize meaningful objects, such as buildings, trees, and streets, among those points. Humans do not see millions of points, but instead seemingly effortlessly break the scene down into buildings, trees, cars, and the like. Humans are further assisted by prior knowledge of the world, which enables them to sift through a seemingly infinite number of possibilities and settle on a few plausible ones. For example, humans know that objects such as buildings rest on the ground, and so they use this information to determine the ground plane in the vicinity of those objects.
Estimating the ground plane from the millions of points collected by a sensor is a challenge. However, if this can be done with reasonable accuracy, then the groundwork is laid for recognizing other meaningful objects among the collected points.
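One common way to estimate a dominant plane in noisy point data is random sample consensus (RANSAC); the sketch below is an illustrative, assumed technique, not the method of the present disclosure, and its parameter values are arbitrary.

```python
import random

def cross(u, v):
    # Cross product of two 3-vectors.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def fit_ground_plane(points, iters=200, tol=0.2):
    """Return (normal, d) of the plane n . p = d supported by the most
    points, by repeatedly sampling 3 points and counting inliers."""
    best, best_count = None, -1
    for _ in range(iters):
        a, b, c = random.sample(points, 3)
        # Plane normal from two edge vectors of the sampled triangle.
        n = cross(tuple(b[i] - a[i] for i in range(3)),
                  tuple(c[i] - a[i] for i in range(3)))
        norm = sum(x * x for x in n) ** 0.5
        if norm < 1e-9:  # degenerate (collinear) sample; skip
            continue
        n = tuple(x / norm for x in n)
        d = sum(n[i] * a[i] for i in range(3))
        # Count points within tol of the candidate plane.
        count = sum(1 for p in points
                    if abs(sum(n[i] * p[i] for i in range(3)) - d) < tol)
        if count > best_count:
            best, best_count = (n, d), count
    return best
```

In an urban scene the ground typically supports far more points than any single facade, so the plane with the largest consensus set is a plausible ground estimate.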
Some prior art techniques for recognizing objects, as well as the ground plane, rely on strict assumptions about the serial ordering of the collected 3D scan lines. One approach is to reconstruct surface meshes by triangulating the points, which can be slow, is sensitive to noise, and makes assumptions about sampling density. The prior art also attempts to directly process the individually collected data points, which introduces scalability issues.
Another approach in the prior art is to build intermediate representations that reduce resolution and may be sensitive to quantization. Yet another approach that has been tried is to use level sets and other continuous approximations, such as B-splines, which have lower memory requirements but cannot easily handle sharp edges or peaks in the data.
Yet another approach, using mesh-based representations, requires non-trivial processing to construct and cannot easily be updated with new incoming data. Other implicit geometry representations, such as voxels, allow efficient processing but may be sensitive to missing information and empty cells, since they store only local statistics.
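To make concrete what "storing only local statistics" means for a voxel representation, the sketch below buckets points into cubic cells and keeps a per-cell count and centroid. The function name and cell size are hypothetical; note that cells containing no points simply do not exist in the map, which is the missing-information sensitivity noted above.

```python
from collections import defaultdict

def voxelize(points, cell=1.0):
    """Bucket (x, y, z) points into cubic cells of side `cell`, keeping
    only local statistics per cell: point count and centroid."""
    # Accumulator per cell: [count, sum_x, sum_y, sum_z]
    stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0])
    for (x, y, z) in points:
        key = (int(x // cell), int(y // cell), int(z // cell))
        s = stats[key]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += z
    # Reduce each cell to (count, centroid).
    return {k: (c, (sx / c, sy / c, sz / c))
            for k, (c, sx, sy, sz) in stats.items()}
```

Because each cell discards everything but these statistics, the representation is compact and fast to query, but fine geometry within a cell cannot be recovered.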
All of these approaches attempt to find objects such as buildings, trees, cars and streets, and also attempt to find other more obscure objects such as poles, powerlines and posts. These approaches also attempt to estimate the ground plane. However, all of these prior art approaches have disadvantages and are not robust.
What is needed is a method for estimating the ground plane and recognizing objects such as buildings, trees, cars and streets from millions of collected 3D data points. Also needed is a method to find other objects such as poles, powerlines and posts. The embodiments of the present disclosure answer these and other needs.