Building modeling has numerous applications in a wide variety of tasks such as urban planning, cartography, 3D visualization for virtual city tours and autonomous robot navigation. In many practical situations, it is desirable to obtain 3D models for areas of hundreds or thousands of square kilometers within days and with minimal manual interaction. Traditionally, building modeling has employed data collected either from the ground or from the air. Ground-based data provides a high level of detail on the facades of buildings, but lacks information on the tops of buildings. Aerial data can yield very accurate building footprints and can facilitate the fusion of multiple images through ortho-rectification to cover large geographical areas, but aerial data generally lacks the information on the sides of buildings that is necessary for producing a 3D model.
Most of the research on building modeling has employed 2D imagery, which is very inexpensive to acquire, but poses inherent difficulties for automatic 3D modeling due to known limitations of stereo algorithms for recovering 3D structure from multiple views, especially in untextured areas and at depth discontinuities. Because of these limitations, most image-based modeling systems in the prior art are either manual or semi-automatic and require a large amount of computer time to produce 3D models.
Light Detection and Ranging (LIDAR) has emerged in recent years as a viable and cost-effective alternative to using solely 2D imagery, because LIDAR can directly produce precise and accurate range information and can significantly alleviate the task of automatic building segmentation. Aerial LIDAR collections can acquire data over entire cities very rapidly; however, due to constraints imposed by the operating conditions on the altitude and speed of an airborne platform, the resolution of the data is typically less than one point per square meter. LIDAR measurements from multiple views are typically co-registered in the same global coordinate system by tracking the pose of the LIDAR sensor as it acquires data, using GPS and inertial measurement units (IMU). In several LIDAR-based building segmentation algorithms, 3D points are classified into three classes: terrain, clutter and building. First, the 3D measurements are classified as ground and non-ground, and subsequently the non-ground points are divided into clutter and building regions.
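The co-registration step mentioned above can be illustrated with a minimal sketch. It assumes a simple rigid-body model in which each scan carries a sensor pose (a 3x3 rotation, as might be derived from the IMU, and a translation, as might be derived from GPS). The function names `yaw_rotation`, `to_global` and `merge_scans` are hypothetical and chosen only for illustration; a practical system would also need to handle timing, calibration and pose error, which are not modeled here.

```python
import math


def yaw_rotation(yaw_rad):
    """Rotation matrix for a heading (yaw) about the vertical z axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def to_global(points, rotation, translation):
    """Map sensor-frame points into the global frame: p_global = R * p + t."""
    transformed = []
    for p in points:
        transformed.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed


def merge_scans(scans):
    """Co-register several scans, each given as (points, rotation, translation).

    Assumption: the pose of each scan is already known; all points are
    simply expressed in one shared global coordinate system.
    """
    merged = []
    for points, rotation, translation in scans:
        merged.extend(to_global(points, rotation, translation))
    return merged
```

Once every scan is expressed in the same global frame, the merged point set can be passed to a segmentation stage as a single point cloud.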
In K. Kraus and N. Pfeifer, “Determination of terrain models in wooded areas with airborne laser scanner data,” ISPRS Journal of Photogrammetry and Remote Sensing, 53(4):193-203, 1998, an iterative method is presented for terrain modeling based on removing, at each step, 3D measurements whose residuals to the current terrain surface are larger than a threshold and re-estimating the terrain surface using the remaining data. Because the initialization of the terrain uses all the data, the method may not converge for densely built regions. Morphological opening operators can be used to create a digital terrain model (DTM), which is subtracted from the input data. DTM filters, inspired by image processing, may fail to produce good ground segmentation, especially for non-flat terrain. In V. Verma, R. Kumar, and S. Hsu, “3D building detection and modeling from aerial LIDAR data,” in CVPR, 2006, the ground and the buildings are segmented in one step by computing local planar patches (surfels) using Total Least Squares (TLS) and connecting the consistent neighboring surfels into regions using bottom-up region growing. The largest region is selected as ground, while the rest of the regions are classified as individual buildings. In F. Rottensteiner, J. Trinder, S. Clode, and K. Kubik, “Automated delineation of roof planes from LiDAR data,” in Laser05, pages 221-226, 2005, a method is presented for the automatic delineation of roof planes using local plane estimation and grouping the local planes into larger regions starting from local seed locations. Over-segmented regions are merged together using co-planarity constraints. Points that do not belong to any surface are labeled as clutter. In J. Sun, H.-Y. Shum, and N.-N. Zheng, “Stereo matching using belief propagation,” PAMI, 19:787-800, 2003, a max-product belief propagation (BP) algorithm is used for stereo matching to enforce constraints that neighboring pixels with the same intensity values are assigned the same depth.
In Y. Guo, H. Sawhney, R. Kumar, and S. Hsu, “Learning-based building outline detection from multiple aerial images,” in ECCV, pages 545-552, 2001, a rectilinear approximation to the outlines of buildings is employed in an image-based building detection system. The orientation of each rectilinear fit is determined by finding the maximum in a histogram of local image gradients approximating tangent directions to the contour. This method assumes that the errors of the contour points are relatively small.
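The histogram-based orientation estimate described above can be sketched as follows. This is an illustrative simplification and not the cited authors' implementation: it assumes the contour is already available as an ordered, closed list of 2D points, and it substitutes the directions of consecutive contour segments for local image gradients. The function name `dominant_orientation_deg` and the parameter `bin_width_deg` are hypothetical.

```python
import math


def dominant_orientation_deg(contour, bin_width_deg=1.0):
    """Estimate the dominant direction of a closed contour, in degrees.

    Segment directions are folded into [0, 90) because a rectilinear
    outline has only two mutually perpendicular edge directions.
    Each segment votes with a weight equal to its length.
    """
    n_bins = int(round(90.0 / bin_width_deg))
    histogram = [0.0] * n_bins
    n = len(contour)
    for k in range(n):
        x0, y0 = contour[k]
        x1, y1 = contour[(k + 1) % n]  # wrap around to close the contour
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # skip duplicate points
        angle = math.degrees(math.atan2(dy, dx)) % 90.0
        index = min(int(angle / bin_width_deg), n_bins - 1)
        histogram[index] += length
    # The histogram maximum gives the orientation of the rectilinear fit.
    best = max(range(n_bins), key=lambda i: histogram[i])
    return (best + 0.5) * bin_width_deg
```

Because every segment votes directly for a direction, large errors in the contour points spread the votes across many bins and weaken the peak, which is consistent with the stated assumption that contour errors are small.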
Automatic building segmentation is a critical component of any 3D modeling system because it provides 3D regions (segments) with little or no manual interaction. Each of the aforementioned methods employs an automatic building segmentation algorithm that falls short in efficiency or resolution. More particularly, the above-cited references assume that the ground (as opposed to buildings) covers the largest surface area, and are therefore inappropriate for use in modeling dense urban areas. Accordingly, what would be desirable, but has not yet been provided, is an efficient, accurate method of automatic building segmentation for automatically extracting 3D models of dense urban regions using aerial LIDAR data.