Driven by markets for games, movies, map services, robotics, navigation, photogrammetry, etc., a strong demand has developed for photo-realistic modeling of structures such as buildings, cities, landscapes, etc. However, conventional modeling of such structures has focused on large-scale aerial photography-based city modeling. As a result, when these models are zoomed in to ground level, the visual results that viewers experience can be disappointing, with blurry models and vague or few details. Moreover, to provide a rewarding user experience, many potential applications demand photo-realistic street-level representation of such structures, where most of our daily activities take place. In terms of spatial constraints, the coverage of ground-level images is typically close-range. In addition, an increasing amount of data (e.g., movies, pictures, geospatial data, etc.) can be captured and processed in attempts to create such photo-realistic models. As a result, street-side modeling, for example, becomes much more technically challenging.
For instance, conventional approaches can range from pure synthetic methods, such as artificial synthesis of buildings based on grammar rules, to 3D (three-dimensional) scanning of street façades, to image-based approaches. Some examples require manual assignment of depth to the constructed façade as a result of using limited image data (e.g., single image-based approaches). In other examples, information from reconstructed 3D points can be used to automatically infer the critical depth of each primitive. However, some implementations require tedious 3D scanning, while others suffer from scaling difficulties in large-scale modeling of structures such as buildings due to the use of a small set of images.
Conventional approaches to façade, building, and architectural modeling can be classified as rule-based, image-based, and vision-based modeling approaches. For example, in rule-based methods, the procedural modeling of buildings is accomplished using a set of expert-specified rules. In general, procedural modeling can be limited in the realism of resulting models and their variations due to the required rule set and the potential inflexibility of the rule set. Thus, defining the required rules to produce photo-realistic models of existing structures such as buildings can be complex.
In image-based examples, conventional approaches use images as a guide to generate models of architectures interactively. However, even more sophisticated image-based methods can require manual selection of features as well as tedious indication of the correspondences between different image views. As a result, conventional image-based examples can suffer from scaling problems. In other single image-based examples using manual depth assignment (e.g., assigning depth based on a painting metaphor or sketching approach), a limited domain of regular façades can be used to highlight the importance of features, such as windows in an architectural setting, to create a building.
While some conventional examples can provide acceptable models in the context of regular buildings with simple repetitive façades, irregularities in building characteristics (e.g., in a street-side façade) require more sophisticated techniques. Other examples, while having general applicability in the context of irregularities, can be difficult to scale up for large-scale reconstruction due to intense manual interaction. Still other examples can require tedious manual assignment of model parameterizations and point correspondences.
On the other hand, vision-based examples can automatically reconstruct urban scenes from images. Typical examples can result in meshes on a dense stereo reconstruction. However, proper modeling with man-made structural constraints from reconstructed point clouds and stereo data has not yet been addressed. For example, while some examples use line segments to reconstruct buildings, other examples can construct 3D architectural models from short image sequences based on a model-based Bayesian approach. However, the latter examples still rely heavily on many specific architectural rules and model parameters.
Other examples of urban scene modeling have been based on aerial images. As discussed, such approaches, while providing acceptable results from a top-view perspective, leave much to be desired from a ground-level perspective. Further examples have used a combination of aerial imagery, ground color and Light Detection and Ranging (LIDAR) scans to construct models of façades. However, like stereo methods, these approaches can suffer from the lack of representation for the styles in man-made architectures. Still other examples can create panoramas of roughly planar scenes, but the panoramas are created without producing corresponding 3D models.
It is thus desired to provide enhanced systems, structures and methodologies for producing three-dimensional façade models from images that improve upon these and other deficiencies of conventional modeling technologies. The above-described deficiencies of typical modeling technologies are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with conventional systems and corresponding benefits of the various non-limiting embodiments described herein may become further apparent upon review of the following description.