Driven by markets for games, movies, map services, robotics, navigation, photogrammetry, etc., a strong demand has developed for photo-realistic modeling of structures such as buildings, cities, landscapes, etc. However, conventional modeling of such structures has focused on large-scale aerial photography-based city modeling. As a result, when these models are zoomed in to ground level, the visual results that viewers experience can be disappointing, with blurry models and vague or few details. Moreover, to provide a rewarding user experience, many potential applications demand photo-realistic street-level representation of such structures where most of our daily activities take place.
For instance, current models of cities are often obtained from aerial images as demonstrated by Google™ Earth and Microsoft® Virtual Earth® three-dimensional (3D) platforms. However, such image-based city modeling methods using aerial images typically cannot produce photo-realistic models at ground level. As a transition solution, Google™ Street-View, Microsoft® Live Street-Side and the like can display captured two-dimensional (2D) panorama-like images with fixed view-points, which solutions can be insufficient for applications that require true photo-realistic 3D models, such as applications enabling user interactions with 3D environments. In addition, many conventional methods for generating 3D models from images suffer from various deficiencies.
For example, conventional interactive methods to generate 3D models from images typically require significant user interaction, which cannot be easily deployed in large-scale modeling tasks. As a further example, more automated methods that focus on early stages of the modeling pipeline have not yet been able to produce satisfactory results for graphics applications. Further image-based city modeling methods (e.g., single view methods, interactive multi-view methods, automatic multi-view methods, and so on) suffer from similar or other deficiencies.
In image-based example(s), conventional approaches use images as a guide to generate models of architectures interactively. As an example, conventional single-view methods allow creation of models from a single image by manually assigning the depth based on a painting metaphor. In other single image-based examples using manual depth assignment (e.g., assigning depth based on a sketching approach), a limited domain of regular façades can be used to highlight the importance of features, such as windows in an architectural setting, to create a building. Generally, these methods require intensive user interactions to produce visually pleasing results. As a result, conventional image-based examples can suffer from scaling problems. Moreover, even more sophisticated image-based methods can require manual selection of features as well as tedious indication of the correspondences in different image views.
For instance, some interactive multi-view examples can use line segment features in images and polyhedral blocks as 3D primitives to interactively register images and to reconstruct blocks with view-dependent texture mapping. However, the required manual selection of features and of the correspondences in different views is tedious. As a result, such methods suffer from scaling difficulties as the number of input images grows.
In further examples, a semi-dense set of reconstructed point clouds can be used to operate in a fronto-parallel reference image of a façade to provide acceptable modeling results. As yet another example, using registered multiple views and extracting major directions by vanishing points can also provide good modeling results. However, these methods continue to involve significant user interactions that make the methods difficult to adopt in large-scale city modeling applications. In some conventional automatic multi-view modeling methods, a 3D architectural modeling method for short image sequences still requires a user to provide intensive architectural rules for Bayesian inference.
In image-based modeling, it is understood that line features in man-made scenes can be used to facilitate modeling such scenes. For instance, line segments can be used for building reconstruction from images registered by sparse points, and line features can be used for both structure from motion and modeling. However, line features tend to be sparse and geometrically less stable than points.
In other conventional approaches to modeling urban environments, a systematic approach can employ video cameras using real-time video registration while focusing on the global reconstruction of dense stereo results from the registered images. However, the lack of architectural constraints results in many irregularities in the final modeling results.
It is clear that, while some conventional modeling examples can provide acceptable models in the context of regular buildings with simple repetitive façades, irregularities in building characteristics (e.g., in a street-side façade) require more sophisticated techniques. Other examples, while having general applicability in the context of irregularities, can be difficult to scale up for large-scale reconstruction due to intense manual interaction. Still other examples can require tedious manual assignment of model parameterizations and point correspondences.
It is thus desired to provide enhanced systems, structures and methodologies for producing three-dimensional models from images that improve upon these and other deficiencies. The above-described deficiencies of typical modeling technologies are merely intended to provide an overview of some of the problems of conventional systems, and are not intended to be exhaustive. Other problems with conventional systems and corresponding benefits of the various non-limiting embodiments described herein may become further apparent upon review of the following description.