With the availability of geographical information and associated aerial image information on the Internet, there has been an increased demand for 3D geometric models that enable users to navigate near ground level through, for example, metropolitan areas. However, while current systems provide stunning images from a bird's eye view, the resolution of geometry and texture is not sufficient for important applications in which a user navigates near ground level. For example, a virtual training application for emergency response requires interaction with a detailed model of high visual quality and realism, including semantic information for meaningful simulation. Other applications in the entertainment industry, urban planning, visual impact analysis, driving simulation, and military simulation have similar requirements. Thus, to provide 3D geometric models at a large scale, i.e., for an extensive geographical area, and with sufficient quality for practical applications, efficient mechanisms are required for urban reconstruction based on low-resolution oblique aerial imagery and, in particular, for reconstruction of facades based on higher-resolution ground-based imagery. While computer graphics techniques meet the quality criteria of most applications, conventional methods of large-scale reconstruction require several man-years of labor. Although recent techniques in computer graphics focus on efficient large-scale modeling, they do not provide sufficient resemblance to the real-life environment, and they do not support urban reconstruction from single facade images.
Urban reconstruction algorithms using ground-based facade images have been proposed by DEBEVEC, P. E., TAYLOR, C. J., AND MALIK, J., 1996, “Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach,” Proceedings of ACM SIGGRAPH 96, ACM Press, H. Rushmeier, Ed., 11-20; JEPSON, W., LIGGETT, R., AND FRIEDMAN, S., 1996, “Virtual modeling of urban environments,” PRESENCE 5, 1, 72-86; DICK, A., TORR, P., RUFFLE, S., AND CIPOLLA, R., 2001, “Combining single view recognition and multiple view stereo for architectural scenes,” ICCV, IEEE Computer Society, Los Alamitos, Calif., 268-274; WANG, X., TOTARO, S., TAILLANDIER, F., HANSON, A., AND TELLER, S., 2002, “Recovering facade texture and microstructure from real-world images,” Proc. ISPRS Commission III Symposium on Photogrammetric Computer Vision, 381-386; LEE, S. C., JUNG, S. K., AND NEVATIA, R., 2002, “Automatic integration of facade textures into 3D building models with a projective geometry based line clustering,” Computer Graphics Forum 21, 3 (September), 511-519; and REALVIZ, 2007, “Realviz ImageModeler V4.0, product information,” http://www.realviz.com. Generally, in these systems, a user is assisted by computer vision methods (e.g., Debevec et al. 1996) during modeling, while most automatic processes rely on graphical simplifications, limit the appearance of facade elements to pre-specified types, or rely fully on the detection and analysis of edges, which limits the detection of windows, for example, in otherwise homogeneous facades.