In 1979, Lippman used gyrostabilized cameras mounted on top of a car to create an interactive visualization of downtown Aspen, Colo. (Lippman, A., “Movie-Maps: An Application of the Optical Videodisc to Computer Graphics,” Proceedings of SIGGRAPH, 1980, pp. 32-42.) This early hypermedia system, a forerunner of Google's Street View, pioneered the use of spatially indexed imagery for generating interactive video tours. Lippman's system allowed users to explore an environment interactively and gave them a sense of “being there.” The desire to enhance this sense of visual presence has since fueled a sizable body of work on interactive visual tours. However, most of these approaches rely on specialized data acquisition equipment and complex, time-consuming offline processing pipelines, which puts them out of reach of the casual user.
The idea of indoor interactive video tours has been explored by several authors in the fields of graphics, vision, and human-computer interaction. In 1986, Brooks was one of the first to propose a system for building rapid visual prototypes of buildings for architectural use. (Brooks, F., “WalkThrough—A Dynamic Graphics System for Simulating Virtual Buildings,” Proceedings of I3D′86, 1987, pp. 9-21.) More recently, Uyttendaele et al. used omnidirectional video to create indoor virtual tours. (Uyttendaele, M., Criminisi, A., Kang, S. B., Winder, S., Szeliski, R., and Hartley, R., “Image-Based Interactive Exploration of Real-World Environments,” IEEE Computer Graphics and Applications, 2004, 24: pp. 52-63.) Similar approaches are used in Google's Street View and Art Project. (Anguelov, D., Dulong, C., Filip, D., Frueh, C., Lafon, S., Lyon, R., Ogale, A., Vincent, L., and Weaver, J., “Google Street View: Capturing the World at Street Level,” June 2010, Computer, 43(6): pp. 32-38; Google Inc., Google Art Project, 2011, http://www.googleartproject.com.) These approaches require sophisticated omnidirectional camera rigs and several hours of offline processing. Quiksee, in contrast, uses a hand-held camcorder, but it still relies on an offline processing pipeline, requires manual spatial registration, and does not model the geometry of the scene.
Early mobile robot navigation systems, such as those proposed by Ishiguro and Yagi, combined omnidirectional camera systems with odometry measurements to reconstruct environments for mobile robot navigation. (Ishiguro, H., Ueda, K., and Tsuji, S., “Omnidirectional Visual Information for Navigating a Mobile Robot,” Proceedings of ICRA, 1993, pp. 799-804; Yagi, Y., Kawato, S., and Tsuji, S., “Real-Time Omnidirectional Image Sensor (COPIS) for Vision-Guided Navigation,” IEEE Transactions on Robotics and Automation, 10(1), 1994, pp. 11-22.) Taylor also estimated camera position and environment geometry from video data. (Taylor, C. J., “VideoPlus: A Method for Capturing the Structure and Appearance of Immersive Environments,” IEEE Transactions on Visualization and Computer Graphics, 8(2), 2002, pp. 171-182.) Taylor's approach required the user to specify several point and line correspondences in keyframes of the omnidirectional video. The visual simultaneous localization and mapping (“SLAM”) and computer vision communities have developed automatic approaches for reconstructing indoor scenes from images. (Flint, A., Murray, D., and Reid, I., “Manhattan Scene Understanding Using Monocular, Stereo, and 3D Features,” Proceedings of ICCV, 2011, pp. 2228-2235; Furukawa, Y., Curless, B., Seitz, S. M., and Szeliski, R., “Reconstructing Building Interiors from Images,” Proceedings of ICCV '09, 2009, pp. 80-87; Snavely, N., Seitz, S. M., and Szeliski, R., “Photo Tourism: Exploring Photo Collections in 3D,” Proceedings of SIGGRAPH, pp. 835-846; Coorg, S., and Teller, S., “Extracting Textured Vertical Facades from Controlled Close-Range Imagery,” Proceedings of CVPR, 1999, pp. 625-632; Szeliski, R., and Shum, H., “Creating Full View Panoramic Image Mosaics and Environment Maps,” Proceedings of SIGGRAPH, 1997, pp. 251-258.)
While computer vision-based 3D reconstruction has demonstrated potential, it is computationally expensive and performs poorly on texture-poor surfaces (e.g., painted walls), which dominate interiors. SLAM-based reconstruction has been shown to work on smartphones, but it may be limited to modeling corridors. (Shin, H., Chon, Y., and Cha, H., “Unsupervised Construction of an Indoor Floor Plan Using a Smartphone,” IEEE Transactions on Systems, Man, and Cybernetics, Volume PP, Issue 99, 2011, pp. 1-10.) Kim et al. employed a Manhattan-world assumption to acquire indoor floor plans in real time. (Kim, Y. M., Dolson, J., Sokolsky, M., Koltun, V., and Thrun, S., “Interactive Acquisition of Residential Floor Plans,” Proceedings of ICRA, 2012, pp. 3055-3062.) Their approach is hardware-intensive, requiring the user to carry a Kinect camera, a projector, a laptop, and a special input device while capturing data around the house.
The recent shift of image-based systems to the mobile phone platform is exemplified by the mobile Photosynth application, which creates panoramic images in real time on a smartphone. (Microsoft Corporation, Photosynth, 2011, http://photosynth.net/.) MagicPlan is a commercial floor plan generation app available for the iPhone. (Sensopia Inc., MagicPlan, 2011, http://www.sensopia.com.) When the user marks the floor corners of a room through an augmented reality interface, MagicPlan estimates the room's dimensions and generates a corresponding floor plan. However, MagicPlan reconstructs each room individually and then requires the user to assemble the rooms manually into a complete floor plan.