1. Field of the Invention
The invention is related to video tours, and more particularly to interactive video tours that use image-based rendering techniques for exploring remote real-world locations.
2. Description of the Related Art
For more than a decade, interactive video tours have been of great interest to people. People often desire to visually explore remote locations. Video tours have provided users with the ability to view and explore such locations. Video tours are based on the idea of viewing sequences of images or video previously acquired at remote locations. A viewer enables a user to interactively view such images or videos, so that the impression of a virtual tour is generated.
The idea of video tours goes back to Lippman, who developed an early such system in his seminal “Movie Maps” project. This project is described in Lippman, A., “Movie maps: An application of the optical videodisc to computer graphics,” Computer Graphics (SIGGRAPH'80), vol. 14(3), July 1980, pp. 32-43. His system was based on a collection of photographs acquired by driving through an urban environment, and it allowed the user to navigate interactively through the recorded imagery. Boult extended this idea to panoramic images and video, in the paper by Boult, T. E., “Remote reality via omnidirectional imaging,” SIGGRAPH 1998 Technical Sketch, p. 253. The use of panoramic images enables a user to change the viewing direction at will, which increases the user's perceived control over the contents of the tour. Similar systems were developed by Uyttendaele et al., U.S. Pat. No. 6,968,973 and Foote et al., U.S. Pat. No. 7,096,428. These inventions provide interactive mechanisms that enable users to navigate through panoramic video along virtual paths. Their systems give the user the freedom to choose the tour direction, speed, viewing angle, and zoom level at will.
In all these systems, the user must follow the exact path along which the original video or image data was acquired. Thus, for any path chosen by a user, sequences of panoramic images must be available. However, collecting such images may be impossible or uneconomical. This problem is particularly acute in urban environments, where the motion of the camera during image acquisition may be severely limited. For example, a data acquisition vehicle may be prohibited from executing a specific motion that a virtual tourist may wish to explore. In such situations, the set of panoramic images is incomplete: the database lacks the panoramic images needed to let the user follow the desired motion direction. As a result, the motion of the user is limited, and it may be impossible to provide a realistic tour through such environments.
This limitation is overcome by synthesizing new panoramic views from the views in the image database. The common approach is based on seminal work by Levoy and Hanrahan. The light field technique is disclosed in a paper entitled “Light Field Rendering,” by M. Levoy and P. Hanrahan, Computer Graphics (SIGGRAPH '96), pp. 171-80, 1996, and in U.S. Pat. No. 6,097,394, issued Aug. 1, 2000. This technique requires as input a multitude of images acquired at nearby locations. By stitching together areas from multiple images taken at different locations, hypothetical new images can be synthesized for arbitrary nearby viewing locations. This idea has been extended to sets of panoramic images acquired over irregular grids of viewpoints by Aliaga et al.; see U.S. Pat. Nos. 7,027,049 and 6,831,643. By combining image segments from multiple such images, new views can be synthesized for arbitrary nearby locations. However, such an approach suffers from two limitations. First, in many application domains the collection of a grid of panoramic images may be uneconomical or infeasible. For example, when building virtual tours for entire cities, it may be uneconomical to traverse streets more than once. Moreover, certain locations may never be entered, e.g., because of obstacles that prohibit a vehicle from moving there. Second, and more importantly, cities are full of moving objects, so the appearance of urban locations varies with time. Images acquired sequentially (e.g., by a moving vehicle) may therefore be mutually inconsistent, and a combined image generated by stitching together sub-regions from such images may look unrealistic. For example, when stitching together images of a moving vehicle recorded at different points in time, the resulting panorama may contain only part of a car, which is not a realistic image.
When viewing panoramic images, a user might want to move freely, without regard to the specifics of the data acquisition process. Further, a user might want to see images free of the motion artifacts that arise when stitching together multiple panoramic images. This motivates the problem of generating synthetic views from individual panoramic images, in ways that do not require a dense grid of image recording locations.