A motion picture is displayed by rapidly sequencing through a series of still images. Typically, the movie is acquired with a film movie camera or a video camera that captures images 24, 50, or 60 times a second. This type of motion is considered linear in the sense that the images comprising the movie are played from beginning to end, with no interactivity.
In contrast to a conventional movie, an interactive multimedia application enables a user to move through a synthetically modeled environment, and hence, allows the user to control the sequence in which the images are displayed. The user interactively controls both position and orientation within the model, exploring the model space through various modes of navigation. Examples of such navigational modes are walking, jumping, driving, floating, flying, spinning, object-manipulating, head-turning, and zooming.
Such interactive environments may be implemented using two types of systems: polygon-based rendering systems and image-based rendering systems. In a polygon-based rendering system, the interactive environment is created using a 3D model. The polygon-based rendering system uses the 3D model to compute new views of the scene based on the user's movement. This requires the rendering system to compute fifteen or more new images of the scene each second to create a sensation of motion. Therefore, as a practical matter, the complexity of the 3D model must be limited so that the system can generate the images quickly enough.
An image-based rendering system avoids the need to limit model complexity by using digital images to create the interactive environment, rather than 3D models. Because image-based rendering systems resample source images of a scene, rather than relying on a polygonal representation of that scene, the imagery can be as complex as the image representation allows without affecting the computational cost. In particular, the image representation of the scene can be photographic, or can be pre-rendered over a considerable amount of time, as only run-time resampling must take place at interactive rates. Since the user has control over the speed and direction of the progression of the multimedia presentation, time is generally not applicable to this type of interactive multimedia content. As a result, multimedia content tends to be organized spatially or topically, rather than, or in addition to, temporally.
To create the interactive environment, a view of a scene is typically captured at various locations. The spatial relationship between these views is recorded, either as coordinates and orientations in space, known as a metric representation, or in a network of adjacencies, known as a topological representation. Typically, a combination of metric and topological representation is used. A full metric representation can be processed to derive a topological representation, but not vice versa. If a large number of views are provided for a scene, then playing or sampling an appropriate sequence of these views yields the sensation of motion.
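By way of illustration only, the derivation of a topological representation from a metric one may be sketched as follows. The sketch assumes that each view is identified by a name and a two-dimensional capture position, and that two views are adjacent when their capture positions lie within a chosen distance of one another. The function name `derive_adjacency`, the `max_distance` parameter, and the sample coordinates are hypothetical and are not taken from any particular system. The reverse derivation is not possible because an adjacency network carries no distances or orientations.

```python
import math

def derive_adjacency(positions, max_distance):
    """Build an adjacency network (topological representation) from
    capture coordinates (metric representation).

    Assumption for this sketch: two views are adjacent when their
    capture points are no more than max_distance apart.
    """
    adjacency = {name: set() for name in positions}
    names = sorted(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= max_distance:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency

# Hypothetical capture positions for four views of a scene.
views = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (10.0, 10.0)}
graph = derive_adjacency(views, max_distance=1.5)
```

In this example, views A, B, and C form a chain of adjacencies along which a sequence of views could be played, while view D, captured far from the others, has no neighbors.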
Interactive multimedia implemented using image-based rendering requires much more data than linear movies in order to represent the myriad of paths that the user can take. Thus, in image-based rendering, it is the size and the number of the images representing the scene that are the limiting factors, rather than the scene's complexity. Because of its large data storage requirements, image-based rendering can limit the amount of multimedia content that can be stored on fixed-capacity media, such as CD-ROMs.
Based on the above, it would be desirable to synthesize a large number of views from a smaller set in order to reduce the number of images that need to be stored. In some navigational modes, a number of views can be, and are, synthesized from a single image. Examples include spinning, panning, and zooming, which are implementable with standard rotation, translation, and scaling techniques. Other navigational modes, however, such as walking, driving, and flying, where the user is moving through the scene, still require the rendering of many images in order to simulate movement.
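By way of illustration only, the synthesis of panned and zoomed views from a single stored image may be sketched as follows. The sketch assumes that the image is a list of pixel rows and uses nearest-neighbor resampling, which is one simple choice among many. The function name `synthesize_view` and its parameters are hypothetical. Rotation for spinning would add a rotation term to the same coordinate mapping and is omitted here for brevity.

```python
import math

def synthesize_view(image, center_x, center_y, zoom, out_w, out_h):
    """Resample one stored image into a new view.

    Panning is a translation (center_x, center_y) and zooming is a
    scaling (zoom); each output pixel is mapped back into the source
    image and the nearest source pixel is copied.
    """
    src_h, src_w = len(image), len(image[0])
    view = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Offset from the output center, scaled, then translated.
            sx = center_x + (col - (out_w - 1) / 2) / zoom
            sy = center_y + (row - (out_h - 1) / 2) / zoom
            # Nearest neighbor, clamped to the source image bounds.
            ix = min(max(math.floor(sx + 0.5), 0), src_w - 1)
            iy = min(max(math.floor(sy + 0.5), 0), src_h - 1)
            out_row.append(image[iy][ix])
        view.append(out_row)
    return view

# A 4x4 test image whose pixel value encodes its position as 10*y + x.
image = [[10 * y + x for x in range(4)] for y in range(4)]

top_left = synthesize_view(image, 0.5, 0.5, 1.0, 2, 2)  # 2x2 window
panned = synthesize_view(image, 2.5, 0.5, 1.0, 2, 2)    # window moved right
zoomed = synthesize_view(image, 1.5, 1.5, 2.0, 4, 4)    # 2x magnification
```

All three views are produced from the same stored image, so none of them needs to be stored separately. No comparable mapping of a single image is available for walking, driving, or flying, because the viewpoint itself moves through the scene.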
What is needed, therefore, is a method and system for simulating movement in an interactive image-based environment that reduces the storage requirements of the interactive application. The present invention addresses such a need.