1. Field of the Invention
The present invention relates in general to computer imaging and more particularly to a method and a system that use a visual tunnel analysis to obtain visual information from an image sequence.
2. Related Art
One of the main objectives of computer imaging and graphics is the generation of artificial environments (or scenes). These computer-generated scenes may be entirely fictional or may be images of actual scenes, but in either case the goal is to provide realistic imagery of the scene. One technique used to generate these scenes is image-based rendering, which refers to the generation of new images and views of a scene from an example image sequence (or collection of images) showing views of a portion of the scene. Image-based rendering provides photorealistic rendering of a scene at lower computational cost than alternative techniques such as three-dimensional (3D) modeling. Image-based rendering models a scene by taking a sequence of example images of the scene and using this example image sequence to generate new images of the scene. These new images represent the scene appearance from arbitrary viewpoints. Thus, image-based rendering provides a mechanism for simulating a continuous range of camera viewpoints (and thus a model of the entire scene) from an example image sequence of only a portion of the scene.
One technique used to generate new views is interpolation from the example image sequence. In particular, for any given virtual camera location, the closest example image is retrieved and an interpolation is performed on it to generate a new image. This technique, however, has at least two major problems. First, proper interpolation requires that an extremely large number of example images be obtained, which is costly and time-consuming because of the time and expense required to set up cameras and lighting to obtain quality example images of a scene. Second, this technique cannot determine or predict what views the virtual camera can obtain at its present location. In other words, given a virtual camera location, interpolation provides no information about which views may be visualized from that location.
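The retrieval step of this prior-art approach can be sketched as a nearest-neighbor lookup over camera centers. This is a minimal illustration, assuming each example image is tagged with the 3D position of the camera that captured it; the function name `closest_example_image` is hypothetical, and the interpolation that would follow is omitted. Note that the lookup returns only an image index: nothing in it indicates which viewing directions are actually available at the virtual location, which is the second problem noted above.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def closest_example_image(virtual_position: Point3,
                          camera_positions: Sequence[Point3]) -> int:
    """Return the index of the example image whose camera center is nearest
    (Euclidean distance) to the virtual camera location."""
    if not camera_positions:
        raise ValueError("the example image sequence is empty")
    return min(range(len(camera_positions)),
               key=lambda i: math.dist(virtual_position, camera_positions[i]))


# Hypothetical example: three cameras spaced one unit apart along the x-axis.
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(closest_example_image((1.4, 0.2, 0.0), cameras))  # -> 1
```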
Another technique used to generate new views is reconstruction of the plenoptic function. The plenoptic function is a five-dimensional (5-D) function representing the intensity of the light observed from every position and direction in 3D space. The object of this technique is to reconstruct the plenoptic function from an example image sequence. Once the plenoptic function has been reconstructed, new images can be generated using the reconstructed function. Ray-based scene reconstruction in computer graphics using a four-dimensional (4-D) representation is discussed by Marc Levoy and Pat Hanrahan in “Light Field Rendering” and by Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski and Michael F. Cohen in “The Lumigraph”, the entire contents of which are hereby incorporated by reference. These papers are contained in the Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, “Light Field Rendering” (pp. 31-42) and “The Lumigraph” (pp. 43-54), August 1996, New Orleans, La., Addison Wesley (Edited by Holly Rushmeier), ISBN 0-201-94800-1. One problem with using the plenoptic function is that, because it is a 5-D function, sampling and storing it for any useful region of space is impractical due to the computational expense and complexity involved. Thus, the techniques described in the above-referenced papers use only a 4-D representation of the plenoptic function instead of a full 5-D representation. The simplifications and assumptions that reduce the plenoptic function from a 5-D function to a 4-D function, however, also greatly reduce the accuracy and efficiency of visual reconstruction and planning.
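A rough sense of why tabulating the full 5-D function is impractical comes from counting samples. The sketch below assumes a hypothetical regular grid with 256 samples along every axis and 3 bytes (RGB) per sample; real systems use different and usually non-uniform resolutions, so the numbers only illustrate how storage grows with each added dimension.

```python
def plenoptic_storage_bytes(samples_per_dim: int, dims: int,
                            bytes_per_sample: int = 3) -> int:
    """Bytes needed to tabulate a dims-dimensional function on a regular grid."""
    return (samples_per_dim ** dims) * bytes_per_sample


# Hypothetical resolution: 256 samples along each axis, 3 bytes (RGB) per sample.
n = 256
full_5d = plenoptic_storage_bytes(n, 5)
light_field_4d = plenoptic_storage_bytes(n, 4)
print(f"5-D: {full_5d / 2**40:.1f} TiB, 4-D: {light_field_4d / 2**30:.1f} GiB")
# prints "5-D: 3.0 TiB, 4-D: 12.0 GiB"
```

Each added dimension multiplies storage by the per-axis sample count, here a factor of 256.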
Using the 5-D plenoptic function as a scene representation for computer graphics was proposed by Leonard McMillan and Gary Bishop in “Plenoptic Modeling: An Image-Based Rendering System”, Proceedings of SIGGRAPH 95, Computer Graphics Proceedings, Annual Conference Series, pp. 39-46, August 1995, Los Angeles, Calif., Addison Wesley (Edited by Robert Cook), ISBN 0-10201-84776-0, the entire contents of which are hereby incorporated by reference. One problem with the technique used in the paper, however, is that the paper does not propose an efficient way of capturing and representing 5-D data of this form and does not discuss any method for visual planning.
Accordingly, there exists a need for a method and system, such as the present invention, for obtaining visual information from an image sequence in an efficient, inexpensive and accurate manner. In particular, the method and system of the present invention consider only the appearance of a scene and not the scene geometry. In addition, the present invention does not use interpolation and thereby reduces the time and expense of obtaining numerous images. The present invention also uses only a portion of the plenoptic function in order to simplify computation and reduce computational expense. Further, the method and system of the present invention provide visual planning and predict the views that a virtual camera can obtain at a given location.
To overcome the limitations in the prior art as described above and other limitations that will become apparent upon reading and understanding the present specification, the present invention includes a method and system for obtaining visual information from an image sequence using a visual tunnel analysis. The present invention obtains an image sequence (or collection of images) and processes the image sequence using a visual tunnel analysis to extract visual information. In general, the visual tunnel analysis uses a subset of the plenoptic function to determine the position and orientation of every light ray passing through each point of the images in the sequence. A visual tunnel, which is a volume in visibility space that represents the portion of the visibility space contained in the image sequence, is defined for the image sequence. The visual tunnel is a representation of all light rays associated with the image sequence; the visual tunnel analysis of the present invention encodes the light rays at every point in free space and maps them to a ray representation space within the visual tunnel.
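Determining the position and orientation of the light ray through each image point requires a camera model. The following is a minimal sketch assuming an ideal pinhole camera with focal length `focal` (in pixels), principal point (`cx`, `cy`), a known camera-to-world rotation, and a known camera center; these parameters and the function names are hypothetical and are not specified by the present description. Each ray is returned as an origin (the camera center) and a unit viewing direction in world coordinates.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (origin, unit direction)


def pixel_ray(u: float, v: float, focal: float, cx: float, cy: float,
              cam_to_world: Sequence[Sequence[float]], center: Vec3) -> Ray:
    """Back-project pixel (u, v) through an ideal (hypothetical) pinhole camera."""
    # Viewing direction in camera coordinates (z axis = optical axis).
    d_cam = ((u - cx) / focal, (v - cy) / focal, 1.0)
    # Rotate into world coordinates: d_world = cam_to_world @ d_cam.
    d_world = tuple(sum(cam_to_world[r][c] * d_cam[c] for c in range(3))
                    for r in range(3))
    norm = math.sqrt(sum(x * x for x in d_world))
    return center, tuple(x / norm for x in d_world)


def image_rays(width: int, height: int, focal: float,
               cam_to_world: Sequence[Sequence[float]], center: Vec3) -> List[Ray]:
    """One ray per pixel center, assuming the principal point is the image center."""
    cx, cy = width / 2.0, height / 2.0
    return [pixel_ray(u + 0.5, v + 0.5, focal, cx, cy, cam_to_world, center)
            for v in range(height) for u in range(width)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin, direction = pixel_ray(320.0, 240.0, 500.0, 320.0, 240.0,
                              identity, (0.0, 0.0, 0.0))
print(direction)  # optical axis: (0.0, 0.0, 1.0)
```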
The present invention obtains visual information such as visibility prediction and visibility planning information by determining all light rays passing through each image plane within the image sequence. The present invention takes advantage of the fact that, in addition to capturing a discrete set of images, an image sequence also captures a continuous range of camera viewpoints. These viewpoints are defined by the collection of all light rays passing through the image planes. The visual tunnel analysis of the present invention uses this relationship to obtain visual information from the image sequence. One advantage of the present invention is that, because the visual tunnel technique is concerned only with scene appearance, stereo and scene geometry are not modeled, and thus stereo correspondence and other expensive forms of image processing are not needed. Another advantage of the present invention is that, unlike techniques that use view interpolation between images (such as the lumigraph and light field rendering), the image sequence need not be large. This is because the present invention provides visibility planning information that allows camera trajectories to be planned, thus limiting the images contained in the image sequence to those necessary to visualize the scene. The present invention also permits characterization of the range of new views that may be predicted from an input camera trajectory.
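The idea that a trajectory captures a continuous range of viewpoints can be illustrated in flatland. The sketch below assumes a hypothetical straight camera path along y = 0 from x0 to x1, with every camera facing +y and sharing a half field of view `half_fov` (less than 90 degrees), and assumes free space between a virtual viewpoint and the path. A viewing ray from a virtual point then coincides with a captured ray exactly when its angle is within the field of view and its line crosses y = 0 on the segment, that is, at some camera center. The names and parameters are illustrative, not part of the described system.

```python
import math


def ray_captured(px: float, py: float, theta: float,
                 x0: float, x1: float, half_fov: float) -> bool:
    """True if the viewing ray from (px, py) at angle theta was recorded.

    theta is measured from the +y axis (positive toward +x), so the ray
    direction is (sin(theta), cos(theta)). Cameras sit on y = 0, x in
    [x0, x1], all looking along +y with half field of view half_fov < pi/2.
    """
    if abs(theta) > half_fov:
        return False  # no camera on the path looks in this direction
    # Follow the line back to y = 0 and check whether it hits the camera path.
    x_cross = px - py * math.tan(theta)
    return x0 <= x_cross <= x1


def visible_fraction(px: float, py: float, x0: float, x1: float,
                     half_fov: float, n: int = 360) -> float:
    """Fraction of n evenly spaced viewing directions at (px, py) that were captured."""
    thetas = (-math.pi + 2 * math.pi * (k + 0.5) / n for k in range(n))
    hits = sum(ray_captured(px, py, t, x0, x1, half_fov) for t in thetas)
    return hits / n


# Hypothetical track from x = -1 to x = 1 with a 90-degree field of view.
print(visible_fraction(0.0, 1.0, -1.0, 1.0, math.pi / 4))  # 0.25
print(visible_fraction(0.0, 2.0, -1.0, 1.0, math.pi / 4))  # 0.15, farther from the path
```

The shrinking fraction shows how the range of predictable views depends on the distance from the input trajectory.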
In general, the method of the present invention obtains visual information from an image sequence by using a visual tunnel analysis to determine a position and orientation of each light ray within the image sequence. This visual tunnel analysis includes defining a visual tunnel and mapping light rays of the image sequence to a ray space representation within the visual tunnel. Using the visual tunnel, visual information such as visibility prediction and visibility planning may be obtained. For example, a visibility prediction may be obtained as to where a scene may be visualized, or a visibility plan may be made as to where an input sequence of images should be captured to obtain a desired visualization of a scene. In particular, when predicting regions of visibility, the method of the present invention includes obtaining an image sequence, converting the light rays associated with the image sequence into a visual tunnel representation and extracting a region of visibility from the visual tunnel representation. The image sequence is a set of images that captures a portion of the scene appearance, and the light rays associated with all the pixels in all the images of the image sequence are converted to a visual tunnel representation. This conversion is accomplished in part by updating the plenoptic function so that, for example, for each light ray, the Gaussian spheres associated with all points in space through which the light ray passes record the direction of the light ray. In other words, to predict regions of visibility in a scene, the distribution on the Gaussian sphere at a location provides the distribution of light rays available to visualize the scene from that location.
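The conversion step can be sketched in flatland, where each Gaussian sphere reduces to a circle of direction bins. In this minimal sketch, which is an assumption rather than the described implementation, free space is discretized into square cells, each cell keeps the set of direction bins in which at least one ray has passed through it, and rays are traversed by fixed half-cell steps (an approximation that can miss cells a ray only clips at a corner; an exact grid traversal would avoid that). Every ray is treated as extending through free space in both directions along its line. A region of visibility for a desired view is then the set of cells whose circles cover every bin within the requested field of view. The class and method names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


class VisualTunnel2D:
    """Flatland visual tunnel: a grid of cells, each holding a 'Gaussian circle'."""

    def __init__(self, xmin: float, ymin: float, cell: float,
                 nx: int, ny: int, n_bins: int = 72):
        self.xmin, self.ymin, self.cell = xmin, ymin, cell
        self.nx, self.ny, self.n_bins = nx, ny, n_bins
        self.width = 2 * math.pi / n_bins         # angular size of one bin
        self.circles: Dict[Cell, Set[int]] = {}  # cell -> occupied direction bins

    def _bin(self, angle: float) -> int:
        # Bin b is centered on direction b * width.
        return round((angle % (2 * math.pi)) / self.width) % self.n_bins

    def add_ray(self, ox: float, oy: float, angle: float) -> None:
        """Record direction `angle` in every cell the line through (ox, oy) crosses."""
        b = self._bin(angle)
        ux, uy = math.cos(angle), math.sin(angle)
        # Far enough to cross the whole grid from any origin (triangle inequality).
        reach = (math.hypot(self.nx, self.ny) * self.cell
                 + math.hypot(ox - self.xmin, oy - self.ymin))
        steps = int(2 * reach / (self.cell / 2)) + 1
        for k in range(steps + 1):
            s = -reach + k * self.cell / 2
            i = math.floor((ox + s * ux - self.xmin) / self.cell)
            j = math.floor((oy + s * uy - self.ymin) / self.cell)
            if 0 <= i < self.nx and 0 <= j < self.ny:
                self.circles.setdefault((i, j), set()).add(b)

    def add_camera(self, cx: float, cy: float, heading: float,
                   fov: float, n_pixels: int) -> None:
        """Convert every pixel of a flatland camera into a ray."""
        for k in range(n_pixels):
            self.add_ray(cx, cy, heading - fov / 2 + fov * (k + 0.5) / n_pixels)

    def visible_region(self, view_dir: float, half_fov: float) -> List[Cell]:
        """Cells whose circle covers every bin within half_fov of view_dir."""
        needed = {self._bin(view_dir)}
        for b in range(self.n_bins):
            diff = (b * self.width - view_dir + math.pi) % (2 * math.pi) - math.pi
            if abs(diff) <= half_fov:
                needed.add(b)
        return sorted(c for c, bins in self.circles.items() if needed <= bins)


tunnel = VisualTunnel2D(0.0, 0.0, 1.0, 3, 3, n_bins=4)
tunnel.add_ray(0.5, 0.5, 0.0)          # horizontal ray along the bottom row
tunnel.add_ray(1.5, 0.5, math.pi / 2)  # vertical ray up the middle column
print(tunnel.visible_region(0.0, 0.1))  # [(0, 0), (1, 0), (2, 0)]
```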
When planning a desired visualization of a scene (also called plenoptic authoring), the method of the present invention includes obtaining an area of interest, computing maximal regions for the area of interest where sampling occurs, determining the combination of visual tunnel primitives that minimally circumscribes the area of interest, and outputting a camera trajectory that will provide the desired visualization of the scene. The present invention includes a set of visual tunnel primitives such as, for example, concentric mosaics and conic arcs. The area of interest is divided into primitives, and a combination of visual tunnel primitives (such as concentric mosaics and straight paths) is assembled to minimally circumscribe the area. Dividing the area into known visual tunnel primitives that minimally circumscribe it minimizes the number of images needed and yields an efficient camera trajectory. The present invention also includes a system for obtaining visual information from an image sequence using a visual tunnel analysis that incorporates the above-described method of the present invention.
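For a single concentric-mosaic primitive, the circumscribing step has a simple closed form under stated assumptions. Suppose, as an illustration, that the camera moves on a circle of radius R while looking radially outward with horizontal field of view fov (at most 180 degrees). A line at distance d from the circle's center leaves the circle at an angle α from the outward radial direction with sin α = d/R, so every viewing direction at every point within radius R·sin(fov/2) is captured. To circumscribe a circular area of interest of radius r, one therefore needs R = r / sin(fov/2). The sketch below covers only this one primitive (it does not search over combinations of primitives such as straight paths), and the function names and the `n_positions` sampling parameter are hypothetical.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians)


def concentric_mosaic_radius(area_radius: float, fov: float) -> float:
    """Smallest capture-circle radius whose visual tunnel covers a disk of area_radius."""
    if area_radius <= 0 or not 0 < fov <= math.pi:
        raise ValueError("need area_radius > 0 and 0 < fov <= pi")
    return area_radius / math.sin(fov / 2)


def concentric_mosaic_trajectory(area_radius: float, fov: float, n_positions: int,
                                 center: Tuple[float, float] = (0.0, 0.0)) -> List[Pose]:
    """Evenly spaced camera poses on the capture circle, each facing radially outward."""
    radius = concentric_mosaic_radius(area_radius, fov)
    cx, cy = center
    poses = []
    for k in range(n_positions):
        phi = 2 * math.pi * k / n_positions
        poses.append((cx + radius * math.cos(phi), cy + radius * math.sin(phi), phi))
    return poses


# Hypothetical plan: a 1 m disk of interest, 60-degree cameras, 360 capture positions.
plan = concentric_mosaic_trajectory(1.0, math.radians(60), 360)
print(round(concentric_mosaic_radius(1.0, math.radians(60)), 6))  # 2.0
```

Narrowing the field of view enlarges the required circle, which is one reason a planner might prefer combining several primitives over a single large one.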