1. Field of Invention
The present invention relates to interactive networks and, more particularly, to a network in which a server interactively provides views of a virtual reality world to a client.
2. Description of Related Art
Unlike text-based media, video must be transmitted in a predictable, synchronized manner, and requires a guaranteed quality of service, with guaranteed bandwidth and guaranteed bounds on other properties such as latency and jitter. Protocols that support guaranteed quality-of-service media connections soon will be provided by ATM-based networks, or by other technologies such as FDDI and Fast Ethernet. Such protocols establish a virtual connection between a sender (a multimedia server) and a receiver (a client) provided that sufficient resources can be reserved along the path to support the minimum level of quality of service required by the connection.
Photo-realistic virtual reality applications are similar to video-based real-time applications, but provide full interaction. In many virtual reality systems, the user must have a real perception of the environment that is being explored or discovered, and a smooth interaction with the environment. In an interactive web-system scenario, the client carries the virtual camera and navigates through the virtual environment. The server constantly receives details regarding the position and orientation of the client camera, as well as any client activities that may modify the virtual environment. All the information concerning the entire setting is held at the server. As the client moves, the server updates the client with the essential data that enable the generation of new views.
Time lag and low-quality images are the main causes of a diminished sense of reality. High fidelity and photo-realism are achieved by using a fully textured (photo-mapped) environment. Today we are witnessing a rapidly increasing presence of 3D virtual worlds on the World Wide Web, described using a virtual reality modeling language (VRML). However, the interaction with remote virtual environments on the web is still extremely limited. The common approach is first to download the entire VRML 3D world to the client, which then renders the scene locally. This approach is successful as long as the environment is not too complex; otherwise it incurs a severe penalty in download time. This penalty prevents the use of photo-textures, which are necessary for a photo-realistic impression. It should be emphasized that the download time is incurred at every change of session, for example, if the user moves to an upper floor in a shopping application or to another planet in a video game.
To avoid the above drawbacks, an alternative approach has been suggested in which the server computes the new views and sends them compressed to the client. Although each image is compressed (e.g., by JPEG), the volume of transmission is still quite large, and either requires expensive bandwidth or forces a reduction in image quality. Video compression techniques such as MPEG, which exploit temporal data redundancy, are based on inter-frame dependencies. They can compress on-line, but only with a time lag that prohibits real-time feedback.
There is thus a widely recognized need for, and it would be highly advantageous to have, a method for providing views of a remote complex virtual reality world, at the client of an interactive server-client system, fast enough to preserve the illusion of virtual reality.
In visual navigation applications there is always a need to balance the imaging quality and the frame rate. In interactive real-time systems, one is required to maintain a user-specified minimal frame rate. T. A. Funkhouser and C. H. Sequin (Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments, Computer Graphics (SIGGRAPH '93 Proceedings), pp. 247-254, August 1993) proposed an algorithm that adjusts the image quality adaptively by choosing the level of detail and the rendering algorithm of each object according to its estimated rendering cost. P. W. C. Maciel and P. Shirley (Visual navigation of large environments using textured clusters, 1995 Symposium on Interactive 3D Graphics, pp. 95-102, April 1995) suggested the use of an impostor to trade quality for speed. An impostor must be faster to draw than the true model while visually resembling the real image. Textures mapped on simplified models are a common form of impostor. J. Shade, D. Lischinski, D. H. Salesin, J. Snyder and T. DeRose (Hierarchical image caching for accelerated walkthroughs of complex environments, Computer Graphics (SIGGRAPH '96 Proceedings)), G. Schaufler and W. Sturzlinger (A three dimensional image cache for virtual reality, Eurographics '96, Computer Graphics Forum Vol. 15 No. 3 pp. 227-235, 1996) and D. G. Aliaga (Visualization of complex models using dynamic texture-based simplification, Proceedings of Visualization 96) all used a single textured polygon as the impostor. These image-based primitives are view-dependent and form a compact representation; thus they have the potential to be more appropriate in applications which also need to sustain a user-specified communication bandwidth.
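By way of illustration only, the trade-off between image quality and frame rate described above can be sketched as a greedy selection of levels of detail under a frame-time budget. The sketch below is a simplified example and is not the algorithm of any of the cited references; the names `Lod`, `choose_lods`, `cost_ms` and `benefit`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lod:
    cost_ms: float   # estimated rendering time (hypothetical units)
    benefit: float   # estimated contribution to image quality (hypothetical)


def choose_lods(objects, budget_ms):
    """Pick one level of detail per object without exceeding budget_ms.

    objects maps an object name to a list of Lod entries sorted by
    increasing cost. Every object starts at its cheapest level; the
    upgrade with the best benefit gained per unit of extra cost is then
    applied repeatedly until no further upgrade fits in the budget.
    Returns (chosen index per object, total estimated cost).
    """
    choice = {name: 0 for name in objects}
    spent = sum(lods[0].cost_ms for lods in objects.values())
    while True:
        best = None
        for name, lods in objects.items():
            i = choice[name]
            if i + 1 >= len(lods):
                continue  # already at the most detailed level
            extra = lods[i + 1].cost_ms - lods[i].cost_ms
            gain = lods[i + 1].benefit - lods[i].benefit
            if extra <= 0 or spent + extra > budget_ms:
                continue  # upgrade is ill-formed or does not fit
            ratio = gain / extra
            if best is None or ratio > best[0]:
                best = (ratio, name, extra)
        if best is None:
            return choice, spent
        _, name, extra = best
        choice[name] += 1
        spent += extra


scene = {
    "statue": [Lod(1, 1), Lod(4, 5), Lod(10, 6)],
    "wall": [Lod(1, 1), Lod(2, 2)],
}
levels, total_ms = choose_lods(scene, budget_ms=8)
print(levels, total_ms)
```

In this sketch, if even the cheapest levels exceed the budget, they are returned unchanged; a practical system would need a further fallback, such as replacing distant objects by impostors.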
S. E. Chen and L. Williams (View interpolation for image synthesis, Computer Graphics (SIGGRAPH '93 Proceedings), pp. 279-288, August 1993) and T. Kaneko and S. Okamoto (View interpolation with range data for navigation applications, Computer Graphics International, pp. 90-95, June 1996) generated novel images from a number of precalculated reference images by "view interpolation." Along with the images, correspondence maps are necessary so that one image can be morphed into another. The user can stroll through restricted paths connecting successive locations at which the precomputed views are stored, providing the sensation of continuous in-between views.
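By way of illustration only, the morphing of one reference image toward another by means of a per-pixel map can be sketched as follows. The sketch assumes that the map stores, for each pixel of the first reference image, the offset to the matching pixel of the second reference image, and that the in-between position is obtained by linear interpolation of that offset. The names `interpolate_view`, `flow` and `hole` are hypothetical, and the sketch is not the implementation of any of the cited references.

```python
import math


def interpolate_view(image, flow, t, hole=None):
    """Forward-map each pixel of a reference image toward a second view.

    image is a 2D list of pixel values. flow[y][x] is the (dx, dy)
    offset from pixel (x, y) to its match in the second reference
    image. t runs from 0.0 (first view) to 1.0 (second view).
    Target pixels that receive no source pixel keep the value hole.
    When several source pixels land on one target pixel, the last one
    written wins, because this sketch carries no depth information.
    """
    height, width = len(image), len(image[0])
    out = [[hole] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Round half up so that results do not depend on parity.
            nx = int(math.floor(x + t * dx + 0.5))
            ny = int(math.floor(y + t * dy + 0.5))
            if 0 <= nx < width and 0 <= ny < height:
                out[ny][nx] = image[y][x]
    return out


row = [[1, 2, 3, 4]]
shift_right_by_two = [[(2, 0)] * 4]
halfway = interpolate_view(row, shift_right_by_two, t=0.5)
print(halfway)
```

At the halfway position every pixel has moved one place to the right, leaving an unfilled pixel at the left edge of the row.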
The advantage of view interpolation, and of any other image-based rendering technique, is that the generation of a new image is independent of the scene complexity. The technique gives more freedom than strolling back and forth within a video sequence. However, it works well only if adjacent images depict the same object from different viewpoints. The interpolated views may introduce some distortions because linear interpolation does not ensure natural or physically valid in-between images. Recently, S. M. Seitz and C. R. Dyer (View morphing, Computer Graphics (SIGGRAPH '96 Proceedings)) proposed a new method, called "view morphing," which better preserves the in-between shape appearance. Image-based methods usually do not consider the underlying 3D model, and some inherent problems, known as holes and overlaps, need to be alleviated. In the paper by Kaneko and Okamoto cited above, full range data, acquired from a range scanner, are associated with each reference image. The exact range simplifies the generation of the in-between images: no correspondence is required, and overlaps are easily resolved by a Z-buffer approach. P. E. Debevec, C. J. Taylor and J. Malik (Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach, Computer Graphics (SIGGRAPH '96 Proceedings)) use a set of viewpoints to approximate the 3D model, and new views are then rendered from arbitrary viewpoints by a view-dependent texture-mapping technique.
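By way of illustration only, the use of per-pixel range to generate a new view, with overlaps resolved by a Z-buffer, can be sketched for a single scan line. The sketch assumes a pinhole camera that translates sideways, so that each pixel shifts horizontally by an amount inversely proportional to its depth. The names `reproject_scanline` and `focal_times_shift`, the sign convention, and the sample values are hypothetical, and the sketch is not the implementation of any of the cited references.

```python
import math


def reproject_scanline(colors, depths, focal_times_shift, hole=None):
    """Warp one scan line to a sideways-translated camera position.

    colors[x] and depths[x] describe the reference view. Each pixel
    moves left by focal_times_shift / depth, so that near pixels move
    farther than distant ones. The Z-buffer keeps the nearest pixel
    when several land on the same target (an overlap); targets that
    receive no pixel keep the value hole.
    """
    width = len(colors)
    out = [hole] * width
    zbuf = [float("inf")] * width
    for x in range(width):
        z = depths[x]
        disparity = int(math.floor(focal_times_shift / z + 0.5))
        nx = x - disparity
        if 0 <= nx < width and z < zbuf[nx]:
            zbuf[nx] = z
            out[nx] = colors[x]
    return out


colors = ["a", "b", "c", "d"]
depths = [2.0, 2.0, 1.0, 2.0]   # pixel "c" is nearer than the others
warped = reproject_scanline(colors, depths, focal_times_shift=2.0)
print(warped)
```

In this example the near pixel "c" and the distant pixel "b" land on the same target, and the Z-buffer retains "c"; the two targets that receive no pixel remain as holes, which a practical system would have to fill by some other means.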