The present invention is related to viewing a scene of a 3D environment stored in a server machine from a client machine at a location remote from the server machine. More specifically, the present invention relates to viewing a scene of a 3D environment stored in a server machine from a client machine remote from the server machine, where the client machine uses previous views of the scene to predict a next view, and the server machine sends only the difference between the predicted view and the next view to the client machine, which uses it to form the next view.
The steady increase of computing power promises a wide variety of compelling multimedia experiences for users in the next decade. One often-stated goal is the development of shared virtual worlds and entertainment broadcasts that allow consumers to remotely explore 3D spaces. However, if thousands of users are to have high-fidelity access to remote worlds, the bandwidth of the Internet and other broadcast media cannot keep up with demand. To address this issue, the present invention presents a class of compression schemes designed to significantly reduce the bandwidth required for remote navigation.
In a typical setup, a user explores a virtual world on a client machine. This machine requests views of the world from a server machine. Sending the entire model over the network in advance is extremely slow, or impossible for dynamic scenes; one alternative is for the server to send each camera view to the client as a compressed image, frame by frame. However, this alternative still requires high network bandwidth to display video at interactive frame rates.
The present invention presents a novel compression scheme that predicts the appearance of new views from previous views, using the known camera motion and image-based rendering techniques. This allows the server to send only incremental amounts of information for each frame, greatly reducing the bandwidth required for remote navigation. Unlike most image compression schemes, this method is cooperative: the client and server can communicate to determine, for each frame, which data the server should send to maximize quality of service for the available bandwidth.
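The predict-and-difference idea can be illustrated with a minimal Python sketch. It assumes NumPy, approximates the reprojected prediction with a plain horizontal shift (a stand-in for true image-based reprojection), and uses zlib as a placeholder for whatever image coder a real system would use; `predict_view`, `encode` and `decode` are hypothetical names, not part of the described system. Because client and server compute the same prediction, only the residual has to cross the network.

```python
import zlib

import numpy as np


def predict_view(prev_view: np.ndarray, shift: int) -> np.ndarray:
    """Hypothetical predictor: treat a small camera pan as a horizontal shift
    of the previous view. Newly exposed columns, which the previous view never
    saw, are filled with zeros."""
    predicted = np.zeros_like(prev_view)
    if shift > 0:
        predicted[:, :-shift] = prev_view[:, shift:]
    elif shift < 0:
        predicted[:, -shift:] = prev_view[:, :shift]
    else:
        predicted[:] = prev_view
    return predicted


def encode(next_view: np.ndarray, predicted: np.ndarray) -> bytes:
    """Server side: residual = actual view minus prediction (int16 holds -255..255)."""
    residual = next_view.astype(np.int16) - predicted.astype(np.int16)
    return zlib.compress(residual.tobytes())


def decode(payload: bytes, predicted: np.ndarray) -> np.ndarray:
    """Client side: add the decompressed residual back onto the same prediction."""
    residual = np.frombuffer(zlib.decompress(payload), dtype=np.int16)
    residual = residual.reshape(predicted.shape)
    return (predicted.astype(np.int16) + residual).astype(np.uint8)
```

For example, if the camera pans 8 columns across a wider scene, only the 8 newly exposed columns of the residual are nonzero, so the compressed residual is far smaller than the compressed frame. Uniform random noise is a worst case for full-frame compression, so the gap on natural imagery would be smaller; the point is only that the residual vanishes wherever the prediction is correct.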
It is assumed that available network resources in the coming decade will lag far behind increasing processor power, and will be the limiting factor on navigation frame rates. Thus, some additional computation required for each frame is acceptable.
The present invention is based on image-based rendering, in which images are used as primitives rather than 3D models. (See [Sing Bing Kang. A Survey of Image-based Rendering Techniques. Technical Report 97/4. Digital Equipment Corporation Cambridge Research Laboratory, August 1997, incorporated by reference herein] for a survey.) Many authors have described image-based techniques for utilizing temporal coherence to reduce rendering latency; for example, see [Shenchang Eric Chen, Lance Williams. View Interpolation for Image Synthesis. SIGGRAPH 93, August 1993, p279-288; Shenchang Eric Chen. QuickTime VR: an image-based approach to virtual environment mapping. SIGGRAPH 95, August 1995, p29-38; Jay Torborg, Jim Kajiya. Talisman: Commodity Realtime 3D Graphics for the PC. SIGGRAPH 96, 1996, p353-363; Jonathan Shade, Dani Lischinski, David Salesin, Tony DeRose, John Snyder. Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments. SIGGRAPH 96, p75-82; Lucia Darsa, Bruno Costa, and Amitabh Varshney. Navigating Static Environments Using Image-Space Simplification and Morphing. ACM Symposium on Interactive 3D Graphics, Providence, R.I., 1997, pp. 25-34; Francois Sillion, George Drettakis, and Benoit Bodelet. Efficient Impostor Manipulation for Real-Time Visualization of Urban Scenery. Computer Graphics Forum (Proc. of Eurographics '97). September 1997. p207-218, all of which are incorporated by reference herein]. Regan and Pose [Matthew Regan, Ronald Pose. Priority Rendering with a Virtual Reality Address Recalculation Pipeline. Computer Graphics (SIGGRAPH 94 Conference Proceedings). 1994, incorporated by reference herein] use an image-based approach to overcome high network latencies. In these systems, reference images are generated and sent to the client system. The client then reprojects these images to generate new views at interactive rates until the next set of reference images arrives. [Brook Conner and Loring Holden. Providing a Low Latency User Experience in a High Latency Application. ACM Symposium on Interactive 3D Graphics, Providence, R.I., 1997, pp. 45-48, incorporated by reference herein] discusses techniques for hiding the effects of latency in a shared world. These methods are complementary to the approach of the present invention, since they address latency rather than bandwidth.
Two commercial products, Apple QuickTime VR 3.0 [Apple Computer, Inc., QuickTime VR 3.0. http://www.apple.com/quicktime, incorporated by reference herein] and LivePicture ImageServer [Live Picture, Inc., http://www.livepicture.com/, incorporated by reference herein], send panoramas over the network in pieces, so that the client may view the scene without having to receive the entire panorama. However, a large portion of the panorama must be downloaded before much viewing can begin. Also, these systems are not easily extensible to handle dynamic imagery or camera translations.
The approach presented in the present invention is an image-compression scheme based on specialized a priori knowledge about the images. Compare, for example, the multiscale compression schemes [Peter Burt, Edward Adelson. The laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532-540, April 1983, incorporated by reference herein] and [Wim Sweldens. The lifting scheme: A custom-design construction of biorthogonal wavelets. Technical Paper 1994:7, Industrial Mathematics Initiative, Department of Mathematics, University of South Carolina, 1994, incorporated by reference herein], which use prediction and difference. This work is designed for a cooperative client-server approach, similar to [Marc Levoy. Polygon-Assisted JPEG and MPEG Compression of Synthetic Images. SIGGRAPH 95, (August 95), p21-28, incorporated by reference herein]. The MPEG compression scheme [D. Le Gall, MPEG: A Video Compression Standard for Multimedia Application. Communications of the ACM, Vol. 34, No. 4, April 1991, p46-58, incorporated by reference herein] uses optical flow to predict video frames, for replaying prerecorded video. Chang et al. [Ee-chien Chang, Chee Yap, T. J. Yen. Realtime Visualization of Large Images over a Thinwire. IEEE Visualization '97 (Late Breaking Hot Topics). Tucson, Ariz., 1997, incorporated by reference herein] use foveation, a spatially-varying compression scheme, for remote viewing of very large 2D images. This is a generalization of wavelets that allows an image to be displayed at different resolutions at different locations. In this system, the server sends only the image coefficients necessary to update the view of a static, 2D space.
A panorama viewer (such as QuickTime VR) takes a source image or "environment map", projects it into view space using a view transform (for example, using a cylindrical or spherical projection), and then displays this projected image on the screen. This technique can be used to create an impression of a navigable 3D environment. Most panorama viewers assume a fixed camera position, and navigation is performed using a combination of rotations and scales. Translations are done by jumping from one panorama to another. For example, LivePicture [Live Picture, Inc., http://www.livepicture.com/, incorporated by reference herein] has created nested panoramas, allowing the user to zoom in and be transferred from one panorama to another.
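The cylindrical view transform described above can be sketched in Python with NumPy. The sketch assumes the panorama spans a full 360 degrees horizontally, that the cylinder radius equals the panorama width divided by 2π (in pixels), and nearest-neighbour sampling; `render_cylindrical_view` and its parameters are hypothetical, not QuickTime VR's actual interface.

```python
import numpy as np


def render_cylindrical_view(pano, yaw, fov, out_w, out_h):
    """Project a 360-degree cylindrical panorama into a pinhole view.

    pano  : (H, W) or (H, W, C) array; column u covers yaw angle 2*pi*u/W.
    yaw   : viewing direction in radians (0 looks at panorama column 0).
    fov   : horizontal field of view in radians.
    Returns an (out_h, out_w[, C]) image using nearest-neighbour sampling.
    """
    pano_h, pano_w = pano.shape[:2]
    radius = pano_w / (2.0 * np.pi)              # cylinder radius in pixels (assumed)
    focal = (out_w / 2.0) / np.tan(fov / 2.0)    # pinhole focal length in pixels
    xs = np.arange(out_w) - (out_w - 1) / 2.0    # screen offsets from the view centre
    ys = np.arange(out_h) - (out_h - 1) / 2.0
    dx, dy = np.meshgrid(xs, ys)                 # both shaped (out_h, out_w)
    # Horizontal angle of each viewing ray, then the matching panorama column.
    theta = yaw + np.arctan2(dx, focal)
    u = np.round(theta / (2.0 * np.pi) * pano_w).astype(int) % pano_w
    # Height where the ray meets the cylinder, scaled by the ray's horizontal length.
    v = np.round((pano_h - 1) / 2.0 + dy * radius / np.hypot(dx, focal)).astype(int)
    v = np.clip(v, 0, pano_h - 1)                # rows beyond the panorama are clamped
    return pano[v, u]
```

Rotation is just a change of `yaw`, and zooming is a change of `fov`; the modulo on `u` handles the seam where the panorama wraps around.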
Source images for panorama viewers can become very large. For example, in our application, the test panorama image is 1240×380 pixels. These images are too large to be conveniently transmitted over the Internet. Consequently, distributed (client-server) implementations of panorama viewers are being developed.
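To put the test panorama's size in perspective, here is a worked estimate that assumes uncompressed 24-bit RGB pixels and a hypothetical 56 kbit/s dial-up link (neither figure comes from the description above):

```latex
\begin{aligned}
1240 \times 380 &= 471{,}200 \ \text{pixels} \\
471{,}200 \times 3 \ \text{bytes} &= 1{,}413{,}600 \ \text{bytes} \approx 1.35 \ \text{MiB} \\
\frac{1{,}413{,}600 \times 8 \ \text{bits}}{56{,}000 \ \text{bits/s}} &\approx 202 \ \text{s}
\end{aligned}
```

Compression would shrink this considerably, but the estimate shows why shipping the whole panorama before viewing begins is unattractive on slow links.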
There are several approaches to implement a network based panorama viewer. Four approaches are outlined below:
1. Model-based, no caching (e.g. QuickTime VR):
server transmits entire source panorama image to client
client computes projected views from this image and displays them
2. View-based, no caching:
when view changes, client tells server its view position,
server computes view image and transmits it to client.
client displays resulting image
3. Model-based, cached:
when view changes, client tells server new view position, and old view position(s) (or the server may cache information about prior views).
server determines which parts of original (unprojected) panorama image client does not have, and transmits those parts to client.
client composes new image parts with existing image parts to create a partial (unprojected) panorama image, then projects this image to create displayed image.
4. View-based, cached:
when view changes, client tells server new view position and old view position.
server creates projected images of panorama using the old and new views, and sends enough difference information to client for client to be able to construct an image of new view.
client creates new view by applying a transform to its old view, and then merging in difference information from server.
Approaches 1 and 2 do not scale well to large panoramas or low-bandwidth networks.
Approach 3 has not been explored for panoramas, and would be of interest. There are systems that allow the user to view a large image over a thinwire (e.g. [Chang, E., Yap, C., Yen, T. J., Realtime Visualization of Large Images over a Thinwire, IEEE Visualization 97 (Late Breaking Hot Topics), Tucson, Ariz., 1997, incorporated by reference herein]), which could be adapted for use in approach 3. The common theme is that the client builds on its existing knowledge of the source image, instructing the server on which areas of the panorama image to send according to user interactions. This technique works well in a state-based communication protocol where the client and the server both have knowledge of the underlying data representation used for the panorama image. One advantage of this approach is that the source image can be pre-processed and compressed heavily before communication is initiated, so very low bandwidth can be achieved. Another advantage is that the caching strategies involved are well understood.
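The bookkeeping behind approach 3 can be sketched in Python: the client works out which source tiles its current view overlaps and requests only those it has not yet cached. The sketch assumes a 360-degree cylindrical panorama cut into vertical column tiles; `visible_tiles` and `TileCache` are hypothetical names, and a real system would also need tiling in the vertical direction and at multiple resolutions.

```python
import math


def visible_tiles(yaw, fov, pano_w, tile_w):
    """Indices of the vertical panorama tiles that a view at `yaw` with
    horizontal field of view `fov` (radians) overlaps. Assumes a 360-degree
    cylindrical panorama pano_w columns wide, cut into tiles tile_w wide."""
    left = math.floor((yaw - fov / 2) / (2 * math.pi) * pano_w)
    right = math.ceil((yaw + fov / 2) / (2 * math.pi) * pano_w)
    # Wrap every visible column into [0, pano_w) and map it to its tile.
    return {(col % pano_w) // tile_w for col in range(left, right)}


class TileCache:
    """Client-side record of which source tiles have already arrived."""

    def __init__(self):
        self.tiles = {}

    def missing(self, needed):
        """Tiles the client must still ask the server for."""
        return needed - set(self.tiles)

    def store(self, index, data):
        self.tiles[index] = data
```

With a 1240-column panorama in 124-column tiles, a 90-degree view centred on yaw 0 touches tiles 0, 1, 8 and 9; after those are cached, panning 45 degrees to the right requires only tile 2 from the server.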
However, in some cases, it is undesirable to transmit the original source image data over the network. For example, if the source data is very high resolution (perhaps stored in a database), then there may not be enough memory in the client to store the original source data. Also, in some cases, the data representation of the source image may not be suitable for transmitting over the network; for example, the image may represent a dynamic scene or a complex 3D model. Finally, if the source data requires sophisticated rendering hardware, it may be desirable to perform the rendering on the server rather than the client, and utilize a high-performance graphics server.
For these scenarios, it is worthwhile to examine approach 4, which involves transmitting the view projected version of the source data, rather than the original source representation. This provides a clean separation between the view image that is seen remotely and the internal representation of the source data, and therefore it can scale to extremely large, complex or dynamic scenes. The approach is conceptually very simple, and a basic implementation is more straightforward than for approach 3, which requires careful thought about bookkeeping and caching strategies for panoramas. The system of the present invention uses approach 4.
One drawback to approach 4 is that, at first glance, it requires a great deal of network bandwidth to transmit the rendered scene to the client each frame. However, by utilizing frame-to-frame coherence, image reprojection, and compression, bandwidth requirements can be reduced significantly, making the approach viable for moderate-bandwidth, low-latency networks.
Image composition and morphing are well established as techniques for utilizing temporal coherence. See [R. Cook, Shade Trees, ACM SIGGRAPH '84, July 1984, p223-231, incorporated by reference herein], [Shenchang E. Chen, QuickTime VR: an image based approach to virtual environment mapping, ACM SIGGRAPH '95, August 1995, pp29-38, incorporated by reference herein]. Talisman [J. Torborg, J. Kajiya, Talisman: Commodity Realtime 3D Graphics for the PC, SIGGRAPH '96, (August 1996), pp. 353-363, incorporated by reference herein] is an example of a real-time 3D graphics architecture that combines geometric and image-based methods, and uses a similar image coherence technique to reduce system bus bandwidth. [Jonathan Shade, Dani Lischinski, David Salesin, Tony DeRose, John Snyder. "Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments." SIGGRAPH 96. p.75-82, incorporated by reference herein] is another hybrid approach that uses temporal coherence to improve overall system performance.
The system of the present invention utilizes image reprojection to derive the predicted new view from the old view. Reprojection is also found in other image-based rendering systems, such as [McMillan, Leonard, and Gary Bishop. Plenoptic Modeling: An Image-Based Rendering System, Proceedings of SIGGRAPH 95, (Los Angeles, Calif.), Aug. 6-11, 1995, pp. 39-46, incorporated by reference herein]. (See [Kang, Sing Bing. A Survey of Image-Based Rendering Techniques. Technical Report 97/4. Digital Equipment Corporation Cambridge Research Laboratory, August 1997, incorporated by reference herein] for a survey of image-based rendering techniques).
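For the rotation-only camera motion of a panorama viewer, reprojection has a closed form: with a shared pinhole intrinsic matrix K, a pixel in the old view maps to the new view through the homography H = K R K^-1. The Python sketch below assumes NumPy, a camera frame with x right, y down and z forward, and yaw-only rotation; `intrinsics`, `yaw_rotation` and `reproject_pixels` are hypothetical helpers, and this pinhole formulation is a standard technique offered for illustration rather than necessarily the warp the described system uses.

```python
import numpy as np


def intrinsics(focal, cx, cy):
    """Pinhole intrinsic matrix with square pixels and principal point (cx, cy)."""
    return np.array([[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]])


def yaw_rotation(theta):
    """Map old-camera directions to new-camera directions when the camera
    turns right by theta radians (x right, y down, z forward)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])


def reproject_pixels(pixels, K, R):
    """Map an (N, 2) array of old-view pixel coordinates into the new view
    using H = K R K^-1. Points that end up behind the camera are not handled."""
    H = K @ R @ np.linalg.inv(K)
    homog = np.column_stack([pixels, np.ones(len(pixels))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

A practical warp would usually run this mapping in reverse (applying H^-1 to every destination pixel and sampling the old view), so that every new pixel either receives a value or is flagged as a hole for the server's difference data to fill.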
[Mark, William R. and Gary Bishop. Efficient Reconstruction Techniques for Post-Rendering 3D Image Warping, UNC Computer Science Technical Report TR98-011, University of North Carolina, Mar. 21, 1998, incorporated by reference herein] and [Regan, M., and R. Pose, "Priority Rendering with a Virtual Reality Address Recalculation Pipeline," Computer Graphics (SIGGRAPH 94 Conference Proceedings), 1994, incorporated by reference herein] use an image-based rendering approach to overcome high network latencies. The idea in these systems is to send image-based views approximately once every five frames and let the client reproject them for the intermediate frames. The system of the present invention, on the other hand, sends an image every frame, but attempts to lower bandwidth by reducing the amount of data sent for each frame. Hence, the system relies on a low-latency network, whereas [Mark, William R. and Gary Bishop. Efficient Reconstruction Techniques for Post-Rendering 3D Image Warping, UNC Computer Science Technical Report TR98-011, University of North Carolina, Mar. 21, 1998, incorporated by reference herein] can deal with network lag, though the images will begin to degrade when the user moves to views containing regions that are not visible in the reference images sent by the server. An integration of these two approaches would be of interest.
The system of the present invention reduces bandwidth requirements by using compression. The ImageServer software [Live Picture, Inc., http://www.livepicture.com/, incorporated by reference herein] also reduces network bandwidth requirements by using compression. It sends subimages or subpanoramas on demand, and provides a view-based approach for remote panoramas. However, ImageServer does not support image differencing or real-time frame rates.
QuickTime VR 2.1 [Apple Computer, Inc., http://www.apple.com/quicktime, incorporated by reference herein] and the QuickTime Plug-In introduce streaming QTVR movies: QTVR movies that begin to appear as soon as they begin to download. This is accomplished with low-resolution preview tracks and by reordering the data in the movie so that the tiles (for panoramas) or views (for objects) can be displayed as they are downloaded.
The view-based, cached approach can be thought of as an image compression scheme based on specialized a priori knowledge of the image properties. Compare, for example, the multiscale compression schemes [Burt, Peter, Adelson, Edward. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532-540, April 1983, incorporated by reference herein] and [Sweldens, Wim. The lifting scheme: A custom-design construction of biorthogonal wavelets, Technical Paper 1994:7, Industrial Mathematics Initiative, Department of Mathematics, University of South Carolina, 1994, incorporated by reference herein], which use prediction and difference. Our work enhances this with a cooperative client-server approach, similar to [Levoy, Marc. Polygon-Assisted JPEG and MPEG Compression of Synthetic Images. SIGGRAPH 95, http://graphics.stanford.edu/papers/poly, incorporated by reference herein].
The present invention pertains to a system for viewing a scene from a remote location. The system comprises a client machine. The system comprises a network connected to the client machine. The system comprises a server machine having a 3D environment stored in it. The server machine is connected to the network and remote from the client machine, wherein the client machine predicts a next view of the 3D environment based on a previous view of the 3D environment by the client machine, and the server machine predicts the next view also based on the previous view and sends to the client machine by way of the network only the difference between the predicted view and the next view.
The present invention pertains to a method for viewing a scene from a remote location. The method comprises the steps of waiting for a mouse event signaling a view change. Then there is the step of sending a message to a server over a network from a client indicating previous and new view orientations. Next there is the step of reading a difference image from the server. Then there is the step of capturing a previous view. Then there is the step of drawing an old view into a new orientation. Next there is the step of blending in the difference image from the server over the new orientation.
The present invention pertains to a method for viewing a scene from a remote location. The method comprises the steps of waiting for a message from a client giving previous and new view orientations. Then there is the step of drawing an environment map at the previous view orientation. Next there is the step of capturing the previous view. Then there is the step of aligning the old view with a new view. Next there is the step of capturing the new view. Then there is the step of drawing an environment map at the new view orientation. Next there is the step of computing a difference image between the old view and the new view. Then there is the step of transmitting the difference image to the client. Next there is the step of forming the new view at the client based on the previous view and the difference image.
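The client and server methods above can be sketched together in Python. This is a simplified model under loudly stated assumptions: a "view" is a fixed-width window of columns of an unprojected cylindrical panorama, so aligning the old view to a new orientation is a column shift; orientations are unwrapped integer column offsets; and blending in the difference image is modeled as adding a signed residual. `ViewChange`, `draw_view`, `align`, `server_handle`, `client_update` and `VIEW_W` are all hypothetical names. Because each request carries both orientations, the server keeps no per-client state.

```python
from dataclasses import dataclass

import numpy as np

VIEW_W = 160  # hypothetical view width, in panorama columns


@dataclass(frozen=True)
class ViewChange:
    """Client-to-server message carrying the previous and new orientations."""
    old_col: int
    new_col: int


def draw_view(pano, col):
    """Stand-in renderer: a VIEW_W-column window of the panorama starting at
    column `col`, wrapping around the 360-degree seam."""
    cols = (np.arange(VIEW_W) + col) % pano.shape[1]
    return pano[:, cols]


def align(old_view, old_col, new_col):
    """Draw the old view into the new orientation (here, a column shift).
    Columns the old view never saw stay zero for the difference to fill."""
    shift = new_col - old_col
    aligned = np.zeros_like(old_view)
    if abs(shift) >= VIEW_W:  # the two views do not overlap at all
        return aligned
    if shift >= 0:
        aligned[:, :VIEW_W - shift] = old_view[:, shift:]
    else:
        aligned[:, -shift:] = old_view[:, :VIEW_W + shift]
    return aligned


def server_handle(pano, msg):
    """Server: draw and capture the old view, align it, draw the new view, diff."""
    old_view = draw_view(pano, msg.old_col)
    predicted = align(old_view, msg.old_col, msg.new_col)
    new_view = draw_view(pano, msg.new_col)
    return new_view.astype(np.int16) - predicted.astype(np.int16)


def client_update(old_view, msg, diff):
    """Client: draw the captured old view into the new orientation, add the difference."""
    predicted = align(old_view, msg.old_col, msg.new_col)
    return (predicted.astype(np.int16) + diff).astype(np.uint8)
```

In use, the client sends a `ViewChange` on each mouse event, receives the difference image from `server_handle`, and calls `client_update` on the view it currently displays; for small pans the difference is nonzero only in the newly exposed columns.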
The present invention pertains to a system for creating a remote view of a panoramic scene. It utilizes dynamic image differencing, compression, image reprojection and a stateless communication protocol to reduce bandwidth requirements. The approach can be generalized to handle remote views of 3D scenes (by transmitting color and depth buffer values). This technique could be used to create multiple interactive views of dynamic 3D spaces that are rendered using a high-performance 3D server (such as an Onyx InfiniteReality system) and displayed at high frame rates, with low latency and moderate bandwidth, on desktop graphics platforms. The current implementation handles only static scenes using a cylindrical projection.
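The generalization to 3D scenes relies on reprojecting a view that carries both color and depth. A minimal Python sketch of such a forward warp follows, assuming NumPy, a shared pinhole intrinsic matrix, a known rigid camera motion, and nearest-pixel splatting with a z-buffer test; `warp_with_depth` is a hypothetical function illustrating standard 3D image warping, not the described system's implementation. Pixels left unset by the warp are the holes that the server's difference data would have to fill.

```python
import numpy as np


def warp_with_depth(color, depth, K, R, t):
    """Forward-warp an RGB-D view into a new camera pose.

    color : (h, w, c) image; depth : (h, w) positive camera-space z values.
    K     : 3x3 pinhole intrinsics shared by both views (assumed).
    R, t  : rigid motion with X_new = R @ X_old + t in camera coordinates.
    Returns (warped, mask); mask is False at holes no source pixel reached.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()         # unproject to 3D points
    proj = K @ (R @ pts + t.reshape(3, 1))                 # move to new camera, project
    z = proj[2]
    src = np.flatnonzero(z > 1e-9)                         # drop points behind the camera
    un = np.round(proj[0, src] / z[src]).astype(int)
    vn = np.round(proj[1, src] / z[src]).astype(int)
    inside = (un >= 0) & (un < w) & (vn >= 0) & (vn < h)
    src, un, vn = src[inside], un[inside], vn[inside]
    target = vn * w + un
    # Z-buffer: sort by target pixel, then by depth, and keep the nearest per pixel.
    order = np.lexsort((z[src], target))
    first = np.ones(order.size, dtype=bool)
    first[1:] = target[order][1:] != target[order][:-1]
    keep = order[first]
    flat = color.reshape(h * w, -1)
    warped = np.zeros_like(flat)
    mask = np.zeros(h * w, dtype=bool)
    warped[target[keep]] = flat[src[keep]]
    mask[target[keep]] = True
    return warped.reshape(color.shape), mask.reshape(h, w)
```

Resolving collisions with an explicit sort keeps the result deterministic; NumPy does not guarantee which value wins when a fancy-indexed assignment writes the same element more than once.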