The present invention relates to methods and apparatus for processing pictorial information to synthesize images from arbitrary viewpoints.
Ordinary image display systems such as a common television set or a computer screen with standard image display software provide monocular images from a viewpoint which is independent of the viewer's actual position. When the viewer turns his or her head, the displayed image does not change. Rather, the image continually reflects the viewpoint of the camera which originally generated the video signal or an artificial viewpoint in the image display software. Common systems for displaying stereoscopic images suffer from the same problem. For example, some common stereoscopic vision systems display a separate video image to each eye of the viewer, each such image corresponding to a slightly different camera position or slightly different artificial viewpoint in the case of computer generated images. Here again, however, the viewpoints do not change as the observer moves. Such systems therefore do not provide a truly realistic viewing experience.
Holographic images inherently provide a more realistic viewing experience. A viewer looking at a hologram sees the depicted object from a new viewpoint if he or she moves his or her head to a new location, or turns it to a new viewing angle. In this respect, the experience of looking at a hologram resembles the experience of looking at the depicted objects in reality. However, it is generally impractical to display holographic images of changing scenes. Although some holographic video systems have been demonstrated, they are extremely expensive, require very large bandwidth and suffer from other drawbacks.
So-called "virtual reality" systems can provide viewpoints which move as the observer moves his or her head. Some of these systems display computer-generated images synthesized from mathematical models of the scene to be depicted. Synthesis of such an image involves computation of the projection of the mathematically modeled elements of the scene onto an arbitrary view plane. To provide a stereoscopic view, two different view planes are used, corresponding to the slightly different viewing planes of the observer's two eyes. Such systems can be provided with detectors for monitoring the actual orientation of the viewer and can be arranged to change the view planes used in the reconstruction as the orientation of the viewer changes. Such an arrangement theoretically can provide an illusion of presence in the scene. However, such systems are limited to displaying images of mathematically modeled scenes. Accordingly, they can only display images of synthetic, computer-created scenes or of real scenes which can be captured and modeled as mathematically tractable elements suitable for handling by computer graphics software. They cannot normally display images of an arbitrary scene. Moreover, such systems require substantial computational power to perform all of the complex mathematical manipulations required. This problem is aggravated where the scene includes moving elements.
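The view-plane computation described above can be illustrated by a minimal pinhole-projection sketch. All names and numerical values here (the focal distance, the interpupillary distance, the sample scene points) are illustrative assumptions, not taken from any particular system discussed in this section.

```python
# Hedged sketch: projecting modeled 3-D scene points onto two view
# planes offset horizontally by the interpupillary distance, as a
# stereoscopic "virtual reality" renderer must do for each frame.
# All parameters below are illustrative assumptions.

FOCAL = 1.0   # distance from each eye point to its view plane (assumed)
IPD = 0.065   # interpupillary distance in metres (assumed)

def project(point, eye_x):
    """Pinhole projection of a 3-D point (x, y, z) onto the view plane
    of an eye located at (eye_x, 0, 0) looking along the +z axis."""
    x, y, z = point
    return ((x - eye_x) * FOCAL / z, y * FOCAL / z)

# Two mathematically modeled scene elements at different depths.
scene = [(0.0, 0.0, 2.0), (0.5, 0.1, 4.0)]
left  = [project(p, -IPD / 2) for p in scene]
right = [project(p, +IPD / 2) for p in scene]

# Horizontal disparity between the two images is IPD * FOCAL / z,
# so nearer points (smaller z) show larger disparity.
```

The per-frame cost of repeating this projection for every modeled element, for two view planes, each time the viewer's head moves, is the computational burden the paragraph above refers to.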
An alternative arrangement has been to use an actual camera or cameras directed at the real scene. For a stereoscopic view, two cameras are employed, spaced apart from one another by a distance corresponding to the viewer's interpupillary distance. The cameras are mounted on a platform which in turn is linked to a servomechanism. The servomechanism is controlled by a sensor linked to the user's head. As the user moves his or her head, the camera platform duplicates such movement. Accordingly, the images captured by the cameras and transmitted to the user's eyes realistically duplicate the images which the user would see as he or she looks at the scene from any viewpoint. The system can provide a realistic experience of telepresence. The viewer sees essentially the same images as he or she would see if he or she were at the scene, and these images change in a realistic manner as the viewer's head moves. These systems are expensive, in that a set of cameras and the associated servomechanisms must be provided for each user. Moreover, these systems require that the scene be in existence and available for viewing at the time the viewer wants to see the scene. They cannot operate with recorded images of the scene. Moreover, there must be continuous, two-way communication between the viewer's location and the real location of the scene, where the cameras are positioned. At least the communications channel from the scene location to the viewer's location must be a high-bandwidth video channel. All of these drawbacks together limit application of such servomechanism-based systems to rare situations.
As described in an article by Takahashi et al., "Generation of Intermediate Parallax-Images for Holographic Stereograms," Proceedings SPIE, Vol. 1914, Practical Holography VII (1993), a so-called "holographic stereogram" can be synthesized from numerous individual monocular images of a scene, typically about 50 to 100 such images. To alleviate the need for actually capturing so many real images, the authors propose to generate intermediate images by projection back from three-dimensional data defining the scene. The three-dimensional data, in turn, is calculated from the images taken by real cameras at various locations on a linear camera locus. In this manner, the system is able to create intermediate images simulating the image which would be taken by a camera positioned between the positions of the real cameras. This system depends upon two-dimensional projection from three-dimensional data, i.e., calculation of the image which would appear in a viewing plane based upon data defining the location of objects in the scene in three dimensions. The system must determine the depth from the real cameras of each point in the scene.
To facilitate this determination, the authors propose to use certain characteristics of a so-called "epipolar image". As further described below, an epipolar image combines data from multiple cameras into partial images, each including part of the data from each camera. With conventional raster-scan video cameras, each portion of the epipolar image typically includes one scanning line from each camera of the multiple-camera set. In such epipolar images, features appear as sloping strips or bands. The width and slope of the bands are related to the depth or distance between the actual feature and the camera locus. Moreover, it is possible to determine from the epipolar image which features in the scene occlude other features, i.e., which features lie to the front, closer to the cameras, and which features lie to the back, farther from the cameras. The authors thus propose to recover the depth of the various points in the image by using the epipolar image. That depth information forms part of the three-dimensional data, which in turn is used to project a two-dimensional image simulating the two-dimensional image which would be captured by a camera at an intermediate location. This system nonetheless involves all of the computational complexity required to reconstruct two-dimensional images from three-dimensional data. Moreover, Takahashi et al. characterize their system only as suitable for generation of holographic stereograms, and not for generation of images to be viewed directly by a viewer.
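The relationship between band slope and depth in an epipolar image can be sketched as follows. This is a minimal illustration only: the camera count, focal length, baseline, and line width are assumed values, and the single-feature scanning lines stand in for real video data; it is not the Takahashi et al. implementation.

```python
# Hedged sketch: one corresponding scanning line from each camera on a
# linear locus is stacked into an epipolar image; a scene feature then
# traces a sloping band whose per-row pixel shift (disparity) encodes
# its depth.  All parameters below are illustrative assumptions.

FOCAL = 500.0     # focal length in pixels (assumed)
BASELINE = 0.02   # spacing between adjacent cameras in metres (assumed)
NUM_CAMERAS = 8
LINE_WIDTH = 64   # pixels per scanning line (assumed)

def epipolar_image(x_world, depth):
    """Build the epipolar image for a single feature at horizontal world
    position x_world and the given depth: one row per camera, with a 1
    marking the pixel where the feature appears in that camera's line."""
    rows = []
    for cam in range(NUM_CAMERAS):
        # Camera `cam` sits at offset cam * BASELINE along the locus, so
        # the feature shifts by FOCAL * BASELINE / depth pixels per row.
        px = round(FOCAL * (x_world - cam * BASELINE) / depth)
        row = [0] * LINE_WIDTH
        row[px] = 1
        rows.append(row)
    return rows

def depth_from_slope(epi):
    """Recover depth from the slope of the feature band: the average
    per-row disparity d satisfies depth = FOCAL * BASELINE / d."""
    positions = [row.index(1) for row in epi]
    disparity = (positions[0] - positions[-1]) / (len(positions) - 1)
    return FOCAL * BASELINE / disparity

epi = epipolar_image(x_world=0.5, depth=4.0)
estimate = depth_from_slope(epi)  # approximately 4.0, up to pixel rounding
```

A shallow slope (large per-row shift) indicates a near feature, while a steep, nearly vertical band indicates a distant one; occlusion ordering follows because bands of nearer features overwrite those of farther ones where they cross.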
Accordingly, despite all of this effort in the art, there still remains a substantial, unmet need for improved methods of synthesizing and displaying an image of a scene from an arbitrary, synthesized viewpoint. In particular, there are substantial, unmet needs for improved methods of providing telepresence, including display of images from different viewpoints as the user's head moves in real time. In particular, there are needs for a telepresence system which can provide images to multiple users simultaneously.