The present invention relates to methods and apparatus for processing pictorial information to synthesize images from arbitrary viewpoints.
Ordinary image display systems such as a common television set or a computer screen with standard image display software provide monocular images from a viewpoint which is independent of the viewer's actual position. When the viewer turns his or her head, the displayed image does not change. Rather, the image continually reflects the viewpoint of the camera which originally generated the video signal or an artificial viewpoint in the image display software. Common systems for displaying stereoscopic images suffer from the same problem. For example, some common stereoscopic vision systems display a separate video image to each eye of the viewer, each such image corresponding to a slightly different camera position or slightly different artificial viewpoint in the case of computer generated images. Here again, however, the viewpoints do not change as the observer moves. Such systems therefore do not provide a truly realistic viewing experience.
Holographic images inherently provide a more realistic viewing experience. A viewer looking at a hologram sees the depicted object from a new viewpoint if he or she moves his or her head to a new location, or turns it to a new viewing angle. In this respect, the experience of looking at a hologram resembles the experience of looking at the depicted objects in reality. However, it is generally impractical to display holographic images of changing scenes. Although some holographic video systems have been demonstrated, they are extremely expensive, require very large bandwidth and suffer from other drawbacks.
So-called “virtual reality” systems can provide viewpoints which move as the observer moves his or her head. Some of these systems display computer generated images synthesized from mathematical models of the scene to be depicted. Such an image involves computation of the projection of the mathematically modeled elements of the scene onto an arbitrary view plane. To provide a stereoscopic view, two different viewing planes are used, corresponding to the slightly different viewing planes of the observer's two eyes. Such systems can be provided with detectors for monitoring the actual orientation of the viewer and can be arranged to change the view planes used in the reconstruction as the orientation of the viewer changes. Such an arrangement theoretically can provide an illusion of presence in the scene. However, such systems are limited to displaying images of mathematically generated scenes. Accordingly, they can only display images of synthetic, computer-created scenes or of real scenes which can be captured and modeled as mathematically tractable elements suitable for handling by computer graphics software. They cannot normally display images of an arbitrary scene. Moreover, such systems require substantial computational power to perform all of the complex mathematical manipulations required. This problem is aggravated where the scene includes moving elements.
An alternative arrangement has been to use an actual camera or cameras directed at the real scene. For a stereoscopic view, two cameras are employed, spaced apart from one another by a distance corresponding to the viewer's interpupillary distance. The cameras are mounted on a platform which in turn is linked to a servomechanism. The servomechanism is controlled by a sensor linked to the user's head. As the user moves his or her head, the camera platform duplicates such movement. Accordingly, the images captured by the cameras and transmitted to the user's eyes realistically duplicate the images which the user would see as he or she looks at the scene from any viewpoint. The system can provide a realistic experience of telepresence. The viewer sees essentially the same images as he or she would see if he or she were at the scene, and these images change in a realistic manner as the viewer's head moves. These systems are expensive, in that a set of cameras and the associated servomechanisms must be provided for each user. Moreover, these systems require that the scene be in existence and available for viewing at the time the viewer wants to see the scene. They cannot operate with recorded images of the scene. Moreover, there must be continuous, two-way communication between the viewer's location and the real location of the scene, where the cameras are positioned. At least the communications channel from the scene location to the viewer's location must be a high-bandwidth video channel. All of these drawbacks together limit application of such servomechanism-based systems to rare situations.
As described in an article by Takahashi et al., Generation Of Intermediate Parallax-images For Holographic Stereograms, Proceedings SPIE, Volume 1914, Practical Holography VII (1993), a so-called “Holographic Stereogram” can be synthesized from numerous individual monocular images of a scene, typically about 50 to 100 such images. To alleviate the need for actually capturing so many real images, the authors propose to generate intermediate images by projection back from three-dimensional data defining the scene. The three-dimensional data, in turn, is calculated from the images taken by real cameras at various locations on a linear camera locus. In this manner, the system is able to create intermediate images simulating the image which would be taken by a camera positioned between positions of real cameras. This system depends upon two-dimensional projection from three-dimensional data; i.e., calculation of the image which would appear in a viewing plane based upon data defining the location of objects in the scene in three dimensions. The system must determine the depth from the real cameras of each point in the scene.
To facilitate this determination, the authors propose to use certain characteristics of a so-called “epipolar image”. As further described below, an epipolar image combines data from multiple cameras into partial images, each including part of the data from each camera. With conventional raster-scan video cameras, each portion of the epipolar image typically includes one scanning line from each camera of the multiple camera set. In such epipolar images, features appear as sloping strips or bands. The width and slope of the bands are related to the depth or distance between the actual feature and the camera locus. Moreover, it is possible to determine from the epipolar image which features in the scene occlude other features, i.e., which features lie to the front, closer to the cameras, and which features lie to the back. The authors thus propose to recover the depth of the various points in the image by using the epipolar image. That depth information, in turn, is used as part of three-dimensional data, which in turn is used to project a two-dimensional image simulating the two-dimensional image which would be captured by a camera at an intermediate location. This system nonetheless involves all of the computational complexity required to reconstruct two-dimensional images from three-dimensional images. Moreover, Takahashi et al. characterize their system only as suitable for generation of the stereographic holograms, and not for generation of images to be viewed directly by a viewer.
Accordingly, despite all of this effort in the art, there still remains a substantial, unmet need for improved methods of synthesizing and displaying an image of a scene from an arbitrary, synthesized viewpoint. In particular, there are substantial, unmet needs for improved methods of providing telepresence, including display of images from different viewpoints as the user's head moves in real time. There is also a need for a telepresence system which can provide images to multiple users simultaneously.
The present invention addresses these needs.
One aspect of the invention provides methods of synthesizing an image of a scene corresponding to the image of said scene which would be observed from a virtual viewpoint location. The method includes the step of providing a plurality of starting pixel data elements. Preferably, each said starting pixel data element incorporates data corresponding to illumination seen along a starting pixel ray vector associated with that starting pixel data element. Each said starting pixel ray vector desirably has a direction and an intercept on a known locus in the frame of reference of the scene. The direction of a synthetic pixel ray vector from the pixel through the virtual viewpoint location, and the intercept of the same vector on the locus, are determined. The idea of a synthetic pixel ray vector can be envisioned by imagining a camera having its lens at the virtual viewpoint. A vector from a given pixel in the image plane of the imaginary camera would correspond to a synthetic pixel ray vector for that pixel. Using the direction and intercept of the synthetic pixel ray vector, one or more starting pixel data elements are selected. The selected starting pixel data elements are those associated with starting pixel ray vectors having directions and intercepts close to the direction and intercept of the synthetic pixel ray vector. The method further includes the step of deriving synthetic pixel data for the pixel of the synthesized image from the starting pixel data of the selected starting pixel data element or elements. Where only one starting pixel data element is selected, the data of that element can simply be copied as the data of the synthetic pixel. Where plural starting pixel data elements are selected, the step of deriving the synthetic pixel data typically includes the step of interpolating between the data of the selected starting pixel data elements.
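By way of illustration only, the selection and interpolation steps may be sketched as follows for a simplified planar case. The sketch assumes that each ray is described by a direction angle and by the angular position of its intercept on a circular locus, that closeness is measured by a combined angular distance, and that interpolation uses inverse-distance weights. The names PixelElement, ray_distance and synthesize_pixel, the parameter k, and the weighting scheme are hypothetical and are not taken from the method described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelElement:
    direction: float  # direction of the starting pixel ray vector, radians
    intercept: float  # angular position of the ray's intercept on the locus, radians
    value: float      # illumination seen along the ray


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def ray_distance(element, direction, intercept):
    """Assumed closeness measure: combined direction and intercept difference."""
    return math.hypot(angle_diff(element.direction, direction),
                      angle_diff(element.intercept, intercept))


def synthesize_pixel(elements, direction, intercept, k=2):
    """Derive one synthetic pixel value from the k closest starting elements.

    Assumes `elements` is non-empty.
    """
    ranked = sorted(elements, key=lambda e: ray_distance(e, direction, intercept))
    selected = ranked[:k]
    dists = [ray_distance(e, direction, intercept) for e in selected]
    # A single selected element, or an exact match, is simply copied.
    if len(selected) == 1 or dists[0] == 0.0:
        return selected[0].value
    # Otherwise interpolate; inverse-distance weighting is an assumption.
    weights = [1.0 / d for d in dists]
    return sum(w * e.value for w, e in zip(weights, selected)) / sum(weights)
```

In this sketch a synthetic ray lying midway between two starting rays receives approximately the average of their data, while a synthetic ray coinciding with a starting ray receives that ray's data unchanged.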
In a particularly preferred arrangement, the step of providing starting pixel data elements includes the step of providing a plurality of direction matrices. Each direction matrix includes starting pixel data elements associated with pixel ray vectors having directions parallel, within a preselected direction tolerance range, to a common direction. The step of selecting one or more starting pixel data elements for each synthetic pixel desirably includes the steps of selecting one or more of said direction matrices having a common direction close to the direction of the synthetic pixel ray vector and selecting those pixel data elements from the selected direction matrices which are associated with starting pixel ray vectors having intercepts close to the intercept of the synthetic pixel ray vector on the locus.
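As a minimal sketch of this two-stage selection, the following example groups starting pixel data elements into direction matrices and then selects first by direction and then by intercept. It assumes directions and intercepts expressed in degrees in a planar case, a tolerance range implemented as a bin of width `tolerance` centered on each common direction, nearest-neighbor selection at both stages, and no wraparound handling at 360 degrees. The function names are hypothetical.

```python
from collections import defaultdict


def build_direction_matrices(elements, tolerance):
    """Group (direction, intercept, value) tuples into direction matrices.

    Elements whose directions fall within tolerance / 2 of a common
    direction (bin_index * tolerance) share one matrix. Each matrix is
    kept sorted by intercept.
    """
    matrices = defaultdict(list)
    for direction, intercept, value in elements:
        bin_index = round(direction / tolerance)
        matrices[bin_index].append((intercept, value))
    for matrix in matrices.values():
        matrix.sort()
    return dict(matrices)


def lookup(matrices, tolerance, direction, intercept):
    """Select by direction first, then by intercept on the locus."""
    # Stage 1: the direction matrix whose common direction is closest.
    best_bin = min(matrices, key=lambda b: abs(b * tolerance - direction))
    # Stage 2: within that matrix, the element with the closest intercept.
    _, value = min(matrices[best_bin], key=lambda iv: abs(iv[0] - intercept))
    return value
```

Because the first stage discards every matrix but one (or, in a fuller version, a few neighbors), the second stage searches only a small fraction of the starting pixel data.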
Stated another way, the step of providing starting pixel data elements desirably includes the step of ordering the starting pixel data elements in a multidimensional matrix having at least one dimension correlated with direction of the starting pixel ray vectors and at least one dimension correlated with the intercepts of said starting pixel ray vectors. The locus used as the frame of reference for the intercepts may be two-dimensional, such as a sphere or portion of a sphere having an azimuth or longitude direction and also having an elevation or latitude direction. In further methods according to this aspect of the invention, the step of providing starting pixel data elements includes the step of providing a plurality of discrete two-dimensional images corresponding to the image of a scene observed from a plurality of discrete viewpoints. Preferably, the viewpoints lie on the aforementioned locus. Each discrete image includes pixel data elements associated with pixels offset from one another in horizontal and vertical directions within that discrete image. The pixel data element associated with each pixel in a discrete image represents illumination seen along a starting pixel ray vector from that pixel through the viewpoint of that discrete image.
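For illustration, the direction and intercept of the starting pixel ray vector for one pixel of a discrete image might be computed as sketched below. The sketch is planar and rests on several assumptions not stated above: an idealized pinhole camera located on a circular locus at azimuth `view_azimuth_deg`, an optical axis pointing radially along that azimuth, a known horizontal field of view, and a sign convention in which higher column numbers correspond to larger direction angles. The function name pixel_ray is hypothetical.

```python
import math


def pixel_ray(view_azimuth_deg, column, width, fov_deg):
    """Return (direction_deg, intercept_deg) for one pixel column.

    Since the viewpoint is assumed to lie on the locus, the intercept of
    every ray of this discrete image is the viewpoint's own azimuth.
    """
    half_width = math.tan(math.radians(fov_deg) / 2)
    # Horizontal offset of the pixel center, normalized to (-1, 1).
    u = 2 * (column + 0.5) / width - 1
    # Angle between the ray and the optical axis.
    offset_deg = math.degrees(math.atan(u * half_width))
    direction_deg = (view_azimuth_deg + offset_deg) % 360.0
    return direction_deg, view_azimuth_deg % 360.0
```

Under these assumptions the center pixel of each discrete image yields a ray along the optical axis, and pixels to either side yield rays offset symmetrically from it.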
A further aspect of the invention provides methods of providing data defining an image of a scene. Methods according to this aspect of the invention desirably include the step of providing a plurality of starting pixel data elements. Here again, each starting pixel data element incorporates data corresponding to illumination seen along a starting pixel ray vector associated with that starting pixel data element. Each starting pixel ray vector has a direction and an intercept on a locus. The method further includes the step of forming the starting pixel data elements into a transform image including a plurality of direction matrices, each said direction matrix including pixel data elements associated with pixel ray vectors having directions parallel within a preselected direction tolerance range to a common ray direction. Within the transform image, the direction matrices can be provided in an ordered array so that the common ray direction of each direction matrix is implicit in the position of such direction matrix in said ordered array. Likewise, individual pixel data elements can be ordered within the direction matrix so that intercept of the pixel ray vector associated with each pixel data element is implicit in the position of that pixel data element in the direction matrix. One or more lookup tables may be provided to relate positions of pixel data elements in direction matrices to intercepts of pixel ray vectors, or to relate positions of direction matrices within the transform image to ray direction of the associated pixel ray vectors. The transform images provided according to this aspect of the invention can be used in the image synthesis methods discussed above.
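The implicit ordering described above can be illustrated with a short sketch in which the transform image is a nested list indexed first by direction and then by intercept. The sketch assumes uniform angular spacing in degrees for both dimensions and a planar, circular locus; the function names and the `sample` callback, which stands in for whatever supplies the pixel data, are hypothetical.

```python
def make_transform_image(n_directions, n_intercepts, sample):
    """Build an ordered array of direction matrices.

    image[d][i] holds the data for the ray whose common direction is
    d * dir_step and whose intercept is i * int_step, so neither angle
    needs to be stored alongside the pixel data.
    """
    dir_step = 360.0 / n_directions
    int_step = 360.0 / n_intercepts
    image = [[sample(d * dir_step, i * int_step) for i in range(n_intercepts)]
             for d in range(n_directions)]
    return image, dir_step, int_step


def fetch(image, dir_step, int_step, direction, intercept):
    """Map a ray's direction and intercept to array positions and read the data."""
    d = round(direction / dir_step) % len(image)
    i = round(intercept / int_step) % len(image[0])
    return image[d][i]
```

Where the spacing is not uniform, the index arithmetic in `fetch` could be replaced by a search of a lookup table listing the direction or intercept for each position.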
The step of providing the pixel data elements desirably includes the step of actuating a plurality of pixel sensing elements receiving illumination directed in different physical sensing directions so that a set of pixel sensing elements receiving illumination in substantially parallel physical sensing directions are actuated concomitantly with one another to capture pixel data elements constituting each said direction matrix. In one arrangement, different sets of pixel sensing elements are actuated in order of the physical sensing directions of said sets, so that the physical sensing direction is scanned in a progressive sweep. The pixel sensing elements can be provided in a plurality of different arrays, such as in a plurality of cameras. The step of actuating the pixel sensing elements can be conducted so that the set of pixel sensing elements actuated concomitantly with one another to capture the pixel data elements constituting each direction matrix includes pixel sensing elements in a plurality of cameras. As further discussed below, the simple transposition from data acquired by pixel sensing elements to data elements within matrices lends itself to simple system architecture and rapid data acquisition.
The method may include the step of repeating the aforesaid steps so as to capture a time sequence of transform images representing a time sequence of scenes, i.e., a scene which changes with time. Also, the method may further include the step of compressing the data in a transform image to provide a compressed transform image. For example, the compressing step may include the step of comparing a plurality of direction matrices with one another. These methods can be used in providing telepresence. In a telepresence system, the step of selecting a virtual viewpoint includes the step of detecting the disposition of an observer, typically by detecting both the viewpoint or location of the observer and the viewing direction of the observer as the observer moves, and selecting the virtual viewpoint so as to correspond to the viewpoint of the observer. Also, in a telepresence system, the method further includes the step of displaying the virtual viewpoint image to the observer substantially in real time. That is, the steps of detecting the disposition of the observer, synthesizing a virtual viewpoint image and displaying that image are performed substantially in real time, as the observer moves, so that the observer sees the correct virtual viewpoint image for a new observer disposition substantially immediately as the observer moves to the new disposition. For stereoscopic images, two virtual viewpoint images are generated for each observer disposition, these images being taken from slightly different virtual viewpoints corresponding to the dispositions of the observer's eyes.
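As one hedged illustration of the stereoscopic case, the two virtual viewpoints might be derived from the detected observer disposition as sketched below. The sketch assumes a planar coordinate system in which the viewing direction is given by a counterclockwise yaw angle measured from the +x axis, and a default interpupillary distance of 0.064 in the same units as the head position. The function name eye_viewpoints and that default value are assumptions made for the example.

```python
import math


def eye_viewpoints(head_x, head_y, yaw_deg, ipd=0.064):
    """Return ((left_x, left_y), (right_x, right_y)) for the observer's eyes."""
    yaw = math.radians(yaw_deg)
    # Unit vector pointing to the observer's right: the viewing direction
    # (cos yaw, sin yaw) rotated clockwise by 90 degrees.
    right_x, right_y = math.sin(yaw), -math.cos(yaw)
    half = ipd / 2
    left_eye = (head_x - half * right_x, head_y - half * right_y)
    right_eye = (head_x + half * right_x, head_y + half * right_y)
    return left_eye, right_eye
```

Each returned position would then serve as the virtual viewpoint for one of the two synthesized images, with the same viewing direction used for both.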
In methods according to the foregoing aspects of the invention, there is no need to reconstruct the full three-dimensional scene, or to calculate a projection from full three-dimensional scene-specifying data onto a two-dimensional image plane. Indeed, as further discussed below, the manipulation of pixel data required to construct the virtual viewpoint image preferably includes only simple mapping of pixel data with some linear combinations or interpolations of pixel data. These steps can be carried out rapidly even where the images to be handled include large amounts of data as encountered in common video images. The system does not require any mathematical modeling or knowledge of the elements in the scene to be depicted. The starting pixel data can depict any scene, whether computer-generated or taken by real cameras or some combination of the two. The starting pixel data need not be captured in real time during viewing. The data may be prerecorded in its original form, such as in discrete images, or prerecorded in transform images. Further, the scene need not be static. Thus, the starting pixel data may be provided as sets, each such set incorporating pixel data captured at a given instant. A separate transform image may be created for each such set, so as to provide a data set for a time series of scenes. Here again, the step of creating the transform images need not include any complex, three-dimensional projection, but may instead include simple concatenation of pixel data. Thus, methods according to this aspect of the present invention can be applied to provide telepresence in a dynamic environment, i.e., the illusion that the observer is actually present in a time series of scenes including moving objects. The observer sees both motion of the objects and apparent motion caused by movement of his or her viewpoint.
Still further aspects of the invention provide methods of modifying a data set defining a first scene or time sequence of scenes, such as a transform image or series of images as discussed above, so as to provide an altered scene or altered time sequence of telepresence scenes. The method may include the step of altering the data in said set defining said first time sequence so that the alteration changes progressively. As further discussed below, the alteration may initially affect only a small number of pixel data elements, and hence may affect only a small region of the observer's environment in a telepresence system. The number of pixel data elements affected by the alteration may be increased progressively in later scenes of the sequence so that the alteration appears to spread progressively to a larger region. The direction matrix data structure discussed above facilitates this progressive alteration. Alternatively or additionally, the degree of alteration of particular pixel data elements, or of all of the pixel data elements in a scene, may increase progressively. For example, all of the pixel data elements may vary progressively from the data associated with one scene or sequence of scenes to the data associated with another scene or series of scenes. According to further aspects of the invention, the output image displayed to the observer may be altered progressively. According to still other methods, the data defining discrete images used as input to the methods discussed above may be altered.
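Two forms of progressive alteration mentioned above, an alteration that spreads to more pixel data elements and an alteration whose degree increases, can be sketched as follows. The sketch assumes that each scene in a sequence is a transform image held as a list of direction matrices, each a list of numeric pixel data, and that both sequences have the same shape. The spreading rule (one additional neighboring direction matrix on each side per scene) and the linear blend are assumptions chosen for the example, and the function names are hypothetical.

```python
def progressive_spread(seq_a, seq_b, start):
    """Replace data from seq_a with data from seq_b in a growing region.

    In scene t, direction matrices within t positions of index `start`
    are taken from seq_b; all others are left as in seq_a.
    """
    out = []
    for t, (a, b) in enumerate(zip(seq_a, seq_b)):
        out.append([list(b[d]) if abs(d - start) <= t else list(a[d])
                    for d in range(len(a))])
    return out


def progressive_blend(seq_a, seq_b):
    """Vary every pixel data element linearly from seq_a to seq_b over time."""
    n = len(seq_a)
    out = []
    for t, (a, b) in enumerate(zip(seq_a, seq_b)):
        w = t / (n - 1) if n > 1 else 1.0  # degree of alteration in scene t
        out.append([[(1 - w) * pa + w * pb for pa, pb in zip(row_a, row_b)]
                    for row_a, row_b in zip(a, b)])
    return out
```

Because neighboring direction matrices correspond to neighboring ray directions, growing the replaced index range in `progressive_spread` alters a progressively wider range of directions.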
A further aspect of the present invention incorporates the realization that data arranged in the transform images discussed above can be compressed and stored or transmitted in compressed form, and then subsequently decompressed for use in image synthesis steps as described above. It is advantageous to store and transmit the data in the form of compressed transform images, and then decompress the transform images.
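Purely as an illustrative sketch, one way of compressing a transform image by comparing direction matrices with one another is to store the first direction matrix in full and each later matrix as its element-wise difference from the preceding one, and then to apply a general-purpose compressor. The particular scheme below, using integer pixel data, JSON serialization and zlib, is an assumption made for the example and is not a scheme specified above; the function names are hypothetical, and the transform image is assumed to contain at least one direction matrix.

```python
import json
import zlib


def compress_transform_image(matrices):
    """Delta-encode adjacent direction matrices, then compress with zlib."""
    deltas = [list(matrices[0])]
    for prev, cur in zip(matrices, matrices[1:]):
        # Adjacent directions tend to see similar data, so differences are small.
        deltas.append([c - p for p, c in zip(prev, cur)])
    return zlib.compress(json.dumps(deltas).encode("utf-8"))


def decompress_transform_image(blob):
    """Invert compress_transform_image, restoring the direction matrices."""
    deltas = json.loads(zlib.decompress(blob).decode("utf-8"))
    matrices = [deltas[0]]
    for delta in deltas[1:]:
        matrices.append([p + d for p, d in zip(matrices[-1], delta)])
    return matrices
```

With integer pixel data the round trip is lossless, so the decompressed transform image can be used directly in the image synthesis steps.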
These and other objects, features and advantages of the present invention will be more readily apparent from the detailed description of the preferred embodiments set forth below, taken in conjunction with the accompanying drawings.