There is a continuing interest, within the computer graphics community, in image-based rendering (IBR) systems. These systems are fundamentally different from traditional geometry-based rendering systems, in that the underlying information (i.e., data representation) is composed of a set of photometric observations (e.g., digitized images/photographs) rather than being either mathematical descriptions of boundary regions or discretely sampled space functions.
An IBR system uses the set of photometric observations to generate or render different views of the environment and/or object(s) recorded therein. There are several advantages to this approach. First, the display algorithms for IBR systems tend to be less complex and may therefore be used to support real-time rendering in certain situations. Secondly, the amount of processing required to view a scene is independent of the scene's complexity. Thirdly, the final rendered image may include both real photometric objects and virtual objects.
IBR systems can be complex, however, depending upon the level of detail required and the processing time constraints. For example, Adelson et al., in their article entitled “The Plenoptic Function And The Elements Of Early Vision”, published in Computational Models of Visual Processing by The MIT Press, Cambridge, Mass. 1991, stated that a 7-dimensional plenoptic function can be implemented in an IBR system to completely represent a 3-dimensional dynamic scene. The 7-dimensional plenoptic function is generated by observing and recording the intensity of light rays passing through every space location as seen in every possible direction, for every wavelength, and at any time. Thus, imagine an idealized camera that can be placed at any point in space (Vx, Vy, Vz). This idealized camera can then be used to select any of the viewable rays by choosing an azimuth angle (θ) and elevation angle (φ), as well as a band of wavelengths (λ). Adding an additional parameter (t) for time produces a 7-dimensional plenoptic function:

p = P(θ, φ, λ, Vx, Vy, Vz, t)
Thus, given the function p, to generate a view from a specific point in a particular direction, one need only plug in the values for (Vx, Vy, Vz) and select from a range of (θ, φ), for some constant t, for each desired band of wavelengths (λ).
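The view-generation step described above can be illustrated with a minimal sketch. It assumes the plenoptic function is available as an ordinary callable; the names `render_view` and `toy_plenoptic`, and the uniform sampling of (θ, φ) at pixel centers, are hypothetical choices made for illustration and are not taken from the cited articles.

```python
import math

def render_view(P, viewpoint, lam, t, theta_range, phi_range, width, height):
    """Sample a plenoptic function P(theta, phi, lam, vx, vy, vz, t) on a grid.

    The viewpoint (Vx, Vy, Vz), the wavelength band lam, and the time t are
    held fixed; only the azimuth (theta) and elevation (phi) vary per pixel.
    """
    vx, vy, vz = viewpoint
    th0, th1 = theta_range
    ph0, ph1 = phi_range
    image = []
    for row in range(height):
        # Elevation at the center of this pixel row.
        phi = ph0 + (ph1 - ph0) * (row + 0.5) / height
        line = []
        for col in range(width):
            # Azimuth at the center of this pixel column.
            theta = th0 + (th1 - th0) * (col + 0.5) / width
            line.append(P(theta, phi, lam, vx, vy, vz, t))
        image.append(line)
    return image

def toy_plenoptic(theta, phi, lam, vx, vy, vz, t):
    """Hypothetical stand-in for P: a smooth intensity in [0, 1]."""
    return 0.5 + 0.5 * math.cos(theta + vx) * math.cos(phi + vy)

view = render_view(toy_plenoptic, (0.0, 0.0, 0.0), lam=550e-9, t=0.0,
                   theta_range=(-0.5, 0.5), phi_range=(-0.4, 0.4),
                   width=8, height=6)
```

Each color band would require one such pass, so the work grows with the number of pixels, bands, and viewpoints rather than with the complexity of the scene.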
Accomplishing this in real time, especially for a full spherical map or a large portion thereof, is typically beyond most computers' processing capability. Thus, there has been a need to reduce the complexity of such an IBR system to make it more practical.
By ignoring the time (t) and the wavelength (λ) parameters, McMillan and Bishop in their article entitled “Plenoptic Modeling: An Image-Based Rendering System” published in Computer Graphics (SIGGRAPH'95) August 1995, disclosed a plenoptic modeling scheme that generates a continuous 5-dimensional plenoptic function from a set of discrete samples.
Further research and development by Gortler et al. led to the development of the Lumigraph as disclosed in an article entitled “The Lumigraph” that was published in Computer Graphics (SIGGRAPH'96) in August, 1996. Similarly, Levoy et al. developed a Lightfield as disclosed in an article entitled “Light Field Rendering” that was also published in Computer Graphics (SIGGRAPH'96) in August of 1996.
The Lumigraph and the Lightfield presented a clever 4-dimensional parameterization of the plenoptic function provided the object (or conversely the camera view) is constrained, for example, within a bounding box. As used herein, the term “Lumigraph” is used generically to refer to Lumigraph, Lightfield, and other like applicable plenoptic function based techniques.
By placing the object in its bounding box (e.g., a six-sided cube) which is surrounded by a larger box (e.g., a larger six-sided cube), the Lumigraph indexes all possible light rays from the object through the coordinates at which the rays enter and exit a pair of parallel planes of the double bounding boxes. Thus, in the case of a six-sided cube, the resulting Lumigraph data is composed of six 4-dimensional functions that can be discretized more precisely for the inner bounding box closest to the object, and more coarsely for the outer bounding box.
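As a rough illustration of this 4-dimensional indexing, the following sketch computes where a ray crosses two parallel planes, taken here to be z = z_inner and z = z_outer. The function name, the choice of z-aligned planes, and the (u, v, s, t) labels are assumptions made for the example; the cited articles may use different conventions.

```python
def ray_to_two_plane_coords(origin, direction, z_inner, z_outer):
    """Return (u, v, s, t): the ray's crossing points on two parallel planes.

    (u, v) is the crossing on the plane z = z_inner and (s, t) is the
    crossing on the plane z = z_outer. Together they index one light ray.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0:
        # A ray parallel to the planes never crosses them.
        raise ValueError("ray is parallel to the parameterization planes")
    a = (z_inner - oz) / dz  # ray parameter at the inner plane
    b = (z_outer - oz) / dz  # ray parameter at the outer plane
    u, v = ox + a * dx, oy + a * dy
    s, t = ox + b * dx, oy + b * dy
    return (u, v, s, t)

coords = ray_to_two_plane_coords((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), 1.0, 2.0)
```

One such pair of planes covers one face of the cube, which is why six 4-dimensional functions are needed to cover all viewing directions.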
Even though many IBR scenes are synthetic, it is possible to capture the Lumigraph/Lightfield of a realistic scene/objects. There are some technical challenges, however, e.g., maintaining motion control of the camera array so that pictures can be taken from regular grid points of a plane parallel to the image plane.
In “Rendering With Concentric Mosaics”, Computer Graphics (SIGGRAPH'99), pp. 299-306, August 1999, Shum et al. proposed the use of concentric mosaics, which employ a 3D plenoptic function that restricts the viewer to movement inside a planar circle while looking outward.
A concentric mosaic scene can be constructed very easily, for example, by rotating a single camera at the end of a horizontal beam with the camera pointing outwardly and shooting images as the beam rotates. At the time of the rendering, one can then just split the view into vertical ray slits and reconstruct each slit through similar slits captured during the rotation of the camera.
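The slit lookup described above can be sketched geometrically. For a viewer inside the capture circle, a viewing ray is extended until it meets the circle traced by the camera; the angle of that point selects the captured frame, and the angle between the ray and the outward radial direction selects the slit (column) within that frame. The function name, the pinhole camera model, and the nearest-frame selection are assumptions made for illustration; depth correction and interpolation between neighboring slits are omitted.

```python
import math

def locate_slit(viewpoint, view_angle, radius, n_frames, width, fov):
    """Map one horizontal viewing ray to (frame index, slit column).

    viewpoint: (x, y) position strictly inside the capture circle.
    view_angle: ray direction in the horizontal plane, in radians.
    radius: radius of the circle traced by the outward-pointing camera.
    n_frames: number of frames captured over one full rotation.
    width, fov: frame width in pixels and horizontal field of view (radians).
    Returns None when the ray falls outside the camera's field of view.
    """
    px, py = viewpoint
    if px * px + py * py >= radius * radius:
        raise ValueError("viewpoint must lie inside the capture circle")
    dx, dy = math.cos(view_angle), math.sin(view_angle)
    # Solve |viewpoint + s * d| = radius for the positive root s.
    p_dot_d = px * dx + py * dy
    s = -p_dot_d + math.sqrt(p_dot_d ** 2 - (px * px + py * py) + radius ** 2)
    qx, qy = px + s * dx, py + s * dy
    # Beam rotation angle at which the camera sat where the ray exits.
    alpha = math.atan2(qy, qx)
    frame = round(alpha / (2 * math.pi) * n_frames) % n_frames
    # Angle between the ray and the camera's outward optical axis.
    beta = (view_angle - alpha + math.pi) % (2 * math.pi) - math.pi
    half = fov / 2
    if abs(beta) >= half:
        return None
    # Pinhole projection of beta onto the frame's horizontal axis.
    column = int((math.tan(beta) / math.tan(half) + 1) * 0.5 * width)
    return frame, column

center_hit = locate_slit((0.0, 0.0), 0.0, 1.0, 360, 320, math.radians(60))
```

Repeating this lookup for every column of the output view shows why rendering touches only a small set of slits scattered across many frames.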
Compared with a high-quality graphics rendering algorithm such as ray tracing, concentric mosaic techniques can render a scene realistically and quickly, regardless of the complexity of the scene. Unfortunately, the amount of data required for such concentric mosaic techniques is very large. By way of example, certain exemplary concentric mosaic scenes include 1,351 frames, each of which has a resolution of 320×240 pixels, thereby occupying a total of about 297 megabytes. Hence, the use of data compression is essential in concentric mosaics.
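The figure of about 297 megabytes can be checked with a short calculation. The sketch below assumes uncompressed 24-bit color (3 bytes per pixel) and binary megabytes (2^20 bytes); neither assumption is stated in the text, but together they reproduce the quoted size.

```python
frames = 1351
width, height = 320, 240
bytes_per_pixel = 3  # assumption: 24-bit RGB, no alpha channel

raw_bytes = frames * width * height * bytes_per_pixel
raw_megabytes = raw_bytes / 2 ** 20  # assumption: 1 megabyte = 2**20 bytes

print(f"{raw_bytes} bytes = {raw_megabytes:.1f} MB")
```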
Thus, because of the amount of data, there is a need for methods and arrangements that can provide sufficiently large compression ratios. Fortunately, as a result of the image capturing techniques, for example, there is typically a high correlation within the resulting 3D dataset of the concentric mosaics.
Preferably, the methods and arrangements will allow for the rendering of a portion of the concentric mosaics without requiring the entire 3D dataset. In fact, each time a view is rendered it would be beneficial if only a small portion of the 3D dataset is used by the rendering mechanism.
Further, to save on system costs it would also be useful for the methods and arrangements to reduce the amount of memory used in rendering the concentric mosaics.
For example, a well-designed concentric mosaic codec that allows portions of the 3D dataset to be randomly accessed and decoded from a compressed concentric mosaic bitstream would be useful. Such methods and arrangements could provide just-in-time (JIT) rendering; e.g., where only the content needed for the rendering of a current view is processed. Preferably, the JIT rendering techniques should be fast enough for use with conventional computing systems and like devices.
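The random-access and JIT ideas above can be illustrated with a minimal sketch. It is not a concentric mosaic codec: zlib stands in for the actual compression scheme, each block is compressed independently (which ignores correlation between blocks), and the names `encode_blocks` and `JitDecoder` are hypothetical. The sketch shows only an offset index into the bitstream plus a small cache of decoded blocks.

```python
import zlib
from collections import OrderedDict

def encode_blocks(blocks):
    """Compress each block on its own and record where it lands in the stream."""
    stream = bytearray()
    index = []  # one (offset, length) entry per block
    for raw in blocks:
        packed = zlib.compress(raw)
        index.append((len(stream), len(packed)))
        stream += packed
    return bytes(stream), index

class JitDecoder:
    """Decode blocks only when requested, keeping the most recent in memory."""

    def __init__(self, stream, index, cache_size=4):
        self.stream = stream
        self.index = index
        self.cache_size = cache_size
        self.cache = OrderedDict()  # block id -> decoded bytes, oldest first
        self.decode_count = 0

    def get(self, block_id):
        if block_id in self.cache:
            # Cache hit: mark as most recently used, no decoding needed.
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        offset, length = self.index[block_id]
        raw = zlib.decompress(self.stream[offset:offset + length])
        self.decode_count += 1
        self.cache[block_id] = raw
        if len(self.cache) > self.cache_size:
            # Evict the least recently used block to bound memory use.
            self.cache.popitem(last=False)
        return raw

blocks = [bytes([i]) * 1000 for i in range(10)]
stream, index = encode_blocks(blocks)
decoder = JitDecoder(stream, index, cache_size=2)
first = decoder.get(7)
```

Bounding the cache size is what keeps memory use independent of the size of the full dataset, at the cost of re-decoding blocks that have been evicted.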
In the past, a spatial domain vector quantization (SVQ) has been proposed to compress concentric mosaics. See, e.g., Shum et al., supra. Some of the advantages of SVQ are that the bitstream compressed by SVQ can be decoded relatively fast, and the compressed SVQ index is easily accessible at arbitrary locations. However, SVQ is complex at the encoding stage. Furthermore, the compression ratio of SVQ is relatively low. For example, the SVQ proposed by Shum et al., supra, only achieves a compression ratio of about 12:1.
One may also compress each individual shot of the concentric mosaics using baseline JPEG or JPEG 2000 techniques. Because correlation between multiple shots is not used, however, the use of a still image coder may not be the most efficient. Moreover, during the rendering of new views, the concentric mosaic 3D dataset is accessed by slits rather than images. As such, a bit stream formed by the concatenation of individual compressed images may not be very efficient to access.
Video-based codecs such as those used by MPEG techniques provide another possible choice for use in compressing concentric mosaics. See, e.g., Mitchell et al., “MPEG Video: Compression Standard”, Chapman & Hall, 1996.
MPEG typically achieves a very high compression ratio by exploiting the redundancy in neighboring frames. However, an MPEG decoder is designed to access the compressed images sequentially and does not support random access. Consequently, MPEG techniques may not be practical for use with concentric mosaics.
A 3D wavelet approach has also been proposed for the compression of the concentric mosaics. See, e.g., co-pending U.S. patent application Ser. No. 09/535,059, filed on Mar. 24, 2000, entitled “Methods And Arrangements For Compressing Image Based Rendering Data Using Multiple Reference Frame Prediction Techniques That Support Just-In-Time Rendering Of An Image”, now U.S. Pat. No. 6,693,964.
The 3D wavelet algorithm achieves a good compression ratio. The 3D wavelet algorithm also provides the ability to access a portion(s) of the compressed bitstream, but perhaps with an initial reduced resolution and quality. Nevertheless, such resolution and quality scalability capabilities may prove very useful in an Internet or like environment. Unfortunately, while 3D wavelet techniques provide for high compression ratios, these techniques may not be feasible for use in many conventional computers and like devices.
Thus, even though the MPEG and 3D wavelet codecs can achieve good compression efficiency, substantial computational resources must be devoted to decoding the concentric mosaic data in real time.
Alternatively, one may attempt to pre-decode the entire bitstream and then render from the resulting decoded dataset. Within a client-server or like environment, however, this technique would probably not be feasible because it tends to introduce long delays at the beginning and also because it would require a large amount of memory in which to hold the entire decoded environment.
Therefore, there is a need for new methods and arrangements that provide good compression performance, yet at the same time enable the view of the environment to be accessed and rendered in real time, with minimum memory support.