This invention relates to image-based rendering, and more particularly to methods and arrangements for compressing and transporting image-based rendering (IBR) data using alignment and three-dimensional (3D) wavelet transform techniques, and selectively decompressing portions of the resulting compressed data to render various two-dimensional (2D) views of a 3D scene.
There is a continuing interest, within the computer graphics community, in image-based rendering (IBR) systems. These systems are fundamentally different from traditional geometry-based rendering systems, in that the underlying information (i.e., data representation) is composed of a set of photometric observations (e.g., digitized images/photographs) rather than being either mathematical descriptions of boundary regions or discretely sampled space functions.
An IBR system uses the set of photometric observations to generate or render different views of the environment and/or object(s) recorded therein. There are several advantages to this approach. First, the display algorithms for IBR systems tend to be less complex and may therefore be used to support real-time rendering in certain situations. Secondly, the amount of processing required to view a scene is independent of the scene's complexity. Thirdly, the final rendered image may include both real photometric objects and virtual objects.
IBR systems can be complex, however, depending upon the level of detail required and the processing time constraints. For example, Adelson et al., in their article entitled “The Plenoptic Function And The Elements Of Early Vision”, published in Computational Models of Visual Processing by The MIT Press, Cambridge, Mass. 1991, stated that a 7-dimensional plenoptic function can be implemented in an IBR system to completely represent a 3-dimensional dynamic scene. The 7-dimensional plenoptic function is generated by observing and recording the intensity of light rays passing through every location in space as seen in every possible direction, for every wavelength, and at any time. Thus, imagine an idealized camera that can be placed at any point in space (Vx, Vy, Vz). This idealized camera can then be used to select any of the viewable rays by choosing an azimuth angle (θ) and elevation angle (φ), as well as a band of wavelengths (λ). Adding an additional parameter (t) for time produces a 7-dimensional plenoptic function:
p = P(θ, φ, λ, Vx, Vy, Vz, t)
Thus, given the function p, to generate a view from a specific point in a particular direction, one need only plug in the values for (Vx, Vy, Vz) and select from a range of (θ, φ), for some constant t, for each desired band of wavelengths (λ).
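The evaluation described above can be illustrated with a minimal sketch. The function `toy_plenoptic` below is a hypothetical stand-in for P (an arbitrary smooth function, not recorded light data, and it ignores the wavelength argument), and `render_view`, its parameters, and the 90-degree field of view are likewise assumptions made only for illustration.

```python
import math

def toy_plenoptic(theta, phi, wavelength, vx, vy, vz, t):
    """Hypothetical stand-in for P; returns an intensity in [0, 1].

    A real plenoptic function would be built from recorded light rays.
    This toy version ignores `wavelength` entirely.
    """
    return 0.5 + 0.5 * math.sin(theta + vx) * math.cos(phi + vy) * math.cos(vz + t)

def render_view(p, viewpoint, t, wavelength, width, height, fov=math.pi / 2):
    """Sample p over a grid of (theta, phi) for one fixed viewpoint and time."""
    vx, vy, vz = viewpoint
    image = []
    for row in range(height):
        phi = (row / (height - 1) - 0.5) * fov        # elevation angle
        line = []
        for col in range(width):
            theta = (col / (width - 1) - 0.5) * fov   # azimuth angle
            line.append(p(theta, phi, wavelength, vx, vy, vz, t))
        image.append(line)
    return image

# One small view from the origin at t = 0 for a single wavelength band.
view = render_view(toy_plenoptic, (0.0, 0.0, 0.0), t=0.0,
                   wavelength=550e-9, width=4, height=3)
```

Even in this toy form, every pixel of every view requires one evaluation of the 7-parameter function, which suggests why a full spherical map is costly.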
Accomplishing this in real-time, especially for a full spherical map or a large portion thereof, is typically beyond most computers' processing capability. Thus, there was a need to reduce the complexity of such an IBR system to make it more practical.
By ignoring the time (t) and the wavelength (λ) parameters, McMillan and Bishop, in their article entitled “Plenoptic Modeling: An Image-Based Rendering System”, published in Computer Graphics Proceedings (SIGGRAPH '95) in August 1995, disclosed a plenoptic modeling scheme that generates a continuous 5-dimensional plenoptic function from a set of discrete samples. Further research and development by Gortler et al. led to the development of the Lumigraph, as disclosed in an article entitled “The Lumigraph” that was published in Computer Graphics Proceedings (SIGGRAPH '96) in August 1996. Similarly, Levoy et al. developed a Lightfield, as disclosed in an article entitled “Light Field Rendering” that was also published in Computer Graphics Proceedings (SIGGRAPH '96) in August 1996. The Lumigraph and the Lightfield presented a clever 4-dimensional parameterization of the plenoptic function, provided the object (or conversely the camera view) is constrained within a bounding box.
In an article entitled “Rendering With Concentric Mosaic”, published in Computer Graphics Proceedings (SIGGRAPH '99) in August 1999, Shum and He introduced a COncentric Mosaic (COM) that reduced the plenoptic function to three dimensions by restricting the viewer's movement to a plane. This technique is described in co-pending, commonly assigned U.S. patent application Ser. No. 09/222488 entitled “Rendering With Concentric Mosaics.”
In the COM technique taught by Shum and He, a mosaic image represents a collection of consecutive slit images of the surrounding 3D scene taken in a direction tangent to a viewpoint on a circle on the aforementioned plane within the scene. In this manner, mosaic image data is generated for a plurality of concentric circles on the plane, hence the name, “concentric mosaic.” When a novel view on the plane is to be rendered, the COM technique considers the slit images within a stack of mosaic images of differing radii to determine how best to render the scene. This provides a powerful tool for conducting 3D walkthroughs of actual and/or virtual scenes.
The COM technique, however, tends to generate and require a significant amount of data. For example, let us assume that the mosaic image for each concentric circle is 240 pixels high by 1350 pixels long and that there are 320 concentric mosaic images generated to provide for adequate depth resolution within the scene. In this case, the resulting COM data would total nearly 300 megabytes (MB).
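The size estimate above can be checked with a short calculation. The sketch below assumes 3 bytes per pixel (24-bit color), which the text does not state; the constant names are illustrative only.

```python
HEIGHT = 240          # pixels per mosaic column
WIDTH = 1350          # pixels per mosaic row
NUM_MOSAICS = 320     # number of concentric mosaic images
BYTES_PER_PIXEL = 3   # assumption: 24-bit RGB, not stated in the text

raw_bytes = HEIGHT * WIDTH * NUM_MOSAICS * BYTES_PER_PIXEL
raw_mib = raw_bytes / 2**20  # size in binary megabytes (MiB)

print(raw_bytes)  # 311040000
print(round(raw_mib, 1))
```

Under that assumption the uncompressed data comes to about 311 million bytes, which is consistent with the "nearly 300 MB" figure.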
Storing and/or processing this amount of data can be a daunting task for many computers, especially when the walkthrough is to be displayed without significant or perceptible delays between rendered images. Moreover, transporting this amount of data, for example, over the Internet using a 56K baud modem is simply impractical.
As such, there has been a movement to compress the COM data, such that the COM techniques can be made readily available using current technology. For example, conventional vector quantization techniques have been used to compress the nearly 300 MB COM data down to 25 MB (about a 12:1 ratio). Unfortunately, a 25 MB data file requires about one hour to download using a 56K baud modem.
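The ratio and download time quoted above follow from simple arithmetic. The sketch below assumes a nominal 56,000 bits per second with no protocol overhead and decimal megabytes; both are simplifications, and the helper name `download_seconds` is hypothetical.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: total bits divided by link rate, no overhead."""
    return size_bytes * 8 / link_bits_per_second

MODEM_BPS = 56_000          # assumption: nominal modem rate in bits per second
RAW_BYTES = 300 * 10**6     # "nearly 300 MB", taken as decimal megabytes
COMPRESSED_BYTES = 25 * 10**6

ratio = RAW_BYTES / COMPRESSED_BYTES
minutes_compressed = download_seconds(COMPRESSED_BYTES, MODEM_BPS) / 60
hours_raw = download_seconds(RAW_BYTES, MODEM_BPS) / 3600

print(ratio, minutes_compressed, hours_raw)
```

With these assumptions the compressed file takes just under an hour, while the uncompressed data would take roughly half a day, which is why further compression is sought.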
In a further example, each mosaic image can be compressed using a JPEG coder or a similar still-image encoder. However, JPEG coding tends to be very inefficient here, because it codes each image independently and therefore cannot exploit the strong correlation among the multiple mosaic images that make up a COM scene.
Alternatively, a COM scene can be compressed with an MPEG or like video coder. Unfortunately, the MPEG encoding technique is also impractical, because it does not provide for random (selective) access to portions of the COM data during rendering. Moreover, although the peak signal-to-noise ratio (PSNR) performance of the MPEG coder may be satisfactory, a resulting COM scene encoded as MPEG may lack sufficient visual quality, because the MPEG standard is optimized for image streams that are played continuously, while the COM scene is essentially viewed statically.
Consequently, there is a need for methods and arrangements that can be used to reduce the amount of data, such as COM data, required to be generated, stored, transported, or otherwise accessed in rendering a scene. Preferably, the methods and arrangements will support the scalability requirements associated with various devices and provide for efficient communication over various communication services.
Methods and arrangements are provided for substantially reducing the amount of data, such as COM data, required to be generated, stored, transported, or otherwise accessed in rendering a three-dimensional (3D) scene. The methods and arrangements compress image-based rendering (IBR) data using alignment and 3D wavelet transform techniques. The compressed data can be easily transported and portions can be selectively decompressed to render various two-dimensional (2D) views of the 3D scene. Thus, the methods and arrangements can support the scalability requirements associated with many different devices and can be adapted for different communication services.
By way of example, an arrangement is provided for compressing and transporting image-based rendering (IBR) data using alignment and three-dimensional (3D) wavelet transform techniques, and selectively decompressing portions of the resulting compressed data to support rendering of desired views of a 3D scene. Here, a compression engine compresses the IBR data using a 3D wavelet transform and outputs a compressed bitstream that includes encoded frequency coefficients associated with the IBR data. This compressed bitstream is then provided (e.g., transported, etc.) to a decompression engine that selectively decodes portions of the compressed bitstream based on an access request for image data associated with the desired view from a rendering engine. The decompression engine decompresses the decoded portions using an inverse wavelet transform, and provides the decompressed IBR data to the rendering engine. The rendering engine is therefore able to render the decompressed IBR data without having to have the entire IBR bitstream decoded and decompressed at any one time.
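The idea of transforming a 3D volume and later decoding only the requested portion can be sketched as follows. This is a toy illustration under stated assumptions, not the arrangement itself: it uses a single-level Haar wavelet on 2x2x2 blocks (the text does not name a wavelet or a block size), and it omits alignment, quantization, and entropy coding. All function names are hypothetical.

```python
def haar_pair(a, b):
    """Forward Haar step: average and half-difference of a pair."""
    return (a + b) / 2, (a - b) / 2

def inv_haar_pair(s, d):
    """Inverse Haar step: recover the original pair."""
    return s + d, s - d

def transform_block(block, step):
    """Apply `step` to each pair along x, then y, then z of a 2x2x2 block."""
    b = [[row[:] for row in plane] for plane in block]
    for z in range(2):
        for y in range(2):
            b[z][y][0], b[z][y][1] = step(b[z][y][0], b[z][y][1])
    for z in range(2):
        for x in range(2):
            b[z][0][x], b[z][1][x] = step(b[z][0][x], b[z][1][x])
    for y in range(2):
        for x in range(2):
            b[0][y][x], b[1][y][x] = step(b[0][y][x], b[1][y][x])
    return b

def compress(volume):
    """Transform each 2x2x2 block independently; dimensions must be even."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    store = {}
    for bz in range(0, depth, 2):
        for by in range(0, height, 2):
            for bx in range(0, width, 2):
                block = [[[volume[bz + z][by + y][bx + x] for x in range(2)]
                          for y in range(2)] for z in range(2)]
                store[(bz // 2, by // 2, bx // 2)] = transform_block(block, haar_pair)
    return store

def decode_voxel(store, z, y, x, touched):
    """Inverse-transform only the block that contains the requested sample."""
    key = (z // 2, y // 2, x // 2)
    touched.add(key)  # record which blocks were actually decoded
    block = transform_block(store[key], inv_haar_pair)
    return block[z % 2][y % 2][x % 2]

# A 4x4x4 test volume whose values encode their own coordinates.
volume = [[[z * 100 + y * 10 + x for x in range(4)]
           for y in range(4)] for z in range(4)]
store = compress(volume)
touched = set()
value = decode_voxel(store, 3, 1, 2, touched)
```

Because each block is transformed independently, a request for one sample touches one of the eight stored blocks and leaves the rest encoded. A practical coder would use larger blocks and multiple decomposition levels; the block-independence shown here is simply one way to obtain the random access that a video coder lacks.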