There is a continuing interest, within the computer graphics community, in image-based rendering (IBR) systems. These systems are fundamentally different from traditional geometry-based rendering systems, in that the underlying information (i.e., data representation) is composed of a set of photometric observations (e.g., digitized images/photographs) rather than being either mathematical descriptions of boundary regions or discretely sampled space functions.
An IBR system uses the set of photometric observations to generate or render different views of the environment and/or object(s) recorded therein. This approach has several advantages. First, the display algorithms for IBR systems tend to be less complex and may therefore support real-time rendering in certain situations. Second, the amount of processing required to view a scene is independent of the scene's complexity. Third, the final rendered image may include both real photometric objects and virtual objects.
IBR systems can be complex, however, depending upon the level of detail required and the processing time constraints. For example, Adelson et al., in their article entitled “The Plenoptic Function And The Elements Of Early Vision”, published in Computational Models of Visual Processing by The MIT Press, Cambridge, Mass., 1991, stated that a 7-dimensional plenoptic function can be implemented in an IBR system to completely represent a 3-dimensional dynamic scene. The 7-dimensional plenoptic function is generated by observing and recording the intensity of the light rays passing through every point in space, in every possible direction, for every wavelength, and at any time. To picture this, imagine an idealized camera that can be placed at any point in space (Vx, Vy, Vz). This idealized camera can then select any of the viewable rays by choosing an azimuth angle (θ) and an elevation angle (φ), as well as a band of wavelengths (λ). Adding a parameter (t) for time produces the 7-dimensional plenoptic function:

$$p = P(\theta, \phi, \lambda, V_x, V_y, V_z, t)$$
Thus, given the function P, a view from a specific point in a particular direction is generated simply by plugging in the values for (Vx, Vy, Vz) and sweeping (θ, φ) over the desired range, for some constant t and for each desired band of wavelengths (λ).
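The view-generation step just described amounts to sampling P on a grid of directions with the other arguments held fixed. The following Python sketch assumes, purely for illustration, that P is available as an ordinary callable and that a wavelength band is represented by a single value; `render_view` and `toy_plenoptic` are hypothetical names, and the toy scene is made up.

```python
import math
from typing import Callable, List

# P(theta, phi, lam, vx, vy, vz, t) -> intensity, in the argument order of the formula above.
Plenoptic = Callable[[float, float, float, float, float, float, float], float]


def render_view(P: Plenoptic, viewpoint, t: float, lam: float,
                thetas: List[float], phis: List[float]) -> List[List[float]]:
    """Sample P at a fixed viewpoint, time and wavelength band.

    Rows of the returned image correspond to elevation angles (phi) and
    columns to azimuth angles (theta).
    """
    vx, vy, vz = viewpoint
    return [[P(theta, phi, lam, vx, vy, vz, t) for theta in thetas] for phi in phis]


def toy_plenoptic(theta, phi, lam, vx, vy, vz, t):
    # Hypothetical scene: brightness depends on viewing direction only.
    return math.cos(theta) * math.cos(phi)


# Usage: a 2 x 4 view from the origin at t = 0 (lam is a representative wavelength).
thetas = [i * math.pi / 4 for i in range(4)]
phis = [0.0, math.pi / 6]
image = render_view(toy_plenoptic, (0.0, 0.0, 0.0), t=0.0, lam=550e-9,
                    thetas=thetas, phis=phis)
print(len(image), len(image[0]))  # 2 4
```

Fixing the viewpoint, time and wavelength and varying only (θ, φ) is exactly what makes the full 7-dimensional function expensive: every view requires a fresh two-dimensional sweep of a seven-dimensional data set.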
Accomplishing this in real time, especially for a full spherical map or a large portion thereof, is typically beyond the processing capability of most computers. Thus, there was a need to reduce the complexity of such an IBR system to make it more practical.
By ignoring the time (t) and wavelength (λ) parameters, McMillan and Bishop, in their article entitled “Plenoptic Modeling: An Image-Based Rendering System”, published in Computer Graphics Proceedings (SIGGRAPH'95), August 1995, disclosed a plenoptic modeling scheme that generates a continuous 5-dimensional plenoptic function from a set of discrete samples. Further research and development by Gortler et al. led to the Lumigraph, as disclosed in an article entitled “The Lumigraph”, published in Computer Graphics Proceedings (SIGGRAPH'96) in August 1996. Similarly, Levoy et al. developed the Lightfield, as disclosed in an article entitled “Light Field Rendering”, also published in Computer Graphics Proceedings (SIGGRAPH'96) in August 1996. The Lumigraph and the Lightfield presented a clever 4-dimensional parameterization of the plenoptic function, provided that the object (or, conversely, the camera view) is constrained within a bounding box.
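The reductions described above can be summarized as follows; this is an editorial restatement rather than notation taken from the cited papers. Fixing the wavelength band at λ₀ and the time at t₀ yields the 5-dimensional function. Assuming free space outside the bounding box, radiance is constant along each ray, so moving the viewpoint V = (V_x, V_y, V_z) by any distance s along the viewing direction d(θ, φ) leaves the sample unchanged, as long as the segment between the two points is unoccluded. That redundancy is what allows rays to be indexed by only four coordinates, e.g., their intersections with two parallel planes.

$$P_5(\theta, \phi, V_x, V_y, V_z) = P(\theta, \phi, \lambda_0, V_x, V_y, V_z, t_0)$$

$$P_5\big(\theta, \phi, \mathbf{V} + s\,\mathbf{d}(\theta, \phi)\big) = P_5(\theta, \phi, \mathbf{V})$$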
In an article entitled “Rendering With Concentric Mosaics”, published in Computer Graphics Proceedings (SIGGRAPH'99) in August 1999, Shum & He introduced the Concentric Mosaic (COM), which reduces the plenoptic function to 3 dimensions by restricting the viewer's movement to a plane. This technique is described in co-pending, commonly assigned U.S. patent application Ser. No. 09/222488, entitled “Rendering With Concentric Mosaics.”
In the COM technique taught by Shum & He, a mosaic image is a collection of consecutive slit images of the surrounding 3D scene, each captured in a direction tangent to a circle on the aforementioned plane from a viewpoint on that circle. Mosaic image data is generated in this manner for a plurality of concentric circles on the plane, hence the name “concentric mosaic.” When a novel view on the plane is to be rendered, the COM technique considers the slit images within the stack of mosaic images of differing radii to determine how best to render the scene. This provides a powerful tool for conducting 3D walkthroughs of actual and/or virtual scenes.
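To make the slit-selection idea concrete, here is a simplified Python sketch of how a viewing ray from a novel viewpoint on the plane could be mapped to a slit in the mosaic stack: the ray is tangent to the concentric circle whose radius equals the ray's perpendicular distance from the center, at the foot of that perpendicular. It is a nearest-neighbor illustration under stated assumptions (evenly spaced circle radii, evenly spaced slit columns, no interpolation or depth correction), not the rendering algorithm of Shum & He, and `select_slit` is a hypothetical name.

```python
import math
from typing import Tuple


def select_slit(px: float, py: float, ray_angle: float, num_circles: int,
                max_radius: float, num_columns: int) -> Tuple[int, int, bool]:
    """Map a viewing ray on the capture plane to a (circle, column) slit.

    Assumptions (illustrative only): circle k has radius
    k * max_radius / (num_circles - 1), each mosaic has num_columns slits
    evenly spaced in rotation angle, and the nearest circle and column
    are used without interpolation.
    """
    dx, dy = math.cos(ray_angle), math.sin(ray_angle)
    # Signed perpendicular distance from the center to the ray (2D cross
    # product); its sign tells which tangent direction the ray follows.
    signed_r = px * dy - py * dx
    # Foot of the perpendicular from the center: the point where the ray
    # touches the circle of radius |signed_r|.
    along = px * dx + py * dy
    fx, fy = px - along * dx, py - along * dy
    angle = math.atan2(fy, fx) % (2 * math.pi)
    circle = min(round(abs(signed_r) / max_radius * (num_circles - 1)), num_circles - 1)
    column = int(angle / (2 * math.pi) * num_columns) % num_columns
    counterclockwise = signed_r >= 0
    return circle, column, counterclockwise


# Usage: a ray from (0, -1) heading along +x touches the unit circle at (0, -1).
print(select_slit(0.0, -1.0, 0.0, num_circles=5, max_radius=2.0, num_columns=3))
# (2, 2, True)
```

The sign flag matters because a slit captured while looking one way along the circle does not contain the rays traveling the opposite way; a real system would keep separate mosaics (or a separate lookup) for the two tangent directions.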
The COM technique, however, tends to generate and require a significant amount of data. For example, assume that the mosaic image for each concentric circle is 240 pixels high by 1350 pixels long and that 320 concentric mosaic images are generated to provide adequate depth resolution within the scene. In this case, the resulting COM data would total nearly 300 megabytes (MB).
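The arithmetic behind this figure can be checked quickly in Python. The 3 bytes per pixel (24-bit color) is an assumption not stated in the text; with it, the total comes to about 311 million bytes, or roughly 297 MiB, which matches "nearly 300 MB" when megabytes are counted in powers of two.

```python
# Raw size of the example COM data set. bytes_per_pixel = 3 (24-bit RGB)
# is an assumption; the text gives only the image dimensions and count.
height, width, num_mosaics = 240, 1350, 320
bytes_per_pixel = 3

total_bytes = height * width * num_mosaics * bytes_per_pixel
print(total_bytes)                     # 311040000
print(round(total_bytes / 2**20, 1))   # 296.6 (MiB)
```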
Storing and/or processing this amount of data can be a daunting task for many computers, especially when the walkthrough is to be displayed without significant or perceptible delays between rendered images. Moreover, transporting this amount of data, for example, over the Internet using a 56 kbit/s modem is simply impractical.
As such, there has been a movement to compress the COM data so that COM techniques can be made readily available using current technology. For example, conventional vector quantization techniques have been used to compress the nearly 300 MB of COM data down to 25 MB (about a 12:1 ratio). Unfortunately, even a 25 MB data file requires about one hour to download using a 56 kbit/s modem.
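The compression ratio and download time can be checked the same way. This sketch assumes decimal megabytes and the full nominal 56 kbit/s line rate with no protocol overhead; real throughput is usually lower, so an actual download would take somewhat longer.

```python
raw_mb, compressed_mb = 300, 25   # sizes quoted in the text
modem_bits_per_second = 56_000    # nominal 56 kbit/s, no protocol overhead

ratio = raw_mb / compressed_mb
seconds = compressed_mb * 1_000_000 * 8 / modem_bits_per_second
print(f"{ratio:.0f}:1")       # 12:1
print(round(seconds / 60))    # 60 (minutes)
```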
Since the data structure of concentric mosaics can be regarded as a video sequence with slowly panning camera motion, video compression techniques may be employed to compress the COM data. Here, for example, at least two major categories of video compression techniques may be considered useful. The first category includes conventional video compression standards, such as MPEG-x and H.26x, which adopt a prediction-based framework in which the temporal redundancy across frames is reduced through motion compensation and block residue coding.
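The motion-compensation step can be illustrated with full-search block matching under a sum-of-absolute-differences (SAD) criterion. This Python sketch is illustrative only: the function names, the integer-pixel search window and the SAD cost are assumptions, since the MPEG-x and H.26x standards specify the bitstream and decoder and leave the motion search to the encoder. The returned residue is what block residue coding would then compress.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block of `cur` and a displaced block of `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def block_match(cur: Frame, ref: Frame, bx: int, by: int, size: int,
                search: int) -> Tuple[Tuple[int, int], Frame]:
    """Full-search block matching; returns the motion vector and the block residue."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block falls outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    residue = [[cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx] for x in range(size)]
               for y in range(size)]
    return best_mv, residue


# Usage: frame `cur` is `ref` with its content moved one pixel to the left.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]
mv, residue = block_match(cur, ref, bx=2, by=2, size=2, search=2)
print(mv)       # (1, 0)
print(residue)  # [[0, 0], [0, 0]]
```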
The first category would also include more recently developed techniques like the reference block coder (RBC) described by C. Zhang et al., in “Compression And Rendering Of Concentric Mosaic Scenery With Reference Block Coding,” presented in June 2000 at the SPIE Visual Communication and Image Processing (VCIP 2000) conference, which is incorporated herein, in its entirety, by reference.
The second category includes three-dimensional (3D) wavelet video coders. Examples are described in articles by: D. Taubman et al., entitled “Multirate 3-D Subband Coding Of Video,” and J. R. Ohm, entitled “Three-Dimensional Subband Coding With Motion Compensation,” in IEEE Trans. On Image Processing, Vol. 3, No. 5, September 1994; A. Wang et al., entitled “3D Wavelet Coding Of Video With Global Motion Compensation,” presented March 1999 at Proc. DCC'99 in Snowbird, Utah; and J. Y. Tham et al., entitled “Highly Scalable Wavelet-Based Video Codec For Low Bit-Rate Environment,” IEEE Journal on Selected Areas in Communications, Vol. 16, No. 1, January 1998.
These and other like 3D wavelet video coders represent another category of video coding approaches, which exploit the temporal redundancy via wavelet filtering along the temporal direction. One attractive property of the 3D wavelet video coder is its spatial-temporal-quality scalability.
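Temporal wavelet filtering can be sketched with the simplest wavelet, the Haar filter, applied pixel by pixel across pairs of consecutive frames. The choice of Haar and the function names are illustrative assumptions (the cited coders use various filters); the point is that a static region produces zero highpass coefficients, so its temporal redundancy is removed, and the transform remains exactly invertible.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def temporal_haar(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """One level of orthonormal Haar filtering along time.

    Each frame pair (2k, 2k+1) becomes a lowpass frame (a + b)/sqrt(2)
    and a highpass frame (a - b)/sqrt(2), computed pixel by pixel.
    """
    if len(frames) % 2:
        raise ValueError("need an even number of frames")
    s = 1 / math.sqrt(2)
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([[(p + q) * s for p, q in zip(ra, rb)] for ra, rb in zip(a, b)])
        high.append([[(p - q) * s for p, q in zip(ra, rb)] for ra, rb in zip(a, b)])
    return low, high


def inverse_temporal_haar(low: List[Frame], high: List[Frame]) -> List[Frame]:
    """Undo temporal_haar, recovering the original frame sequence."""
    s = 1 / math.sqrt(2)
    frames = []
    for lo, hi in zip(low, high):
        frames.append([[(p + q) * s for p, q in zip(rl, rh)] for rl, rh in zip(lo, hi)])
        frames.append([[(p - q) * s for p, q in zip(rl, rh)] for rl, rh in zip(lo, hi)])
    return frames


# Usage: four identical frames (a static scene) leave nothing in the highpass band.
still = [[1.0, 2.0], [3.0, 4.0]]
low, high = temporal_haar([still] * 4)
print(all(v == 0.0 for f in high for row in f for v in row))  # True
```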
Here, the term scalability means that a 3D wavelet coder can compress video into a single bitstream, where multiple subsets of the bitstream can be decoded to generate complete videos of different spatial resolution/temporal resolution/quality commensurate with the proportion of the bitstream decoded. For more information see, e.g., “A Common Framework For Rate Distortion Based Scaling Of Highly Scalable Compressed Video,” by D. Taubman et al., IEEE Trans. On Circuits and Systems for Video Technology, Vol. 6, No. 4, August 1996.
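Quality scalability can be illustrated with a toy embedded representation: integer wavelet coefficients are sent bit-plane by bit-plane, most significant plane first, so any prefix of the stream decodes to a coarser approximation of the same coefficients. This Python sketch shows only the embedding idea; practical 3D wavelet coders add significance coding, entropy coding and rate-distortion ordering, and `encode_bitplanes` and `decode_prefix` are hypothetical names.

```python
from typing import List, Tuple


def encode_bitplanes(coeffs: List[int], num_planes: int) -> Tuple[List[bool], List[List[int]]]:
    """Split integer coefficients into sign flags and magnitude bit-planes, MSB plane first."""
    if any(abs(c) >= 1 << num_planes for c in coeffs):
        raise ValueError("num_planes too small for the coefficient magnitudes")
    signs = [c < 0 for c in coeffs]
    planes = [[(abs(c) >> p) & 1 for c in coeffs] for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_prefix(signs: List[bool], planes: List[List[int]], k: int) -> List[int]:
    """Reconstruct coefficients from only the first k bit-planes of the stream."""
    num_planes = len(planes)
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:k]):
        shift = num_planes - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << shift
    return [-m if neg else m for m, neg in zip(mags, signs)]


# Usage: the same stream serves a coarse decoder and a lossless one.
coeffs = [37, -5, 12, 0]
signs, planes = encode_bitplanes(coeffs, num_planes=6)
print(decode_prefix(signs, planes, 3))  # [32, 0, 8, 0]
print(decode_prefix(signs, planes, 6))  # [37, -5, 12, 0]
```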
Scalability is extremely useful in a data-streaming environment, such as the Internet, where heterogeneous decoder/network settings prevail. Furthermore, since 3D wavelet-based coders avoid the recursive prediction loop present in most predictive coders, through which a transmission error can propagate into subsequently predicted frames, they tend to perform better in error-prone environments, such as wireless networks.
The second category would also include more recently developed data alignment techniques, for example, as described by L. Luo et al., in “Compression Of Concentric Mosaic Scenery With Alignment And 3D Wavelet Transform,” presented in January 2000 at the SPIE Image and Video Communications and Processing conference (SPIE 3974-10) in San Jose, Calif., which is incorporated herein, in its entirety, by reference.
Based on these previous efforts, 3D wavelet transform coding systems have been developed to compress the COM data. The compression performance of such coders, however, leaves room for improvement. As such, there is a need to determine whether any performance bottlenecks exist and to further improve the compression performance of various 3D wavelet coders.
In a 3D wavelet coder, for example, a wavelet transform is applied separately along the horizontal, vertical and temporal directions to concentrate the signal energy into relatively few large coefficients. However, one common problem with conventional 3D wavelet compression schemes is that the temporal wavelet filtering does not always achieve efficient energy compaction.
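A one-level separable 3D Haar transform illustrates the energy-compaction argument: for a smooth volume, nearly all of the energy lands in the low-low-low subband, while the orthonormal transform preserves total energy. The Haar filter, the `volume[t][y][x]` layout and the function names are assumptions made for illustration; practical coders generally use longer filters and several decomposition levels.

```python
import math
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[t][y][x]; all sizes even


def haar_1d(seq: List[float]) -> List[float]:
    """One orthonormal Haar level: lowpass half followed by highpass half."""
    s = 1 / math.sqrt(2)
    low = [(seq[i] + seq[i + 1]) * s for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) * s for i in range(0, len(seq), 2)]
    return low + high


def haar_3d(vol: Volume) -> Volume:
    """Apply one Haar level separately along x, then y, then t."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = [[row[:] for row in frame] for frame in vol]
    for t in range(T):                      # horizontal direction
        for y in range(H):
            out[t][y] = haar_1d(out[t][y])
    for t in range(T):                      # vertical direction
        for x in range(W):
            col = haar_1d([out[t][y][x] for y in range(H)])
            for y in range(H):
                out[t][y][x] = col[y]
    for y in range(H):                      # temporal direction
        for x in range(W):
            seq = haar_1d([out[t][y][x] for t in range(T)])
            for t in range(T):
                out[t][y][x] = seq[t]
    return out


def energy(vol: Volume) -> float:
    return sum(v * v for frame in vol for row in frame for v in row)


# Usage: a smooth 2 x 4 x 4 volume; the LLL subband is t < 1, y < 2, x < 2.
smooth = [[[10.0 + 0.1 * (x + y + t) for x in range(4)] for y in range(4)] for t in range(2)]
coeffs = haar_3d(smooth)
lll = sum(coeffs[0][y][x] ** 2 for y in range(2) for x in range(2))
print(f"LLL share of energy: {lll / energy(coeffs):.4f}")
```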
In a prediction-based video/concentric mosaic coder, local motion can be specified on a per-block basis. Consequently, inter-frame correlation due to, for example, a moving object or camera can be exploited to improve coding performance.
Unfortunately, local motion cannot be easily incorporated into the framework of conventional 3D wavelet compression schemes. Because temporal filtering is a transform, each pixel has to be engaged in one and only one transform; with block motion, however, some pixels of a frame may be matched by several blocks and others by none. Taubman et al. have proposed a pan compensation module that aligns the image frames prior to the wavelet transform. In the wavelet concentric mosaic codec proposed by Luo et al., a panorama alignment module was used to eliminate global translation. Wang et al. proposed registering and warping all image frames into a common coordinate system and then applying a 3D wavelet transform with an arbitrary region of support to the warped volume. To make use of local block motion, Ohm incorporated block matching and carefully handled the covered/uncovered and connected/unconnected regions. By trading off an invertibility requirement, Tham et al. employed block-based motion registration for low-motion sequences without filling the holes caused by individual block motion. Unfortunately, each of these proposed approaches tends to be complex, and those of Ohm and of Tham et al. are particularly complex.
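Global alignment of the kind described above can be sketched for the simplest case, a camera panning across a 360° panorama: estimate one integer translation per frame by exhaustive search, shift the frame into alignment, and compare the temporal Haar highpass energy before and after. This Python sketch works on a single scanline with circular shifts and is only an illustration in the spirit of pan compensation and panorama alignment, not a reproduction of the Taubman et al. or Luo et al. modules; the function names are hypothetical.

```python
from typing import List


def circular_shift(row: List[float], s: int) -> List[float]:
    """Return out with out[i] = row[(i + s) % n]; panoramas wrap around."""
    n = len(row)
    return [row[(i + s) % n] for i in range(n)]


def estimate_shift(ref: List[float], cur: List[float], max_shift: int) -> int:
    """Integer global translation that best aligns `cur` to `ref` (minimum SAD)."""
    best_s, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        cost = sum(abs(a - b) for a, b in zip(circular_shift(cur, s), ref))
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


def highpass_energy(a: List[float], b: List[float]) -> float:
    """Energy of the temporal Haar highpass band (a - b)/sqrt(2) for one frame pair."""
    return sum((p - q) ** 2 / 2 for p, q in zip(a, b))


# Usage: frame1 is frame0 panned by one pixel.
frame0 = [0.0, 3.0, 7.0, 2.0, 9.0, 4.0, 1.0, 8.0]
frame1 = circular_shift(frame0, 1)
s = estimate_shift(frame0, frame1, max_shift=3)
aligned1 = circular_shift(frame1, s)
print(s)                                  # -1
print(highpass_energy(frame0, frame1))    # 123.0 before alignment
print(highpass_energy(frame0, aligned1))  # 0.0 after alignment
```

Because each pixel is moved by the same amount, every pixel still takes part in exactly one temporal transform, which is why global alignment fits the 3D wavelet framework so much more easily than per-block motion.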
M. Magnor and B. Girod have used a 4D Haar wavelet for the coding of the Lumigraph/Lightfield. For more information see, e.g., “Two approaches to incorporate approximate geometry into multiview image coding,” presented in September 2000 at the IEEE International Conference on Image Processing (ICIP 2000) in Vancouver, BC, and “Model-based coding of multi-viewpoint imagery,” presented in June 2000 in Perth, Australia, at SPIE Visual Communications and Image Processing 2000 (VCIP'2000). They concluded that a high-dimensional wavelet coder is inferior in performance to a prediction-based coder. However, as indicated in this patent, misalignment of the data can be observed to be the major cause of this compression inefficiency.
Consequently, there is a need for further improved methods and arrangements for compressing IBR data, such as COM data. Preferably, the methods and arrangements will support the scalability requirements associated with various devices and provide for efficient communication over various communication services.