This invention relates generally to computers and, more particularly, to methods and arrangements that can be implemented to compress image-based rendering (IBR) information, transport the compressed IBR information, and subsequently provide selective and/or just in time (JIT) rendering of an image based rendering scene on a portion of the compressed IBR information.
There is a continuing interest, within the computer graphics community, in image-based rendering (IBR) systems. These systems are fundamentally different from traditional geometry-based rendering systems, in that the underlying information (i.e., data representation) is composed of a set of photometric observations (e.g., digitized images/photographs) rather than being either mathematical descriptions of boundary regions or discretely sampled space functions.
An IBR system uses the set of photometric observations to generate or render different views of the environment and/or object(s) recorded therein. There are several advantages to this approach. First, the display algorithms for IBR systems tend to be less complex and may therefore be used to support real-time rendering in certain situations. Secondly, the amount of processing required to view a scene is independent of the scene's complexity. Thirdly, the final rendered image may include both real photometric objects and virtual objects.
IBR systems can be complex, however, depending upon the level of detail required and the processing time constraints. For example, Adelson et al., in their article entitled "The Plenoptic Function And The Elements Of Early Vision", published in Computational Models of Visual Processing by The MIT Press, Cambridge, Mass., 1991, stated that a 7-dimensional plenoptic function can be implemented in an IBR system to completely represent a 3-dimensional dynamic scene. The 7-dimensional plenoptic function is generated by observing and recording the intensity of light rays passing through every space location as seen in every possible direction, for every wavelength, and at any time. Thus, imagine an idealized camera that can be placed at any point in space (Vx, Vy, Vz). This idealized camera can then be used to select any of the viewable rays by choosing an azimuth angle (θ) and elevation angle (φ), as well as a band of wavelengths (λ). Adding an additional parameter (t) for time produces a 7-dimensional plenoptic function:
p = P(θ, φ, λ, Vx, Vy, Vz, t)
Thus, given the function p, to generate a view from a specific point in a particular direction, one need only plug in the values for (Vx, Vy, Vz) and select from a range of (θ, φ) for some constant t and each desired band of wavelengths (λ).
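The lookup described above can be illustrated with a short sketch. This is purely illustrative: the discretized 7-D sample store, the resolutions, and the nearest-sample lookup are assumptions for demonstration, not part of the original formulation.

```python
import numpy as np

# Hypothetical discretized plenoptic sample store: a 7-D array indexed by
# quantized (theta, phi, lambda, Vx, Vy, Vz, t). Resolutions are illustrative.
N_THETA, N_PHI, N_LAMBDA = 8, 8, 3
N_VX, N_VY, N_VZ, N_T = 4, 4, 4, 2
plenoptic = np.zeros((N_THETA, N_PHI, N_LAMBDA, N_VX, N_VY, N_VZ, N_T))

def sample_plenoptic(theta, phi, lam_idx, vx, vy, vz, t):
    """Return p = P(theta, phi, lambda, Vx, Vy, Vz, t) by nearest-sample lookup.
    theta in [0, 2*pi), phi in [0, pi); lam_idx, v*, and t are given as
    integer indices here for simplicity."""
    i = int(theta / (2 * np.pi) * N_THETA) % N_THETA
    j = int(phi / np.pi * N_PHI) % N_PHI
    return plenoptic[i, j, lam_idx, vx, vy, vz, t]
```

Generating a view then amounts to evaluating this function over the range of (θ, φ) visible from the chosen viewpoint.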
Accomplishing this in real time, especially for a full spherical map or a large portion thereof, is typically beyond most computers' processing capabilities. Thus, there has been a need to reduce the complexity of such an IBR system to make it more practical.
By ignoring the time (t) and the wavelength (λ) parameters, McMillan and Bishop, in their article entitled "Plenoptic Modeling: An Image-Based Rendering System" published in Computer Graphics (SIGGRAPH'95) in August 1995, disclosed a plenoptic modeling scheme that generates a continuous 5-dimensional plenoptic function from a set of discrete samples.
Further research and development by Gortler et al. led to the development of the Lumigraph, as disclosed in an article entitled "The Lumigraph" published in Computer Graphics (SIGGRAPH'96) in August 1996. Similarly, Levoy et al. developed the Lightfield, as disclosed in an article entitled "Light Field Rendering" that was also published in Computer Graphics (SIGGRAPH'96) in August 1996.
The Lumigraph and the Lightfield presented a clever 4-dimensional parameterization of the plenoptic function, provided the object (or conversely the camera view) is constrained, for example, within a bounding box. As used herein, the term "Lumigraph" is used generically to refer to the Lumigraph, the Lightfield, and other like plenoptic-function-based techniques.
By placing the object in its bounding box (e.g., a six-sided cube), which is surrounded by a larger box (e.g., a larger six-sided cube), the Lumigraph indexes all possible light rays from the object through the coordinates at which the rays enter and exit one of the parallel planes of the double bounding boxes. In the case of a six-sided cube, the resulting Lumigraph data is thus composed of six 4-dimensional functions that can be discretized more precisely for the inner bounding box closest to the object, and more coarsely for the outer bounding box.
In the examples that follow, the bounding box and larger box are assumed to be six-sided cubes, wherein the plane of the inner box which is being considered is indexed with coordinates (u, v) and that the corresponding plane of the outer box is indexed with coordinates (s, t).
Alternatively, the Lumigraph could be considered as six 2-dimensional image arrays, with all the light rays coming from a fixed (s, t) coordinate forming one image, which is equivalent to setting a camera at coordinate (s, t) and taking a picture of the object where the imaging plane is the (u, v) plane.
In either case, a plurality of Lumigraph images can be taken to produce a Lumigraph image array. Since neighboring Lumigraph images within the array will tend to be very similar to one another, to create a new view of the object, the IBR system can simply split the view into its light rays by interpolating nearby existing light rays in the Lumigraph image arrays.
In this manner, the Lumigraph is attractive because it contains information for all views of the objects/scenes. With the Lumigraph, a scene can be rendered realistically regardless of the scene complexity, and quickly as compared with top-quality graphics rendering algorithms such as ray tracing.
Unfortunately, the Lumigraph typically requires a very large amount of data. For example, a typical Lumigraph scene may include 32 sample points in each axis on the (s, t) plane, and 256 sample points in each axis on the (u, v) plane, with 3 color samples per light ray (e.g., 8-bits of red data, 8-bits of green data, and 8-bits of blue data), and 6 parallel image planes of the object. Thus, for such a relatively low resolution Lumigraph (note that the object resolution is that of the (u, v) plane, which is only 256×256 sample points), the total raw data amount is:
Total Lumigraph Data = 32×32×256×256×3×6 = 1.125 GB.
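The arithmetic above can be verified with a short calculation (the sampling resolutions are the ones given in the text; "GB" is taken here as 2**30 bytes, which is what makes the figure come out to exactly 1.125):

```python
# Raw Lumigraph size for the example above: 32x32 (s, t) samples,
# 256x256 (u, v) samples, 3 bytes of color per light ray, 6 image planes.
st_samples = 32 * 32
uv_samples = 256 * 256
bytes_per_ray = 3
planes = 6
total_bytes = st_samples * uv_samples * bytes_per_ray * planes
total_gb = total_bytes / 2**30
print(total_bytes, total_gb)  # 1207959552 1.125
```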
Such a large Lumigraph data file would be impracticable for storage on a hard drive, optical disc, etc., or for transmission over a communication network, such as, for example, the Internet. Moreover, practical Lumigraph applications will likely require better resolution through a higher sampling density, which would result in even larger Lumigraph data files.
Consequently, there is an on-going need to reduce the size of the Lumigraph data file. One method is to compress the Lumigraph data. Since the Lumigraph data consists of an array of images, one might think that compression techniques that have been successfully applied to video coding might be applicable to Lumigraph data compression. Unfortunately, this is not necessarily so, because there are distinct differences between video and the Lumigraph. For example, the Lumigraph is a 2-dimensional image array, while video is a 1-dimensional array (i.e., a sequence of frames). Thus, there tends to be more correlation in the Lumigraph than in video sequences. Furthermore, unlike video, views rendered using the Lumigraph tend to be more static as presented to the viewer. As is well known, for most viewers, distortion is more noticeable in static images than in moving images. Since a rendered view of the Lumigraph is a combination of the image arrays, certain human visual system (HVS) properties, such as spatial and temporal masking, may not be exploited.
Another difference can be seen during the rendering of a compressed bitstream. For a compressed video bitstream, the bitstream is decompressed allowing it to be displayed frame by frame. To the contrary, a compressed Lumigraph bitstream would not be decompressed and then rendered in such a manner, because the decompressed Lumigraph data file would tend to be too large.
It is therefore essential to maintain the Lumigraph data in the compressed form, and decompress/decode only the content needed to render the current view. As used herein, this concept will be referred to as "just-in-time" (JIT) rendering.
JIT rendering is an important feature in the design of a practical Lumigraph compression scheme. Preferably, JIT rendering will be accomplished by a Lumigraph decoder that is designed to be fast enough to accommodate real-time decompression/decoding of the Lumigraph data.
One potential way to accommodate JIT rendering is to compress the Lumigraph data using intraframe coding. Here, the Lumigraph data is segmented into blocks that are compressed independently of one another. For example, Levoy et al. proposed a vector quantization (VQ) approach to compress the Lightfield, and Sloan et al. proposed to use JPEG (i.e., a block discrete cosine transform (DCT) function with run-level Huffman coding) to compress the Lumigraph.
While both VQ and JPEG techniques are relatively fast during decoding, their compression performance is limited. For example, the resulting image quality appears acceptable at a low compression ratio of between about 25:1 and 50:1; however, the quality of the rendered scene degrades quickly for compression ratios higher than about 50:1.
Considering the large amount of Lumigraph data and high redundancy of information contained therein, there is a continuing need for improved IBR compression methods and arrangements.
Recently, at least two articles have proposed the use of an MPEG-like algorithm to compress the Lumigraph data array. The first article, written by Kiu et al., is entitled "Two-Dimensional Sequence Compression Using MPEG" and was published in Visual Communication And Image Processing (VCIP'98) in January 1998. The second article, written by Magnor et al., is entitled "Adaptive Block-Based Light Field Coding," and was published in the Proc. 3rd International Workshop on Synthetic and Natural Hybrid Coding and Three-Dimensional Imaging IWSNHC3DI'99 in September 1999. While each of these articles presents a compression technique that appears to provide higher compression ratios, neither article addresses the continuing problem of rendering the compressed Lumigraph scene, which as described above is of crucial importance to the overall Lumigraph application.
Consequently, there is a need for improved methods and arrangements that can be implemented to compress IBR data, store and/or transport the compressed IBR data, and subsequently provide selective and/or JIT rendering of an image based on at least a portion of the compressed IBR data.
The present invention provides improved methods and arrangements for compressing IBR data, storing and transporting the compressed IBR data, and subsequently providing selective and JIT rendering of an image based on at least a portion of the compressed IBR data.
For example, in accordance with certain aspects, a multiple reference frame (MRF) compression/decompression technique is provided. For image arrays, this MRF technique significantly outperforms intraframe compression schemes such as VQ or JPEG, yet still provides JIT real-time rendering, which is not supported by a video-like coder. This MRF technique also outperforms JPEG compression by a factor of at least two. A two-level indexing mechanism is included within the resulting MRF compressed bitstream so that the image may be stored/transported and rendered just in time, with the content needed to render the current view decoded and accessed in real time.
With this in mind, the above stated needs and others are met by a method for compressing an image data array having image data associated with a plurality of frames. The method includes selectively dividing the frames into anchor frames and predicted frames, independently encoding each of the anchor frames, and encoding a prediction residue for each of the predicted frames. Here, the prediction residue is determined by referring each of the predicted frames to at least two of the anchor frames.
The anchor frames can be staggered to form a pattern within the image data array. For example, a grid pattern having equal distances between neighboring anchor frames can be implemented. This allows for at least one predicted frame to be located between at least two neighboring anchor frames. In certain implementations there are at least three predicted frames located between every two neighboring anchor frames.
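The grid arrangement described above can be sketched as follows. This is a minimal illustration, assuming a spacing of four frames between grid lines (so that three predicted frames sit between neighboring anchors along each axis); the function name and spacing are hypothetical.

```python
def classify_frames(rows, cols, spacing=4):
    """Mark frames in a 2-D image data array as anchor ('A') or predicted ('P').
    Anchors form a regular grid with equal distances between neighboring
    anchors, leaving spacing-1 predicted frames between them along each axis."""
    return [['A' if (r % spacing == 0 and c % spacing == 0) else 'P'
             for c in range(cols)] for r in range(rows)]

grid = classify_frames(8, 8, spacing=4)
# Anchors land at (0,0), (0,4), (4,0), (4,4); every other frame is predicted.
```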
The anchor frames can be independently encoded by segmenting each of the anchor frames into a plurality of anchor frame macroblocks, and then encoding each of the anchor frame macroblocks. To encode each of the anchor frame macroblocks, the method may further include subdividing each anchor frame macroblock into a plurality of subblocks, and then transforming each subblock using a discrete cosine transform (DCT) and entropy encoding each transformed subblock using a run-length Huffman coder. For example, in certain implementations, each anchor frame macroblock is subdivided into at least four luminance subblocks and at least two chrominance subblocks.
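The subdivide-and-transform step can be sketched as below. This is an illustrative sketch only: a 16×16 luminance macroblock is split into four 8×8 subblocks and transformed with an orthonormal 2-D DCT; the quantization step `q` is a made-up parameter, and the run-length Huffman entropy coding stage is omitted.

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square subblock, built from the orthonormal 1-D basis."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def encode_anchor_macroblock(mb16, q=16):
    """Split a 16x16 macroblock into four 8x8 subblocks, transform each with
    the DCT, and quantize the coefficients; entropy coding would follow."""
    subblocks = [mb16[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
    return [np.round(dct2(sb) / q).astype(int) for sb in subblocks]
```

For a flat macroblock, all energy collapses into each subblock's DC coefficient, which is what makes the subsequent run-length coding effective.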
Encoding the prediction residue for each of the predicted frames includes encoding each of the predicted frame macroblocks using motion compensation. This can be accomplished, for example, for each predicted frame macroblock, by searching an area within the image data array near the predicted frame macroblock for a best matching anchor frame macroblock, determining a reference vector for each predicted frame macroblock within each predicted frame, and determining a prediction residue for the predicted frame macroblock based on the difference between a predicted frame macroblock value and an anchor frame macroblock value. For each predicted frame macroblock, the method may further include transforming the residue by a discrete cosine transform (DCT), and entropy encoding each transformed residue using a run-length Huffman coder. The predicted frame macroblocks can be encoded using a translation-based motion model, an affine motion model, a perspective motion model, or other like motion models.
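The search-and-residue step above, using more than one reference anchor frame, can be sketched as follows. This is a simplified translation-based sketch with an exhaustive sum-of-absolute-differences (SAD) search; the function name, the search window size, and the SAD criterion are illustrative assumptions.

```python
import numpy as np

def predict_from_anchors(block, pos, anchors, search=2):
    """For a predicted-frame macroblock at pos=(y, x), search a small window
    in each candidate anchor frame, keep the best match (minimum SAD), and
    return the reference (anchor index, motion vector) plus the residue."""
    y0, x0 = pos
    h, w = block.shape
    best = None
    for ai, anchor in enumerate(anchors):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = y0 + dy, x0 + dx
                if y < 0 or x < 0 or y + h > anchor.shape[0] or x + w > anchor.shape[1]:
                    continue
                cand = anchor[y:y + h, x:x + w]
                sad = np.abs(block - cand).sum()
                if best is None or sad < best[0]:
                    best = (sad, ai, (dy, dx), block - cand)
    _, ai, mv, residue = best
    return ai, mv, residue
```

The returned residue would then be DCT-transformed and entropy coded, as described above.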
The method may further include outputting a bitstream having encoded anchor frame data, encoded predicted frame data, indexing data, and any requisite quantization scale information. The indexing data is configured to identify each encoded anchor frame and each encoded predicted frame. The encoded anchor frame data is further configured to identify encoded macroblocks within each encoded anchor frame, and the encoded predicted frame data is further configured to identify encoded predicted frame macroblocks within each encoded predicted frame.
A method for decompressing a bitstream is also provided. Here, the bitstream includes encoded anchor frame data, encoded predicted frame data, and indexing data associated with a compressed image data array having image data associated with a plurality of frames. The method includes accessing the index data to identify a unique location for each encoded anchor frame within the encoded anchor frame data, and a unique location for each encoded predicted frame within the encoded predicted frame data. Each encoded anchor frame includes additional indexing information that identifies the location of each encoded anchor frame macroblock therein. Similarly, each encoded predicted frame includes additional information that identifies the location of each encoded predicted frame macroblock therein.
For each new view to be rendered, the method includes determining which encoded anchor frame macroblocks and encoded predicted frame macroblocks are to be used in rendering the new view, selectively decoding the encoded anchor frame macroblocks to be used in rendering the new view and those referred to by the predicted frame macroblocks, and selectively decoding the predicted frame macroblocks.
In certain implementations, the encoded anchor frame macroblocks are decoded by determining if the encoded anchor frame macroblock has an existing corresponding decoded anchor frame macroblock, and if so, using the existing corresponding decoded anchor frame macroblock in rendering the new view. Otherwise, the method includes decoding the encoded anchor frame macroblock to be used in rendering the new view. Similarly, to selectively decode the predicted frame macroblock, the method further includes determining if the encoded predicted frame macroblock has an existing corresponding decoded predicted frame macroblock, and if so, using the existing corresponding decoded predicted frame macroblock in rendering the new view. Otherwise, the method includes decoding the predicted frame macroblock using all referenced decoded anchor frame macroblocks for the predicted frame macroblock. This may require that additional anchor frame macroblocks be decoded first.
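The decode-once-then-reuse behavior described above amounts to a decode-on-demand cache, which can be sketched as follows. The class and the `decode_fn` callback are hypothetical names for illustration; the point is that each macroblock is decoded at most once and subsequently served from memory.

```python
class JITMacroblockCache:
    """Decode-on-demand cache: a macroblock is decoded the first time it is
    needed for a view, then reused from memory for subsequent views."""

    def __init__(self, decode_fn):
        self.decode_fn = decode_fn   # maps a macroblock id to decoded pixels
        self.cache = {}

    def get(self, mb_id):
        # Reuse an existing decoded macroblock if one exists; otherwise
        # decode it now and remember the result.
        if mb_id not in self.cache:
            self.cache[mb_id] = self.decode_fn(mb_id)
        return self.cache[mb_id]
```

In a fuller implementation, separate caches for anchor and predicted macroblocks (as suggested below) would each follow this pattern, possibly with an eviction policy to bound memory use.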
The method can determine which encoded anchor frame macroblocks and encoded predicted frame macroblocks are to be used in rendering the new view by splitting the new view into a plurality of rays, wherein each ray passes through two parallel planes, and identifying an intersecting coordinate for each ray that locates which encoded anchor frame macroblocks and encoded predicted frame macroblocks are to be used in rendering the new view with respect to the compressed image data array. In some cases this requires a bilinear interpolation process using a portion of the plurality of rays to calculate the intersecting coordinate.
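The ray-to-plane intersection step can be sketched as below. This is a simplified illustration assuming the two parameterization planes are perpendicular to the z-axis at heights z_uv and z_st; the function name and plane placement are assumptions, not the patent's own coordinate convention.

```python
def ray_two_plane_coords(origin, direction, z_uv=0.0, z_st=1.0):
    """Intersect a viewing ray with the two parallel parameterization planes,
    returning the (u, v) and (s, t) coordinates that index the Lumigraph."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dz == 0:
        raise ValueError("ray is parallel to the parameterization planes")
    t1 = (z_uv - oz) / dz   # ray parameter at the inner (u, v) plane
    t2 = (z_st - oz) / dz   # ray parameter at the outer (s, t) plane
    return (ox + t1 * dx, oy + t1 * dy), (ox + t2 * dx, oy + t2 * dy)
```

The resulting (u, v, s, t) coordinates generally fall between the discrete sample points, which is where the bilinear interpolation mentioned above comes in: the ray's color is blended from the nearest stored light rays.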
Once an anchor frame or predicted frame macroblock has been decoded, it can be saved to memory and used again, as required. In certain implementations, logically separate cache memories are used and managed to allow for quick response and improved performance.