1. Field of the Invention
The invention, generally, relates to computer graphics and, more particularly, to a new and improved method and apparatus for rendering images of three-dimensional scenes using z-buffering.
2. References
The following documents are all incorporated herein by reference.
J. Airey, "Increasing Update Rates in the Building Walkthrough System with Automatic Model-Space Subdivision and Potentially Visible Set Calculations," PhD Thesis, Technical Report TR90-027, Computer Science Dept., UNC Chapel Hill, 1990.
L. Carpenter, "The A-Buffer, an Antialiased Hidden Surface Method," Proc. of SIGGRAPH '84, July 1984, 103-108.
J. H. Clark, "Hierarchical Geometric Models for Visible Surface Algorithms," Communications of the ACM 19(10), October 1976, 547-554.
S. Coorg and S. Teller, "Temporally Coherent Conservative Visibility," Proc. of 12th ACM Symposium on Computational Geometry, 1996.
M. Deering, S. Schlapp, and M. Lavelle, "FBRAM: A new Form of Memory Optimized for 3D Graphics," Proc. of SIGGRAPH '94, July 1994, 167-174.
Jay Duluk, personal communication, 1999.
H. Fuchs, J. Goldfeather, J. Hulquist, S. Spach, J. Austin, F. Brooks, Jr., J. Eyles, and J. Poulton, "Fast Spheres, Shadows, Textures, Transparencies, and Image Enhancements in Pixel-Planes," Proc. of SIGGRAPH '85, July 1985, 111-120.
T. Funkhouser and C. Sequin, "Adaptive Display Algorithm for Interactive Frame Rates During Visualization of Complex Virtual Environments," Proc. of SIGGRAPH '93, August 1993, 247-254.
B. Garlick, D. Baum, and J. Winget, "Interactive Viewing of Large Geometric Databases Using Multiprocessor Graphics Workstations," Siggraph '90 Course Notes: Parallel Algorithms and Architectures for 3D Image Generation, 1990.
N. Greene, M. Kass, and G. Miller, "Hierarchical Z-Buffer Visibility," Proc. of SIGGRAPH '93, July 1993, 231-238.
N. Greene, "Hierarchical Rendering of Complex Environments," PhD Thesis, Univ. of California at Santa Cruz, Report UCSC CRL-95-27, June 1995.
N. Greene, "Hierarchical Polygon Tiling with Coverage Masks," Proc. of SIGGRAPH '96, August 1996.
H. Hoppe, "Progressive Meshes," Proc. of SIGGRAPH '96, August 1996, 99-108.
T. Hudson, D. Manocha, J. Cohen, M. Lin, K. Hoff, and H. Zhang, "Accelerated Occlusion Culling Using Shadow Frusta," Proc. of ACM Symposium on Computational Geometry, 1997.
Adam Levinthal, personal communication, 1999.
D. Luebke and C. Georges, "Portals and Mirrors: Simple, Fast Evaluation of Potentially Visible Sets," ACM Interactive 3D Graphics Conference, 1995.
D. Meagher, "The Octree Encoding Method for Efficient Solid Modeling," PhD Thesis, Electrical Engineering Dept., Rensselaer Polytechnic Institute, Troy, N.Y., August 1982.
B. Naylor, "Partitioning Tree Image Representation and Generation from 3D Geometric Models," Proc. of Graphics Interface, 1992.
N. Scott, D. Olsen, and E. Gannett, "An Overview of the VISUALIZE fx Graphics Accelerator Hardware," The Hewlett-Packard Journal, 49(2), May 1998, 28-34.
O. Sudarsky and C. Gotsman, "Dynamic Scene Occlusion Culling," IEEE Transactions on Visualization and Computer Graphics, 5(1), January 1999.
Gary Tarolli, personal communication, 1999.
S. Teller, "Visibility Computations in Densely Occluded Polyhedral Environments," PhD Thesis, Univ. of California at Berkeley, Report UCB/CSD 92/708, October 1992.
"Denali Technical Overview," Kubota Pacific Computer, January 1993.
J. Warnock, "A Hidden Surface Algorithm for Computer Generated Halftone Pictures," PhD Thesis, TR 4-15, Computer Science Dept., Univ. of Utah, June 1969.
F. Xie and M. Shantz, "Adaptive Hierarchical Visibility in a Tiled Architecture," Proc. Eurographics/Siggraph Workshop on Graphics Hardware, August 1999, 75-84.
H. Zhang, D. Manocha, T. Hudson, and K. Hoff, "Visibility Culling Using Hierarchical Occlusion Maps," Proc. of SIGGRAPH '97, August 1997, 77-88.
H. Zhang, "Effective Occlusion Culling for the Interactive Display of Arbitrary Models," PhD Thesis, Computer Science Dept., UNC Chapel Hill, 1998.
3. Description of Related Art
Rendering is the process of making a perspective image of a scene from a stored geometric model. The rendered image is a two-dimensional array of pixels, suitable for display.
The model is a description of the objects to be rendered in the scene stored as graphics primitives, most typically as mathematical descriptions of polygons in three-dimensional space, together with other information related to the properties of the polygons. Part of the rendering process is the determination of occlusion, whereby the objects and portions of objects which are occluded from view by other objects in the scene are eliminated.
As the performance of polygon-rendering systems advances, the range of practical applications grows, fueling demand for ever more powerful systems capable of rendering ever more complex scenes. There is a compelling need for low-cost high-performance systems capable of handling scenes with high depth complexity, i.e., "densely occluded" scenes (for example, a scene in which ten polygons overlap on the screen at each pixel, on average).
In a typical z-buffer system for rendering polygons, each polygon in the scene is rasterized using a z-buffer to determine visibility at image samples. In many systems a host processor takes advantage of hardware assistance by sending each polygon in the scene on a bus to graphics hardware that rasterizes the polygon and maintains the z-buffer. Other z-buffer systems employ hierarchical z-buffering, which uses a "z-pyramid" instead of a conventional single-level z-buffer, as described in N. Greene, M. Kass, and G. Miller, "Hierarchical Z-Buffer Visibility," Proceedings of SIGGRAPH '93, July 1993, pages 231-238, incorporated by reference herein. Hierarchical z-buffering can be very expensive to implement in its full form in hardware, so implementations of this algorithm in the past have maintained the z-pyramid and performed z-buffer visibility checking entirely in software.
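As an illustration of the z-pyramid idea, the following is a minimal sketch, not the implementation of any cited system. It assumes a square z-buffer whose side is a power of two, that larger z means farther from the viewer, and that each coarser sample stores the farthest z of a 2x2 block of the next finer level. The function name `build_z_pyramid` is hypothetical.

```python
def build_z_pyramid(zbuffer):
    """Return pyramid levels, finest first.

    Assumptions (illustrative only): zbuffer is a square list of lists
    whose side is a power of two, and larger z means farther away.
    Each coarser sample holds the farthest z of a 2x2 block below it.
    """
    levels = [zbuffer]
    current = zbuffer
    while len(current) > 1:
        half = len(current) // 2
        coarser = [
            [
                max(
                    current[2 * y][2 * x],
                    current[2 * y][2 * x + 1],
                    current[2 * y + 1][2 * x],
                    current[2 * y + 1][2 * x + 1],
                )
                for x in range(half)
            ]
            for y in range(half)
        ]
        levels.append(coarser)
        current = coarser
    return levels


example = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pyramid = build_z_pyramid(example)
```

Because every coarse sample is the farthest depth in its region, any geometry whose nearest depth lies behind that value is hidden everywhere in the region, which is what makes coarse levels usable for conservative culling.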
As an alternative to hardware implementation of a full z-pyramid, some systems use only a two-level z-pyramid which includes just the two finest-resolution levels of a full z-pyramid. For example, some flight simulators use a two-level z-pyramid in which the coarser level contains "zfar" values for rectangular regions of the screen. The rectangular screen regions are called "spans." Having spans enables "skip over" of regions where a primitive is occluded over an entire span.
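The span mechanism can be sketched as follows. This is an illustrative reading of the paragraph above, not the design of any particular flight simulator: it assumes larger z means farther, and the names `span_zfar_table` and `can_skip_span` are hypothetical.

```python
def span_zfar_table(zbuffer, span_w, span_h):
    """Coarse level of a two-level pyramid: the farthest z in each span.

    Assumption: larger z means farther from the viewer.
    """
    rows, cols = len(zbuffer), len(zbuffer[0])
    return [
        [
            max(
                zbuffer[y][x]
                for y in range(sy, min(sy + span_h, rows))
                for x in range(sx, min(sx + span_w, cols))
            )
            for sx in range(0, cols, span_w)
        ]
        for sy in range(0, rows, span_h)
    ]


def can_skip_span(prim_znear_in_span, span_zfar):
    """True when the primitive lies behind everything already drawn in the span.

    The strict comparison keeps the test conservative: a primitive at
    exactly the stored depth is still rasterized.
    """
    return prim_znear_in_span > span_zfar


table = span_zfar_table([[1, 2, 3, 4], [5, 6, 7, 8]], span_w=2, span_h=2)
```

When `can_skip_span` returns true, per-pixel depth comparisons for that span are unnecessary, which is the "skip over" described above.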
Another alternative to conventional hierarchical z-buffering is to separate culling from rendering in a hardware graphics pipeline by employing a culling stage that culls occluded geometry and passes visible geometry on to be rendered by a conventional z-buffer rendering stage. See N. Greene, Occlusion Culling with Optimized Hierarchical Z-Buffering, Siggraph Technical Sketch, Siggraph '99 Conference Abstracts and Applications, August 1999; and N. Greene, Optimized Hierarchical Occlusion Culling for Z-Buffer Systems, Siggraph '99 Conference Abstracts and Applications CD-ROM, August 1999, both incorporated by reference herein. This method is also described in the above-incorporated CIP parent patent application.
There is presently an obstacle to achieving high performance in processing densely occluded scenes. Typically, all "on-screen" polygons in the scene are processed one-by-one by the host and sent on a bus to graphics hardware, which also processes polygons one by one. This is particularly inefficient for densely occluded scenes, because most polygons are occluded, and even the occluded polygons need to be sent on the bus, transformed to image space, and processed in other ways.
In the prior art, this problem has been addressed by organizing the model in three-dimensional bounding boxes and having the host processor cull occluded bounding boxes. With this approach, which will be called "box culling," only the polygons in visible bounding boxes need to be sent through the hardware rendering pipeline, thereby reducing bus traffic, transformation, rasterization, and other computations on polygons.
Another prior-art method, which is sometimes used to cull occluded regions of architectural models, is to organize the scene as "rooms with portals." See Seth Teller, "Visibility Computations in Densely Occluded Polyhedral Environments," PhD Thesis, Univ. of California at Berkeley, Report UCB/CSD 92/708, October 1992; John Airey, "Increasing Update Rates in the Building Walkthrough System with Automatic Model-Space Subdivision and Potentially Visible Set Calculations," PhD Thesis, Technical Report TR90-027, Computer Science Dept., UNC Chapel Hill, 1990; and Tom Funkhouser and Carlo Sequin, "Adaptive Display Algorithm for Interactive Frame Rates During Visualization of Complex Virtual Environments," Proc. of SIGGRAPH '93, August 1993, 247-254, all incorporated herein by reference. The term "portal" applies to apertures such as doors and windows. Room-to-room visibility relationships are computed and then stored for future reference. When rendering a frame, the "room" containing the viewpoint is determined, and the primitives composing this room are rendered. For all other rooms, the room is rendered only if one of its portals is visible.
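The per-frame selection rule just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited theses: rooms are a mapping from room name to a list of portal names, and `portal_is_visible` is a hypothetical caller-supplied predicate standing in for whatever visibility test the system uses.

```python
def rooms_to_render(rooms, viewpoint_room, portal_is_visible):
    """Select rooms for one frame.

    rooms: dict mapping room name -> list of portal names (assumed layout).
    portal_is_visible: hypothetical predicate supplied by the caller.
    The viewpoint room is always rendered; any other room is rendered
    only if at least one of its portals is visible.
    """
    selected = [viewpoint_room]
    for name, portals in rooms.items():
        if name == viewpoint_room:
            continue
        if any(portal_is_visible(p) for p in portals):
            selected.append(name)
    return selected


rooms = {
    "hall": ["door1"],
    "kitchen": ["door1", "window"],
    "cellar": ["hatch"],
}
visible_portals = {"door1"}
frame_rooms = rooms_to_render(rooms, "hall", lambda p: p in visible_portals)
```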
While box culling can improve z-buffer rendering efficiency of densely occluded scenes, such methods have been mostly limited to all-software implementations. This is because where hardware-assisted z-buffer rendering is available, the z-buffer is maintained entirely in the hardware. The z-buffer typically is not quickly accessible to the host processor for visibility testing of bounding boxes or room portals. In some systems the host can read graphics memory over a bus for the purpose of occlusion testing, but this is a slow and laborious process that rarely improves rendering efficiency.
Some systems read the entire z-buffer from graphics memory into host memory in order to facilitate box culling or rooms-with-portals culling in software by the host processor. For example, N. Greene, M. Kass, and G. Miller, "Hierarchical Z-Buffer Visibility," Proc. of SIGGRAPH '93, July 1993, 231-238, incorporated by reference herein, describes a temporal coherence algorithm in which, after each frame is rendered by conventional z-buffer hardware, the host processor reads the entire full-resolution z-buffer from graphics memory into host memory. The host processor then, in software, forms a z-pyramid from the depth information just read, and uses it to perform preliminary software-only box culling for the next frame. While such a system can improve rendering efficiency for densely occluded scenes with strong temporal coherence, it does so at the expense of enormous bandwidth requirements for copying the z-buffer from graphics memory to host memory. In a typical graphics system having an image resolution of 1000×1000 pixels, at 24 bits of depth information per pixel, and in which a frame rate of 50 Hz is desired, this method requires 150 MByte/sec of bandwidth just for copying depth values, and this does not even allow for processing time on the host. Copying the z-buffer significantly more often than once per frame would be too slow for contemporary real time systems.
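The 150 MByte/sec figure follows directly from the stated parameters, as the short calculation below shows. The function name is hypothetical, and the sketch assumes one full z-buffer copy per frame and 1 MByte = 10^6 bytes.

```python
def zbuffer_copy_bandwidth(width, height, bits_per_pixel, frames_per_second):
    """Bytes per second needed to copy the whole z-buffer once per frame."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


# 1000 x 1000 pixels, 24 bits of depth per pixel, 50 frames per second.
bandwidth = zbuffer_copy_bandwidth(1000, 1000, 24, 50)
megabytes_per_second = bandwidth / 1e6  # 3,000,000 bytes/frame * 50 = 150 MByte/sec
```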
To provide hardware assistance for box culling and rooms-and-portals culling, some graphics accelerators support "visibility queries" on bounding boxes and portals. These systems maintain a conventional one-level z-buffer in the graphics hardware. To determine whether a bounding box is visible, the host sends a description of the box to the graphics hardware, which tests its front faces against the z-buffer to determine whether the box is visible at one or more image samples. The visibility status of the box is then reported back to the host processor. The operations just described are referred to herein as a "visibility query" on a bounding box. Visibility queries on portals are analogous.
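The logic of a visibility query against a one-level z-buffer can be sketched in software as follows. This is a deliberate simplification, not how the hardware named in this section works: instead of rasterizing the box's front faces, it approximates the box by its screen-space rectangle and a single nearest depth, which is conservative. It assumes smaller z means nearer, and `box_visible` is a hypothetical name.

```python
def box_visible(zbuffer, x0, y0, x1, y1, box_znear):
    """Conservative visibility query for a box against a one-level z-buffer.

    Simplifying assumptions: the box is approximated by its inclusive
    screen rectangle (x0, y0)-(x1, y1) and its nearest depth box_znear;
    smaller z means nearer. Returns True as soon as one sample could
    show the box.
    """
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if box_znear < zbuffer[y][x]:
                return True
    return False


zb = [
    [5.0, 5.0],
    [5.0, 2.0],
]
```

Note that an occluded box forces every covered sample to be examined before the query can report "not visible", which is why deep overlap on the screen makes this test expensive.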
The use of visibility queries to cull occluded bounding boxes and rooms is supported by graphics accelerators that were once made by Kubota Pacific (see "Denali Technical Overview," Kubota Pacific Computer, January 1993, incorporated by reference herein) and graphics accelerators that are currently made by Hewlett-Packard (see N. Scott, D. Olsen, and E. Gannett, "An Overview of the VISUALIZE fx Graphics Accelerator Hardware," The Hewlett-Packard Journal, 49(2), May 1998, pages 28-34, incorporated by reference herein).
One major problem with prior-art systems that employ hardware-assisted visibility queries is that there is a substantial delay between when the host processor issues a visibility query and when it receives a reply, and in the meantime, it is not known if the primitives within the corresponding box or room need to be processed. This uncertainty makes efficient scene management problematic. If the processor waits for the query reply before processing the associated primitives, time is wasted. If instead the processor goes ahead and sends the primitives through the pipeline, the processing devoted to these primitives will be wasted if the box turns out to be occluded.
Significant communication delays are fundamental to this method of performing hardware-assisted visibility queries because pipeline queues often contain numerous primitives. If boxes wait their turn in queues, delays are long, and if queues are skipped over, culling efficiency is impaired because the z-buffer used to test visibility is not up to date. Consequently, in many cases, culling with hardware-assisted visibility queries is not an effective way to accelerate rendering of densely occluded scenes.
A second problem with hardware-assisted visibility queries in z-buffer systems is that when a scene's bounding boxes overlap deeply on the screen, testing boxes for visibility with an ordinary z-buffer requires a great deal of rasterization. It is much more efficient to test boxes for visibility with hierarchical z-buffering.
Hence, there is a need for a graphics architecture which enables hardware-assisted culling of occluded bounding boxes and portals without incurring substantial communication delays.
The invention is based in part on the discovery that for many densely occluded scenes, the vast majority of occluded geometries can be culled successfully using only a tiny percentage of the depth information in a full z-pyramid.
Roughly described, the invention involves a z-buffer system having a host processor and graphics hardware that maintains z values in a pyramid having at least two levels. To enable the host processor to cull occluded parts of the scene, the system copies only the tip of the z-pyramid into host memory. The host processor is then able to perform bounding box (or other) culling itself, very efficiently, before sending polygons to the graphics hardware for rendering. As in prior art systems that copy an entire z-buffer to host memory, by culling occluded geometries on the host, the system is able to greatly reduce memory and bus traffic requirements in sending geometric information to the hardware for processing. But because the host uses only a very small amount of depth information, representing only the tip of a z-pyramid, to perform its culling, bandwidth requirements for copying such information from graphics memory into host memory are drastically reduced. In addition, the communication delays inherent in a system that performs hardware-assisted visibility queries are also avoided.
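Host-side culling against a copied pyramid tip might look like the following minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the tip is represented by a single coarse level of farthest-z values over a square screen, each box is given as a name, an inclusive screen rectangle, and a nearest depth, larger z means farther, and the name `cull_with_tip` is hypothetical.

```python
def cull_with_tip(tip_level, screen_size, boxes):
    """Return the names of boxes that survive culling on the host.

    tip_level: N x N farthest-z values copied from graphics memory
               (assumed to cover a screen_size x screen_size screen).
    boxes: iterable of (name, x0, y0, x1, y1, znear), inclusive pixel
           rectangle plus nearest depth (assumed layout).
    A box is culled only if it lies behind the farthest z of every
    coarse cell it overlaps, so the test never culls visible geometry.
    """
    cell = screen_size // len(tip_level)
    survivors = []
    for name, x0, y0, x1, y1, znear in boxes:
        occluded = all(
            znear > tip_level[cy][cx]
            for cy in range(y0 // cell, y1 // cell + 1)
            for cx in range(x0 // cell, x1 // cell + 1)
        )
        if not occluded:
            survivors.append(name)
    return survivors


tip = [
    [10.0, 3.0],
    [10.0, 10.0],
]
candidates = [
    ("a", 4, 0, 7, 3, 5.0),   # behind the zfar of the only cell it overlaps
    ("b", 0, 0, 7, 3, 5.0),   # overlaps a cell whose zfar is farther
    ("c", 0, 4, 3, 7, 12.0),  # behind everything in its cell
]
to_send = cull_with_tip(tip, 8, candidates)
```

Only the polygons inside the surviving boxes would then be sent to the graphics hardware; the test itself touches a handful of coarse samples rather than the full-resolution z-buffer.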
In a system whose graphics hardware creates and maintains a full z-pyramid, as much as 90% or more of occluded geometries in a densely occluded scene often can be culled using a z-pyramid tip containing less than 1% of the data in the z-pyramid, copied to the host on the order of 20-30 times per rendered frame. Most graphics accelerators today, however, do not maintain a full z-pyramid. Such systems maintain only a conventional single-level z-buffer, or at most a pyramid containing only two levels of resolution. It will be appreciated, however, that any reduction of the amount of depth information written into host memory is a significant improvement over the copying of an entire z-buffer. Thus where one or more coarser resolution levels of depth information are available, some benefits can be obtained by copying only the coarser resolution level or levels to host memory for pre-culling of geometries. Where the hardware does not maintain depth information at a resolution level that is as coarse as desired for writing to the host, the graphics hardware can create the information at that level "on-the-fly" as needed for writing to the host. If desired, the host processor can then form even coarser levels of the z-pyramid by itself in software.
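To give a feel for how small a pyramid tip can be, the sketch below counts samples under assumptions that are not taken from the text above: a square pyramid built by 2x2 reduction from a 1024x1024 finest level, with the tip taken to be the levels from 64x64 down to 1x1. The function names are hypothetical, and other reduction factors or tip sizes would give different numbers.

```python
def pyramid_level_sizes(finest_side):
    """Sample count per level for a square pyramid with 2x2 reduction."""
    sizes = []
    side = finest_side
    while side >= 1:
        sizes.append(side * side)
        side //= 2
    return sizes


def tip_fraction(finest_side, tip_finest_side):
    """Fraction of all pyramid samples held by the tip.

    The tip is assumed to be every level whose side is at most
    tip_finest_side.
    """
    total = sum(pyramid_level_sizes(finest_side))
    tip = sum(pyramid_level_sizes(tip_finest_side))
    return tip / total


fraction = tip_fraction(1024, 64)  # 5461 of 1398101 samples, about 0.39%
```

Under these assumptions the tip holds well under 1% of the pyramid's samples, consistent in magnitude with the figure quoted above.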