The present invention relates to computer graphics, and more particularly to rendering images of three-dimensional scenes using z-buffering.
Rendering is the process of making a perspective image of a scene from a stored geometric model. The rendered image is a two-dimensional array of pixels, suitable for display.
The model is a description of the objects to be rendered in the form of polygons, together with other information related to the properties of the polygons.
Part of the rendering process is the determination of occlusion, whereby the objects and portions of objects occluded from view by other objects in the scene are eliminated.
As the performance of polygon rendering systems advances, the range of practical applications grows, fueling demand for ever more powerful systems capable of rendering ever more complex scenes. There is a compelling need for low-cost high-performance systems capable of handling scenes with high depth complexity, i.e., densely occluded scenes (for example, a scene in which ten polygons overlap on the screen at each pixel, on average).
There is presently an obstacle to achieving high performance in processing densely occluded scenes. In typical computer graphics systems, the model is stored on a host computer which sends scene polygons to a hardware rasterizer which renders them into the rasterizer's dedicated image memory. When rendering densely occluded scenes with such systems, the bandwidth of the rasterizer's image memory is often a performance bottleneck.
Traffic between the rasterizer and its image memory increases in approximate proportion to the depth complexity of the scene. Consequently, frame rate decreases in approximate proportion to depth complexity, resulting in poor performance for densely occluded scenes.
A second potential bottleneck is the bandwidth of the bus connecting the host and the rasterizer, since the description of the scene may be very complex and needs to be sent on this bus to the rasterizer every frame. Although memory bus bandwidth has been increasing steadily, processor speed has been increasing faster than associated memory and bus speeds.
Consequently, bandwidth limitations can become relatively more acute over time. In the prior art, designers of hardware rasterizers have addressed the bottleneck between the rasterizer and its image memory by interleaving the memory and by reducing bandwidth requirements with smart memory.
Interleaving is commonly employed in high-performance graphics workstations. For example, the SGI Reality Engine achieves a pixel fill rate of roughly 80 megapixels per second using 80 banks of memory.
An alternative approach to solving the bandwidth problem is called the smart memory technique. One example of this technique is the Pixel-Planes architecture. The memory system in this architecture takes as input a polygon defined by its edge equations and writes all of the pixels inside the polygon, so the effective bandwidth is very high for large polygons.
Another smart-memory approach is "FBRAM," a memory-chip architecture with on-chip support for z-buffering and compositing. With such a chip, the read-modify-write cycle needed for z-buffering can be replaced with writes alone, and as a result, the effective drawing bandwidth is higher than that of standard memory. All of these methods improve performance, but they involve additional expense, and they have other limitations. Considering cost first, these methods are relatively expensive, which precludes their use in low-end PC and consumer systems that are very price sensitive.
A typical low-cost three-dimensional rasterization system consists of a single rasterizer chip connected to a dedicated frame-buffer memory system, which in turn consists of a single bank of memory. Such a system cannot be highly interleaved because a full-screen image requires only a few memory chips (one 16-megabit memory chip can store a 1024 by 1024 by 16-bit image), and including additional memory chips is too expensive.
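As a quick check of the storage arithmetic for a 1024 by 1024 image at 16 bits per pixel:

```latex
1024 \times 1024 \times 16\ \text{bits} = 16{,}777{,}216\ \text{bits} = 16\ \text{Mbit} = 2\ \text{MB}
```

So a full-screen image of this size fits in a single 16-megabit (2-megabyte) part, which is why there are too few chips to interleave across.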
Providing smart memory, such as FBRAM, is an option, but such chips are produced in much lower volumes than standard memory chips and are often considerably more expensive. Even when the cost of this option is justified, its performance can be inadequate when processing very densely occluded scenes.
Moreover, neither interleaving nor smart memory addresses the root cause of inefficiency in processing densely occluded scenes, which is that most work is expended processing occluded geometry. Conventional rasterization needs to traverse every pixel on every polygon, even if a polygon is entirely occluded.
Hence, there is a need to incorporate occlusion culling into hardware renderers, by which is meant culling of occluded geometry before rasterization, so that memory traffic during rasterization is devoted to processing only visible and nearly visible polygons. Interleaving, smart memory, and occlusion culling all improve performance in processing densely occluded scenes, and they can be used together or separately.
While occlusion culling is new to hardware for z-buffering, it has been employed by software rendering algorithms. One important class of such techniques consists of hierarchical culling methods that operate in both object space and image space. Hierarchical object-space culling methods include the "hierarchical visibility" algorithm which organizes scene polygons in an octree and traverses octree cubes in near-to-far occlusion order, culling cubes if their front faces are occluded. A similar strategy for object-space culling that works for architectural scenes is to organize a scene as rooms with "portals" (openings such as doors and windows), which permits any room not containing the viewpoint to be culled if its portals are occluded.
Both the hierarchical visibility method and the "rooms and portals" method require determining whether a polygon is visible without actually rendering it, an operation that will be referred to as a visibility query or v-query. For example, whether an octree cube is visible can be established by performing a v-query on its front faces.
The efficiency of these object-space culling methods depends on the speed of v-query, so there is a need to provide fast hardware support.
Hierarchical image-space culling methods include hierarchical z-buffering and hierarchical polygon tiling with coverage masks, both of which are loosely based on Warnock's recursive subdivision algorithm.
With hierarchical z-buffering, z-buffer depth samples are maintained in a z-pyramid having N×N decimation from level to level (see N. Greene, M. Kass, and G. Miller, "Hierarchical Z-Buffer Visibility," Proceedings of SIGGRAPH '93, July 1993). The finest level of the z-pyramid is an ordinary z-buffer. At the other levels of the pyramid, each z-value is the farthest z in the corresponding N×N region at the adjacent finer level. To maintain the z-pyramid, whenever a z-value in the finest level is changed, that value is propagated through the coarser levels of the pyramid.
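The construction and propagation rules above can be sketched as follows. This is a minimal illustration, not the referenced implementation: it assumes a square buffer whose side is a power of N, and that a larger z means farther from the viewer. The function names `build_z_pyramid` and `update_sample` are hypothetical.

```python
from typing import List

Grid = List[List[float]]

def build_z_pyramid(zbuf: Grid, n: int) -> List[Grid]:
    """Return levels finest-first; each coarser cell holds the farthest (max) z
    of its n x n children at the adjacent finer level."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        fine = levels[-1]
        size = len(fine) // n
        coarse = [[max(fine[y * n + j][x * n + i] for j in range(n) for i in range(n))
                   for x in range(size)] for y in range(size)]
        levels.append(coarse)
    return levels

def update_sample(levels: List[Grid], x: int, y: int, z: float, n: int) -> None:
    """Change one finest-level z-value and propagate farthest-z toward the root."""
    levels[0][y][x] = z
    for lvl in range(1, len(levels)):
        x //= n
        y //= n
        fine = levels[lvl - 1]
        new_far = max(fine[y * n + j][x * n + i] for j in range(n) for i in range(n))
        if levels[lvl][y][x] == new_far:
            break  # this level is unchanged, so no coarser level can change either
        levels[lvl][y][x] = new_far
```

The early exit reflects that a coarse value depends only on the level directly below it: once a level's value is unchanged, propagation can stop.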
Since each entry in the pyramid represents the farthest visible z within a square region of the screen, a polygon is occluded within a pyramid cell if its nearest point within the cell is behind the corresponding z-pyramid value. Thus, often a polygon can be shown to be occluded by mapping it to the smallest enclosing z-pyramid cell and making a single depth comparison.
When this test fails to cull a polygon, visibility can be established definitively by subdividing the enclosing image cell into an N×N grid of subcells and by comparing polygon depth to z-pyramid depth within the subcells.
Recursive subdivision continues in subcells where the polygon is potentially visible, ultimately finding the visible image samples on a polygon or proving that the polygon is occluded. Since this culling procedure only traverses image cells where a polygon is potentially visible, it can greatly reduce computation and z-buffer memory traffic, compared to conventional rasterization, which needs to traverse every image sample on a polygon, even if the polygon is entirely occluded.
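The culling test and recursive subdivision described above can be sketched as below. This is a simplified sketch under stated assumptions: the primitive is a hypothetical screen-aligned rectangle `Rect` with a planar depth z = a + b·x + c·y, larger z is farther, the depth test is strict less-than, and traversal starts at the pyramid root rather than at the smallest enclosing cell. Writing the visible samples back into the pyramid is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grid = List[List[float]]

def build_levels(zbuf: Grid, n: int) -> List[Grid]:
    """Finest-first z-pyramid; each coarser value is the farthest (max) of its n x n children."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        f, s = levels[-1], len(levels[-1]) // n
        levels.append([[max(f[y * n + j][x * n + i] for j in range(n) for i in range(n))
                        for x in range(s)] for y in range(s)])
    return levels

@dataclass
class Rect:
    """Hypothetical primitive: samples x0 <= x < x1, y0 <= y < y1 on plane z = a + b*x + c*y."""
    x0: int
    y0: int
    x1: int
    y1: int
    a: float
    b: float
    c: float

    def near_z(self, x0: int, y0: int, x1: int, y1: int) -> Optional[float]:
        """Nearest z of the primitive over samples it shares with the box, or None."""
        lx, hx = max(x0, self.x0), min(x1, self.x1) - 1
        ly, hy = max(y0, self.y0), min(y1, self.y1) - 1
        if lx > hx or ly > hy:
            return None
        # a plane attains its minimum over a box at one of the box's corners
        return min(self.a + self.b * x + self.c * y for x in (lx, hx) for y in (ly, hy))

def find_visible(levels: List[Grid], n: int, prim: Rect, level: int,
                 cx: int, cy: int, out: List[Tuple[int, int]]) -> None:
    size = n ** level
    x0, y0 = cx * size, cy * size
    z = prim.near_z(x0, y0, x0 + size, y0 + size)
    if z is None or z >= levels[level][cy][cx]:
        return  # no overlap, or occluded in the whole cell: a single comparison settles it
    if level == 0:
        out.append((cx, cy))  # a visible image sample
        return
    for j in range(n):        # otherwise subdivide into an n x n grid of subcells
        for i in range(n):
            find_visible(levels, n, prim, level - 1, cx * n + i, cy * n + j, out)
```

A call such as `find_visible(levels, 2, prim, len(levels) - 1, 0, 0, out)` collects the visible samples; a fully occluded primitive returns after one comparison at the root.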
Hierarchical z-buffering accelerates v-query as well as culling of occluded polygons.
Another algorithm that performs image-space culling with hierarchical depth comparisons is described by Latham in U.S. Pat. No. 5,509,110, "Method for tree-structured hierarchical occlusion in image generators," April, 1996. Although Latham's algorithm does not employ a full-screen z-pyramid, it does maintain a depth hierarchy within rectangular regions of the screen which is maintained by propagation of depth values.
As an alternative to hierarchical z-buffering with a complete z-pyramid, a graphics accelerator could use a two-level depth hierarchy. Systems used for flight-simulation graphics can maintain a "zfar" value for each region of the screen.
The screen regions are called spans and are typically 2×8 pixels. Having spans enables "skip over" of regions where a primitive is occluded over an entire span.
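A two-level hierarchy with span "skip over" might look like the following sketch. The span orientation (8 pixels wide, 2 rows tall), the `depth_at` callback, and the function names are assumptions made for illustration; larger z is assumed to be farther.

```python
from typing import Callable, List, Optional

SPAN_W, SPAN_H = 8, 2  # assumed layout of a 2x8 span: 8 pixels wide, 2 rows tall

def span_far(zbuf: List[List[float]], sx: int, sy: int) -> float:
    """Farthest z currently stored in span (sx, sy)."""
    return max(zbuf[y][x]
               for y in range(sy * SPAN_H, (sy + 1) * SPAN_H)
               for x in range(sx * SPAN_W, (sx + 1) * SPAN_W))

def draw_in_span(zbuf: List[List[float]], zfar: List[List[float]], sx: int, sy: int,
                 near_z: float, depth_at: Callable[[int, int], Optional[float]]) -> int:
    """Draw one primitive inside a span and return the number of pixels written.
    near_z is a conservative nearest depth of the primitive within the span."""
    if near_z >= zfar[sy][sx]:
        return 0  # skip over: occluded across the entire span, no per-pixel z reads
    written = 0
    for y in range(sy * SPAN_H, (sy + 1) * SPAN_H):
        for x in range(sx * SPAN_W, (sx + 1) * SPAN_W):
            z = depth_at(x, y)  # None where the primitive does not cover the pixel
            if z is not None and z < zbuf[y][x]:
                zbuf[y][x] = z
                written += 1
    zfar[sy][sx] = span_far(zbuf, sx, sy)  # keep the span's zfar current
    return written
```

After a near primitive fills a span, any later primitive whose nearest depth is behind the span's zfar is rejected with one comparison instead of 16 z-buffer reads.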
Another rendering algorithm which performs hierarchical culling in image space is hierarchical polygon tiling with coverage masks. If scene polygons are traversed in near-to-far occlusion order, resolving visibility only requires storing a coverage bit at each raster sample rather than a depth value, and with hierarchical polygon tiling, this coverage information is maintained hierarchically in a coverage pyramid having N×N decimation from level to level.
Tiling is performed by recursive subdivision of image space, and since polygons are processed in near-to-far occlusion order, the basic tiling and visibility operations performed during subdivision can be performed efficiently with N×N coverage masks. This hierarchical tiling method can be modified to perform hierarchical z-buffering by maintaining a z-pyramid rather than a coverage pyramid and performing depth comparisons during the recursive subdivision procedure.
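The basic coverage-mask operations can be illustrated for a single 4×4 cell (N = 4) stored as a 16-bit integer. This is a sketch assuming near-to-far input order; the bit layout (bit y·4 + x) and the names `rect_mask` and `CoverageCell` are hypothetical.

```python
FULL = 0xFFFF  # all 16 samples of a 4 x 4 cell

def rect_mask(x0: int, y0: int, x1: int, y1: int) -> int:
    """Coverage mask for samples x0 <= x < x1, y0 <= y < y1 in a 4 x 4 cell (bit y*4 + x)."""
    m = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            m |= 1 << (y * 4 + x)
    return m

class CoverageCell:
    def __init__(self) -> None:
        self.covered = 0  # one bit per sample instead of a depth value

    def tile(self, poly_mask: int) -> int:
        """Polygons arrive near-to-far, so a sample is visible iff it is not yet covered."""
        visible = poly_mask & ~self.covered & FULL
        self.covered |= visible
        return visible

    def is_full(self) -> bool:
        """A full cell could set the corresponding bit at the next coarser pyramid level."""
        return self.covered == FULL
```

With this layout, a polygon covering the left half of the cell yields mask `0x3333`; a later, farther polygon covering the whole cell is visible only at `0xCCCC`, and after that every further polygon in the cell is culled.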
This modified version of hierarchical tiling with coverage masks is believed to be the fastest algorithm available for hierarchical z-buffering of polygons. However, for today's processors, such software implementations of this algorithm are not fast enough to render complex scenes in real time.
A precursor to hierarchical polygon tiling with coverage masks is Meagher's method for rendering octrees, which renders the faces of octree cubes in near-to-far occlusion order using a similar hierarchical procedure.
The ZZ-buffer algorithm is another hierarchical rendering algorithm.
Although it does not perform z-buffering, it does maintain an image-space hierarchy of depth values to enable hierarchical occlusion culling during recursive subdivision of image space.
Yet another approach to culling has been suggested, one that renders a z-buffer image in two passes and only needs to shade primitives that are visible. In the first pass, all primitives are z-buffered without shading to determine which primitives are visible, and in the second pass, visible primitives are z-buffered with shading to produce a standard shaded image.
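The two-pass scheme can be sketched as follows. This is an illustration under assumptions: primitives are given as hypothetical lists of already-rasterized `(x, y, z)` samples, `shade` is a caller-supplied stand-in for shading, and ties are resolved in favor of the first primitive. Note that every sample of every primitive is still traversed in both passes; only shading is saved.

```python
from typing import Callable, List, Optional, Tuple

Sample = Tuple[int, int, float]

def two_pass(prims: List[Tuple[str, List[Sample]]], w: int, h: int,
             shade: Callable[[str, int, int], str]):
    inf = float("inf")
    zbuf = [[inf] * w for _ in range(h)]
    # Pass 1: depth only, no shading, to resolve the visible depth at each pixel.
    for _pid, samples in prims:
        for x, y, z in samples:
            if z < zbuf[y][x]:
                zbuf[y][x] = z
    # Pass 2: shade only samples whose depth matches the resolved depth.
    color: List[List[Optional[str]]] = [[None] * w for _ in range(h)]
    shaded = 0
    for pid, samples in prims:
        for x, y, z in samples:
            if z == zbuf[y][x] and color[y][x] is None:
                color[y][x] = shade(pid, x, y)
                shaded += 1
    return color, shaded
```

Each pixel is shaded once, but both passes still visit every sample, which is why this is not occlusion culling in the sense used here.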
Although this suggested approach reduces the amount of work that must be done on shading, it is not an effective culling algorithm for densely occluded scenes because every pixel inside every primitive must be traversed at least once. In fact, this approach does not fall within an acceptable definition for occlusion culling, since it relies on pixel-by-pixel rasterization to establish visibility.
The object-space and image-space culling methods, described above, can alleviate bandwidth bottlenecks when rendering densely occluded scenes. Suppose that a host computer sends polygon records to a graphics accelerator which renders them with hierarchical z-buffering using its own z-pyramid.
Suppose, further, that the accelerator can perform v-query and report the visibility status of polygons to the host. With hierarchical z-buffering, occluded polygons can be culled with a minimum of computation and memory traffic with the z-pyramid, and since most polygons in densely occluded scenes are occluded, the reduction in memory traffic between the accelerator and its image memory can be substantial.
Hierarchical z-buffering also performs v-query tests on portals and bounding boxes with minimal computation and memory traffic, thereby supporting efficient object-space culling of occluded parts of the scene. While hierarchical z-buffering can improve performance, today's processors are not fast enough to enable software implementations of the traditional algorithm to render complex scenes in real time.
Thus there is a need for an efficient hardware architecture for hierarchical z-buffering.
A system, method and computer program product are provided for avoiding reading z-values in a graphics pipeline. Initially, near z-values are stored which are each representative of a near z-value on an object in a region. Such region is defined by a tile and a coverage mask therein. Thereafter, the stored near z-values are compared with far z-values computed for other objects in the region. Such comparison indicates whether an object is visible in the region. Based on the comparison, z-values previously stored for image samples in the region are conditionally read from memory.
In one aspect of the present embodiment, near z-values may be stored in a record associated with the tile. As an option, each near z-value may represent a nearest z-value on the object in the region. Moreover, each far z-value may represent a farthest z-value on the other objects in the region.
In another aspect of the present embodiment, the previously stored z-values are read from memory only if the far z-values computed for the other objects in the region are farther than or equal to the corresponding near z-values.
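The conditional read can be sketched for a single tile as follows. This is a minimal sketch, assuming a larger z is farther, a strict less-than depth test, and a hypothetical `Tile` record that keeps one conservative near z-value (a lower bound on every z stored in the tile). The `reads` counter only exists to show which z-buffer reads are avoided.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Tile:
    zbuf: List[float]  # per-sample z-values held in (simulated) off-chip memory
    z_near: float      # tile record: conservative nearest z over all stored samples
    reads: int = 0     # number of z-buffer reads performed

def draw(tile: Tile, samples: Dict[int, float]) -> None:
    """samples maps each covered sample index to the incoming object's z there."""
    obj_far = max(samples.values())
    if obj_far < tile.z_near:
        # Entirely in front of everything stored in the tile: write without reading.
        for i, z in samples.items():
            tile.zbuf[i] = z
    else:
        # Far z is farther than or equal to the near z: read and compare per sample.
        for i, z in samples.items():
            tile.reads += 1
            if z < tile.zbuf[i]:
                tile.zbuf[i] = z
    # Stays a valid lower bound on the stored z-values, hence conservative.
    tile.z_near = min(tile.z_near, min(samples.values()))
```

The first object drawn into a cleared tile needs no reads; an object that might be behind existing geometry falls back to the ordinary read-compare-write path.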
In still another aspect of the present embodiment, a pair of the near z-values may be stored for the tile. In particular, a first near z-value may be associated with a first sub-region covered by the coverage mask and a second near z-value may be associated with a second sub-region not covered by the coverage mask.
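Keeping a pair of near z-values lets the test succeed in one sub-region even when the other sub-region holds nearer geometry. The sketch below assumes a hypothetical `TileRecord` with a sample bitmask and two conservative near values, larger z being farther; how the record is updated after drawing is left out.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TileRecord:
    mask: int        # bit i set: sample i lies in the sub-region covered by the coverage mask
    near_in: float   # conservative nearest stored z over covered samples
    near_out: float  # conservative nearest stored z over uncovered samples

def needs_read(rec: TileRecord, obj: Dict[int, float]) -> bool:
    """obj maps each sample the incoming object covers to its z there."""
    for inside, near in ((True, rec.near_in), (False, rec.near_out)):
        zs = [z for i, z in obj.items() if bool((rec.mask >> i) & 1) == inside]
        if zs and max(zs) >= near:  # object's far z in this sub-region is not in front
            return True
    return False
```

For example, with a near object at z = 0.2 in the masked samples and background at z = 0.9 elsewhere, an object at z = 0.5 to 0.6 touching only unmasked samples needs no read, whereas a single tile-wide near value of 0.2 would have forced one.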
Another system, method and computer program product are provided for conservative stencil culling in a graphics pipeline. Initially, stencil values are maintained for regions at a plurality of levels of an image pyramid including a finest level and one or more coarser levels. In use, it is determined whether the stencil value for a region at one of the coarser levels is valid. If the stencil value at the coarser level is valid, conservative stencil culling is performed on the region utilizing the stencil value at the coarser level.
In one aspect of the present embodiment, the stencil value for the region at the coarser level may be valid if the stencil values of each of a plurality of portions of a corresponding region at a finer level are the same as each other. As an option, a valid flag may be used to indicate whether the stencil value at the coarser level is valid.
The stencil value for the region at the coarser level may be determined by reading stencil values from a corresponding region at the finest level. As an option, the valid stencil values for the regions at the coarser levels may be passed from the culling stage to the rendering stage.
Associated with the present embodiment is a method for creating a data structure adapted for use during conservative stencil culling. Such method includes maintaining stencil values for regions at a plurality of levels of an image pyramid. Also included is determining whether all of the stencil values of a region at a finer one of the levels of the image pyramid are the same as each other. A valid indicator is stored which indicates whether all of the stencil values of the region at the finer level are the same as each other. If all of the stencil values at the finer level are the same as each other, the stencil value is also stored.
As such, the data structure includes a valid indicator object indicating whether all stencil values of a region are equal to each other. Associated therewith is a stencil value object including a stencil value equal to the stencil values of the region if all of the stencil values of the region are equal to each other.
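The data structure and the conservative cull can be sketched as below. The names `StencilNode`, `coarsen`, and `stencil_cull` are hypothetical. The finest level is modeled as nodes that are always valid, and a coarser node is valid only when all of its children are valid and share one value. The stencil test is passed in as a comparison function `func(ref, stored)`.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class StencilNode:
    valid: bool           # all stencil values in the region are equal
    value: Optional[int]  # that common value, stored only when valid

def coarsen(finer: List[List[StencilNode]], n: int) -> List[List[StencilNode]]:
    size = len(finer) // n
    out = []
    for y in range(size):
        row = []
        for x in range(size):
            kids = [finer[y * n + j][x * n + i] for j in range(n) for i in range(n)]
            first = kids[0]
            same = all(k.valid and k.value == first.value for k in kids)
            row.append(StencilNode(True, first.value) if same else StencilNode(False, None))
        out.append(row)
    return out

def stencil_cull(node: StencilNode, ref: int,
                 func: Callable[[int, int], bool]) -> Optional[bool]:
    """True: the whole region fails the stencil test and can be culled.
    False: the whole region passes. None: not valid here, so descend to a finer level."""
    if not node.valid:
        return None
    return not func(ref, node.value)
```

With an EQUAL test (`operator.eq`) and reference value 1, a uniform region of zeros is culled with one comparison, while a mixed region reports `None` and must be examined at a finer level.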
Still another system, method and computer program product are provided for multiple-pass rendering using conservative occlusion culling. During a first pass, objects are passed from an input stream to a geometric processor for being transformed. Also during the first pass, the objects are sent to a culling stage for creating an occlusion image in a first depth buffer requiring a first amount of storage. During a second pass, the objects are sent to the culling stage for conservatively culling occluded objects utilizing the occlusion image. The remaining objects are passed to a renderer. Such renderer requires a second depth buffer with a second amount of storage greater than the first amount of storage.
In one aspect of the present embodiment, shading operations may be performed only during the second pass. Further, the first amount of storage may be less than or equal to ½, ¼, or ⅛ the second amount of storage.
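One way to obtain a smaller occlusion image is to store one depth per block of samples; with 2×2 blocks and the same bits per value, it needs ¼ of the full-resolution storage. The sketch below is an assumption, not the described hardware: objects are hypothetical already-transformed screen rectangles with a conservative `[near, far]` depth range, a block's value is lowered only by objects covering the whole block (which keeps it conservative), and culling uses a strict comparison so that an object is never culled by its own first-pass contribution.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Obj:
    x0: int  # screen-space sample bounds: x0 <= x < x1, y0 <= y < y1
    y0: int
    x1: int
    y1: int
    near: float  # conservative nearest depth of the object
    far: float   # conservative farthest depth of the object

def blocks(o: Obj, b: int) -> List[Tuple[int, int]]:
    return [(bx, by) for by in range(o.y0 // b, (o.y1 - 1) // b + 1)
            for bx in range(o.x0 // b, (o.x1 - 1) // b + 1)]

def covers(o: Obj, bx: int, by: int, b: int) -> bool:
    return (o.x0 <= bx * b and o.y0 <= by * b
            and o.x1 >= (bx + 1) * b and o.y1 >= (by + 1) * b)

def build_occlusion_image(objs: List[Obj], w: int, h: int, b: int) -> List[List[float]]:
    """First pass: one far-depth bound per b x b block (w and h assumed multiples of b)."""
    occ = [[float("inf")] * (w // b) for _ in range(h // b)]
    for o in objs:
        for bx, by in blocks(o, b):
            if covers(o, bx, by, b):
                occ[by][bx] = min(occ[by][bx], o.far)
    return occ

def cull_pass(objs: List[Obj], occ: List[List[float]], b: int) -> List[Obj]:
    """Second pass: keep objects not proven occluded; these go on to the renderer."""
    return [o for o in objs
            if not all(o.near > occ[by][bx] for bx, by in blocks(o, b))]
```

Survivors would then be rendered with the full-resolution depth buffer; in this sketch, a box hidden behind a full-screen wall is removed before it ever reaches that buffer.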
Still yet another system, method and computer program product are provided for multiple-pass rendering in a plurality of pipelines using conservative occlusion culling. During a first pass in a first pipeline, objects are passed from an input stream to a geometric processor of the first pipeline for being transformed. Also in the first pipeline, the objects may be sent to a culling stage for creating an occlusion image in a first depth buffer requiring a first amount of storage. During a second pass in a second pipeline, the objects are sent to a culling stage of the second pipeline for conservatively culling occluded objects utilizing the occlusion image. The remaining objects are passed to a renderer. Such renderer requires a second depth buffer with a second amount of storage greater than the first amount of storage.
In one aspect of the present embodiment, the first and second pipelines operate in parallel. Moreover, the first and second pipelines may operate on separate frames simultaneously. Further, the second pipeline may include a geometric processor for transforming objects.
In yet another system, method and computer program product, z-value reads are avoided in a multi-pass rendering algorithm in a graphics pipeline. During a first pass, objects are transformed. Thereafter, an occlusion image is created in a first depth buffer requiring a first amount of storage. Next, near z-values are stored in the occlusion image. Each near z-value is representative of a near z-value on one of the objects. During a second pass, objects are conservatively culled utilizing the occlusion image. The remaining objects are then rendered with a renderer. Such renderer requires a second depth buffer with a second amount of storage greater than the first amount of storage. Based on a depth comparison involving the near z-values, z-values previously stored for image samples are conditionally read from the second depth buffer.
In one aspect of the present embodiment, the stored near z-values may be compared with far z-values computed for other objects, and the previously stored z-values may be conditionally read based on the comparison. As an option, previously stored z-values may be read from memory only if the far z-values computed for the other objects are farther than or equal to the corresponding near z-values. Results of the comparison may be stored in a mask. Moreover, the mask may be used in a decision to conditionally read the previously stored z-values from the memory.
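Storing the comparison results in a mask might look like this sketch, assuming one near z-value per region of a tile, larger z being farther, and `None` marking regions the object does not touch. The function names are hypothetical.

```python
from typing import List, Optional

def compare_mask(region_near: List[float], obj_far: List[Optional[float]]) -> int:
    """Bit r is set when region r's previously stored z-values must be read."""
    mask = 0
    for r, (near, far) in enumerate(zip(region_near, obj_far)):
        # far is None where the object does not touch region r
        if far is not None and far >= near:
            mask |= 1 << r
    return mask

def regions_to_read(mask: int, count: int) -> List[int]:
    return [r for r in range(count) if (mask >> r) & 1]
```

Regions the object touches whose bit is clear can be written without a read, since the object is in front of everything already stored there.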
Another system, method and computer program product are provided for avoiding processing in a multi-pass rendering algorithm in a graphics pipeline. During a first pass, an object is transformed. It is then determined whether at least a portion of the object has been culled. Visibility information is then stored for indicating whether the at least portion of the object has been culled. During a second pass, processing is conditionally skipped in at least a portion of the graphics pipeline based on the visibility information.
In one aspect of the present embodiment, the visibility information may include a visibility bit. Further, it may be determined whether the at least portion of the object has been culled using z-value culling. Also, it may be determined whether the at least portion of the object has been culled using stencil-value culling.
In another aspect of the present embodiment, the processing of the at least portion of the object may be skipped during the second pass if the visibility information indicates that the at least portion of the object has been culled. Moreover, it may be determined whether the entire object has been culled, and the visibility information may indicate whether the entire object has been culled.
During the first pass, the object may be passed to a rasterizer for determining which of a plurality of tiles the object overlaps. It may be determined whether the object is culled in the at least one tile overlapped by the object, and the visibility information may indicate whether the object has been culled. The processing of the at least one tile may be skipped if the visibility information indicates that the at least one tile has been culled. Also, the processing that is skipped may include reading of an occlusion image associated with the at least one tile.
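Per-tile visibility bits can be recorded in the first pass and consulted in the second, as in the sketch below. It assumes the first-pass cull is conservative, so anything culled then is still occluded in the second pass. The `overlaps` map, the `is_culled` and `render_tile` callbacks, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

def first_pass(overlaps: Dict[str, List[int]],
               is_culled: Callable[[str, int], bool]) -> Dict[str, int]:
    """Bit k of vis[obj] is set when the object survived culling in its k-th overlapped tile."""
    vis = {}
    for obj, tiles in overlaps.items():
        bits = 0
        for k, tile in enumerate(tiles):
            if not is_culled(obj, tile):
                bits |= 1 << k
        vis[obj] = bits  # 0 means the entire object was culled
    return vis

def second_pass(overlaps: Dict[str, List[int]], vis: Dict[str, int],
                render_tile: Callable[[str, int], None]) -> List[Tuple[str, int]]:
    work = []
    for obj, tiles in overlaps.items():
        if vis[obj] == 0:
            continue  # whole object culled: skip it in the rest of the pipeline
        for k, tile in enumerate(tiles):
            if (vis[obj] >> k) & 1:
                render_tile(obj, tile)  # occlusion image for this tile read only here
                work.append((obj, tile))
    return work
```

An object culled everywhere costs nothing in the second pass, and a partially culled object touches only the tiles where it survived.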
Still another system, method and computer program product are provided for avoiding processing in a multi-pass rendering algorithm in a graphics pipeline. During a first pass, objects are transformed. The objects are then tested for visibility. It is subsequently determined whether a last object processed within an entity of a screen is entirely visible. Further, status information is stored that indicates whether the last object processed within the entity of the screen is entirely visible. During a second pass, when rendering each object within the entity of the screen, z-value writes to and reads from a z-buffer are avoided if the status information indicates that the last object processed within the entity of the screen was entirely visible.
In one aspect of the present embodiment, the entity may include an image sample, a region of a tile, or a tile.
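For the case where the entity is a single image sample, the idea can be sketched as follows, assuming objects arrive in the same order in both passes, a strict less-than depth test, and hypothetical fragment tuples `(object_id, sample, z)`. If the last object processed at a sample was visible in the first pass, nothing later hides it, so in the second pass the last color write at that sample is already correct and no z-buffer access is needed there.

```python
from typing import List, Optional, Tuple

Frag = Tuple[str, int, float]  # (object id, sample index, z)

def visibility_pass(frags: List[Frag], n: int) -> List[bool]:
    """First pass: status[s] tells whether the last object processed at sample s was visible."""
    z = [float("inf")] * n
    status = [False] * n
    for _obj, s, d in frags:
        visible = d < z[s]
        if visible:
            z[s] = d
        status[s] = visible
    return status

def render_pass(frags: List[Frag], n: int, status: List[bool]):
    """Second pass: z reads and writes are skipped wherever status[s] is set."""
    z = [float("inf")] * n
    color: List[Optional[str]] = [None] * n
    z_tests = 0
    for obj, s, d in frags:
        if status[s]:
            color[s] = obj  # the final write at s comes from the visible last object
            continue
        z_tests += 1  # z read (and a write if the test passes)
        if d < z[s]:
            z[s] = d
            color[s] = obj
    return color, z_tests
```

For a region or tile entity, the status would presumably also require the last object to cover the whole entity; this sketch does not model that case.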
In each of the previous embodiments, the techniques may be performed utilizing a graphics pipeline including a culling stage having an input for receiving a plurality of the objects. Such culling stage may test the objects against a first depth buffer for occlusion and non-definitively but conservatively cull objects from the plurality of objects which it proves to be occluded. The graphics pipeline may further include a renderer downstream of the culling stage which, while the culling stage conservatively culls objects for a given frame, renders objects into the given frame which were tested for occlusion in the culling stage but which were not proven upstream of the renderer to be occluded.