1. Field of the Invention
Embodiments of the present invention relate generally to computer graphics and, more specifically, to a graphics rendering pipeline that supports early-Z and late-Z virtual machines.
2. Description of the Related Art
A graphics rendering engine commonly consists of a set of specialized processing engines organized in a dataflow-style pipeline. After any data fetch engines, the setup engine is commonly at the top of the graphics rendering engine. The setup engine operates on geometric primitives, such as triangles, and emits transformed or simplified representations of the geometric primitives to a raster engine. The raster engine determines pixel coverage associated with each geometric primitive, producing a sequential stream of unshaded pixel primitives, each with an associated depth value (Z-value). A shader engine operates on the sequential stream of unshaded pixels from the raster engine, producing a stream of shaded pixels. In addition to computing the color of a given pixel, some shader engines optionally generate or modify the Z-value of a pixel. A Z-raster operations (ZROP) engine determines whether a new pixel should be saved or discarded through an operation called Z-testing. Z-testing compares a new pixel's depth and stencil data against previously stored depth and stencil data in the current depth buffer at the location of the new pixel. If a pixel survives Z-testing, the ZROP engine optionally writes the new pixel's depth and stencil data to the current depth buffer. A Z-resolve engine merges the results of Z-testing with latency-buffered data associated with the corresponding pixel. The Z-resolve engine transmits pixels that have survived Z-testing to a color raster operations (CROP) engine and discards pixels that have not survived Z-testing. The CROP engine updates and writes the new pixel's color data to the current color buffer.
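By way of illustration only, the Z-testing, Z-resolve, and color-write steps described above may be sketched as follows. The sketch assumes a "less than" depth comparison, omits stencil data, and models the depth and color buffers as dictionaries keyed by pixel location. The names Pixel, z_test_and_write, and resolve_and_write_color are hypothetical and do not correspond to any particular hardware interface.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    x: int
    y: int
    z: float      # depth value (Z-value); smaller means closer in this sketch
    color: tuple  # shaded color, e.g. (r, g, b)


def z_test_and_write(pixel, depth_buffer, z_write=True):
    """Return True if the pixel survives Z-testing.

    Assumption: a 'less than' compare against the stored depth, with an
    empty location treated as infinitely far away. Stencil is omitted.
    """
    location = (pixel.x, pixel.y)
    stored = depth_buffer.get(location, float("inf"))
    if pixel.z < stored:
        if z_write:
            # The depth write is optional, mirroring the ZROP description.
            depth_buffer[location] = pixel.z
        return True
    return False


def resolve_and_write_color(shaded_pixels, depth_buffer, color_buffer):
    """Z-test each shaded pixel; only survivors reach the color buffer."""
    for p in shaded_pixels:
        if z_test_and_write(p, depth_buffer):
            color_buffer[(p.x, p.y)] = p.color
```

In this sketch, three pixels arriving at the same location with depths 0.5, 0.8, and 0.2 leave the color of the third pixel in the color buffer, because the second pixel fails the comparison and the third passes it.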
The precise sequence of processing steps in a graphics rendering pipeline is commonly designed to accommodate the simplest reduction of sequential data dependence in the rendering process. For example, a triangle primitive should be rasterized into a pixel primitive before pixel operations are conducted on the set of pixels covered by the triangle. Additionally, a pixel's Z-value should be computed before being compared to previously computed Z-values in the depth buffer. Z-testing is commonly conducted after shading, since pixel or sample kills resulting from alpha testing, alpha-to-coverage operations, and shader-pixel-kill operations are specified to take place before the Z buffer is updated. Also, in some modes, the shader may compute Z-values.
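As a minimal sketch, the conditions listed above that place Z-testing after shading may be expressed as a predicate over the render state. The RenderState fields and the function name are hypothetical, and the rule is deliberately conservative: it treats any one of the listed conditions as sufficient to require Z-testing after shading.

```python
from dataclasses import dataclass


@dataclass
class RenderState:
    # Hypothetical state flags; each corresponds to a condition named in the text.
    alpha_test: bool = False
    alpha_to_coverage: bool = False
    shader_kills_pixels: bool = False
    shader_writes_z: bool = False


def requires_z_test_after_shading(state: RenderState) -> bool:
    """True if pixel kills or shader-computed Z-values force Z-testing after shading."""
    return (
        state.alpha_test
        or state.alpha_to_coverage
        or state.shader_kills_pixels
        or state.shader_writes_z
    )
```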
As is well known, the shader engine is the most expensive element of the graphics rendering pipeline, consuming the most logic resources and the most power. Furthermore, complex shading algorithms commonly executed in the shader engine cause the shader engine to become the leading performance bottleneck in the graphics rendering pipeline. Early Z-culling in the raster engine achieves some performance gain by discarding primitives known to be occluded before work related to these primitives is triggered within the shader engine. However, early Z-culling is only a trivial discard mechanism and not a substitute for the more precise Z-testing. Even when early Z-culling is employed, the Z-testing step may discard half or more of the pixels processed by the shader engine. More importantly, the shader engine typically does not even modify the Z-values of many of the discarded pixels during shading operations, making the traversal of these pixels through the shader engine superfluous. Certain prior art systems provide a way to perform Z-testing early, ahead of shading, if this can be done without altering the final image. In such systems, the current state, optionally with a hysteresis mechanism, determines whether the pipeline is configured to operate in early Z-mode (Z-testing performed ahead of the shader engine) or late Z-mode (Z-testing performed after the shader engine). When the state changes such that a switch between early and late Z-modes is needed, either the shader or the Z processing pipeline is flushed to prevent data hazards. The disadvantage of such prior art systems is that each flush associated with a Z-mode change can require several hundred clock cycles, making each flush a relatively costly operation in terms of efficiency and performance. Thus, applications that switch state rapidly either suffer performance degradation from needing to perform frequent flush operations, or simply operate in the less efficient late Z-mode to sidestep the issue altogether.
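The cost of frequent Z-mode changes in such prior art systems may be illustrated with a rough model that counts one flush for every change of mode across a sequence of draw operations. The function name is hypothetical, and the default of 300 cycles per flush is an assumed figure chosen only to fall within "several hundred clock cycles"; it is not a measured value.

```python
def count_flush_cost(modes, flush_cycles=300):
    """Return (flush_count, total_flush_cycles) for a sequence of Z-modes.

    modes: iterable of "early" or "late", one entry per draw operation.
    flush_cycles: assumed cost of one flush in clock cycles (illustrative).
    """
    flushes = 0
    current = None
    for mode in modes:
        if mode not in ("early", "late"):
            raise ValueError(f"unknown Z-mode: {mode!r}")
        # A flush is charged only when the mode differs from the previous one.
        if current is not None and mode != current:
            flushes += 1
        current = mode
    return flushes, flushes * flush_cycles
```

For the sequence early, late, early, early, late, this model charges three flushes, or 900 cycles under the assumed per-flush cost, whereas a pipeline held in late Z-mode for the entire sequence would charge none at the expense of shading pixels that Z-testing later discards.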
As the foregoing illustrates, what is needed in the art is a technique for improving efficiency in a graphics rendering pipeline when alternating between early Z-mode and late Z-mode operation.