1. Field of the Invention
Embodiments of the present invention relate generally to computer graphics and, more specifically, to optimizing a graphics rendering pipeline using early Z-mode.
2. Description of the Related Art
A graphics rendering engine commonly consists of a set of specialized processing engines organized in a dataflow-style pipeline. After any data fetch engines, the setup engine commonly sits at the top of the graphics rendering engine. The setup engine operates on geometric primitives, such as triangles, and emits transformed or simplified representations of the geometric primitives to a raster engine. The raster engine determines pixel coverage associated with each geometric primitive, producing a sequential stream of unshaded pixels. A shader engine operates on the sequential stream of unshaded pixels from the raster engine, producing a stream of shaded pixels. In addition to computing the color of a given pixel, some shader engines optionally operate on the depth (Z-value) and stencil attributes of a pixel. When the shader engine does not compute pixel depth and stencil values, a Z-raster operations (ZROP) engine computes them. A Z-resolve engine determines whether a new pixel should be saved or discarded through an operation called Z-testing. Z-testing compares a new pixel's depth and stencil data against the depth and stencil data previously stored in the current depth buffer at the location of the new pixel. If Z-testing determines that the new pixel is to be saved, the Z-resolve engine writes the new pixel's depth and stencil data to the current depth buffer. The Z-resolve engine then directs a color raster operations (CROP) engine to write the new pixel's color data to the current image buffer.
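The Z-testing step described above can be illustrated with a minimal sketch. It is not taken from any particular hardware design: the function name z_test_and_write, the "less than" comparison, the convention that a smaller Z-value is closer to the viewer, and the omission of stencil data are all assumptions made for illustration.

```python
def z_test_and_write(depth_buffer, color_buffer, x, y, z, color):
    """Z-test one shaded pixel; return True if the pixel is saved.

    Hypothetical helper. Assumes smaller Z is closer and that the
    depth test passes on "less than". Stencil data is omitted.
    """
    if z < depth_buffer[y][x]:
        depth_buffer[y][x] = z      # Z-resolve writes the new depth value
        color_buffer[y][x] = color  # CROP writes the new color value
        return True
    return False                    # the new pixel is discarded


# A 2x2 render target cleared to the far plane (1.0) and to black.
depth = [[1.0, 1.0], [1.0, 1.0]]
image = [[(0, 0, 0)] * 2 for _ in range(2)]

# A near pixel is saved; a farther pixel at the same location is discarded.
saved_near = z_test_and_write(depth, image, 0, 0, 0.25, (255, 0, 0))
saved_far = z_test_and_write(depth, image, 0, 0, 0.75, (0, 255, 0))
```

In this sketch the second pixel fails the comparison against the stored depth of 0.25, so neither the depth buffer nor the image buffer is modified for it.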
The precise sequence of processing steps in a graphics rendering pipeline is commonly designed to satisfy the sequential data dependencies of the rendering process in the simplest way. For example, a triangle primitive should be rasterized into pixels before pixel operations are conducted on the set of pixels covered by the triangle. Additionally, a pixel's Z-value should be computed before being compared to previously computed Z-values in the depth buffer. Z-testing is commonly conducted after shading, giving the shader engine an opportunity to conclude any depth or stencil computations prior to Z-testing.
As is well known, the shader engine is the most expensive element of the graphics rendering pipeline, consuming the most logic resources and the most power. Furthermore, complex shading algorithms commonly executed in the shader engine cause the shader engine to become the leading performance bottleneck in the graphics rendering pipeline. Early Z-culling in the raster engine achieves some performance gain by discarding primitives known to be occluded before work related to these primitives is triggered within the shader engine. However, early Z-culling is only a trivial discard mechanism and not a substitute for the more precise Z-testing. Even when early Z-culling is employed, the Z-testing step may discard up to half of the pixels processed by the shader engine. More importantly, the shader engine does not even modify the Z-values of many of the discarded pixels during shading operations, making the traversal of these pixels through the shader engine superfluous. Thus, a consequence of standard architectures is that the shader engine, the single most expensive resource in a graphics rendering pipeline, operates with substantial inefficiency.
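The inefficiency described above can be made concrete with a small counting sketch of a pipeline that Z-tests only after shading. The function render_late_z, the fragment list, and the "less than" depth comparison are hypothetical and chosen purely for illustration; the counter stands in for the cost of real shading work.

```python
def render_late_z(fragments, width, height):
    """Shade every fragment, then Z-test it; return (shaded, discarded).

    Hypothetical model of a shade-then-test ordering. Assumes smaller Z
    is closer and that the shader does not modify Z-values.
    """
    depth = [[1.0] * width for _ in range(height)]  # cleared to far plane
    shaded = 0
    discarded = 0
    for x, y, z in fragments:
        shaded += 1                 # shader engine runs for every fragment
        if z < depth[y][x]:
            depth[y][x] = z         # fragment survives Z-testing
        else:
            discarded += 1          # shading work for this fragment was wasted
    return shaded, discarded


# Illustrative fragments (x, y, z): each pixel receives a near fragment
# followed by an occluded, farther fragment.
fragments = [(0, 0, 0.2), (0, 0, 0.6), (1, 0, 0.5), (1, 0, 0.9)]
shaded, discarded = render_late_z(fragments, width=2, height=1)
```

With this illustrative input, all four fragments are shaded but two of them fail Z-testing afterward, so half of the shading work contributes nothing to the final image.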
As the foregoing illustrates, what is needed in the art is a technique for improving shader engine efficiency in a graphics rendering pipeline.