1. Field of the Invention
One or more aspects of the invention generally relate to computer graphics, and more particularly to matching data streams when a partition-based memory system is used.
2. Description of the Related Art
A conventional graphics rendering engine commonly consists of a set of specialized processing engines organized in a dataflow-style pipeline. A setup engine is commonly at the top of the graphics rendering engine. The setup engine operates on geometric primitives, such as triangles, and emits transformed or simplified representations of the geometric primitives to a raster engine. The raster engine determines pixel coverage associated with each geometric primitive, producing a sequential stream of unshaded pixel primitives, each with an associated depth value (z value). A shader engine operates on the sequential stream of unshaded pixels from the raster engine, producing a stream of shaded pixels. In addition to computing the color of a given pixel, some shader engines optionally generate or modify the z value of a pixel. A raster operations (ROP) unit determines whether a new pixel should be saved or discarded through an operation called z testing. Z testing compares a new pixel's depth and stencil data against previously stored depth and stencil data in a render target, i.e., the current depth buffer, at the location of the new pixel. If a pixel survives z testing, the ROP unit optionally writes the new pixel's depth and stencil data to the current depth buffer. The ROP unit also updates and writes the new pixel's color data to another render target, i.e., the current color buffer. When multiple render targets (color buffers) are used, per-pixel z values may be computed and tested for each of the render targets, even when the z values are the same for the multiple render targets. Therefore, techniques are needed to improve z testing efficiency when multiple render targets that share a common z buffer are used.
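The z testing and buffer update performed by the ROP unit may be illustrated with a minimal sketch. The sketch is an illustrative software model only, not a description of any particular hardware. The function names `z_test` and `rop_write`, the list-of-lists buffers, and the "less than" depth comparison are assumptions made for the example, and stencil data is omitted for brevity.

```python
# Illustrative model only: all names and the "less than" depth rule are assumptions.

def z_test(depth_buffer, x, y, new_z):
    """Return True if the new pixel is nearer than the depth stored at (x, y)."""
    return new_z < depth_buffer[y][x]


def rop_write(depth_buffer, color_buffer, x, y, new_z, new_color, write_depth=True):
    """Z test a shaded pixel, then optionally update depth and write color."""
    if not z_test(depth_buffer, x, y, new_z):
        return False  # the pixel fails z testing and is discarded
    if write_depth:
        depth_buffer[y][x] = new_z  # optional depth write for surviving pixels
    color_buffer[y][x] = new_color  # color write to the current color buffer
    return True


WIDTH, HEIGHT = 4, 4
depth = [[1.0] * WIDTH for _ in range(HEIGHT)]        # 1.0 models the far plane
color = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]  # cleared to black

first = rop_write(depth, color, 1, 2, 0.5, (255, 0, 0))   # nearer than 1.0
second = rop_write(depth, color, 1, 2, 0.8, (0, 255, 0))  # behind the stored 0.5
```

In this model the first pixel survives z testing and updates both buffers, while the second pixel is discarded and leaves the stored depth and color unchanged.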
Additionally, the precise sequence of processing steps in a graphics rendering pipeline is commonly designed to accommodate sequential data dependence in the rendering process. For example, a triangle primitive should be rasterized into pixel primitives before pixel operations are conducted on the set of pixels covered by the triangle. Similarly, a pixel's z value should be computed before being compared to previously computed z values in the depth buffer. Z testing is commonly conducted after shading, giving the shader engine an opportunity to conclude any depth or stencil computations prior to z testing.
As is well known, the shader engine is the most expensive element of the graphics rendering pipeline, consuming the most logic resources and the most power. Furthermore, complex shading algorithms commonly executed in the shader engine often cause the shader engine to become the leading performance bottleneck in the graphics rendering pipeline. Early z culling in the raster engine achieves some performance gain by discarding primitives known to be occluded before work related to these primitives is triggered within the shader engine. However, early z culling is only a trivial discard mechanism and not a substitute for the more precise z testing. Even when early z culling is employed, the z testing step may discard half or more of the pixels processed by the shader engine. More importantly, the shader engine typically does not even modify the z values of many of the discarded pixels during shading operations, making the traversal of these pixels through the shader engine superfluous. Thus, a consequence of standard architectures is that the shader engine, the single most expensive resource in a graphics rendering pipeline, performs significant work that is then discarded.
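The cost of conducting z testing after shading may be illustrated with a short sketch that counts how many pixels are shaded and how many of those are subsequently discarded. The sketch is a simplified model under stated assumptions: the function name `late_z_pipeline`, the dictionary-based depth buffer, the "less than" depth comparison, and the sample pixel stream are all hypothetical, and the shader is assumed not to modify z values.

```python
# Illustrative model only: names, depth rule, and pixel stream are assumptions.

def late_z_pipeline(pixels, far=1.0):
    """Shade every pixel, then z test it; return (shaded, discarded) counts."""
    depth = {}  # maps (x, y) to the nearest depth seen so far
    shaded = 0
    discarded = 0
    for x, y, z in pixels:
        shaded += 1  # shader work is performed before the z test
        if z < depth.get((x, y), far):
            depth[(x, y)] = z  # pixel survives and updates the depth buffer
        else:
            discarded += 1  # shader work for this pixel was wasted
    return shaded, discarded


# Hypothetical stream of (x, y, z) pixels in the order emitted by the raster engine.
stream = [
    (0, 0, 0.3), (0, 0, 0.6),
    (1, 0, 0.5), (1, 0, 0.9), (1, 0, 0.2),
    (0, 0, 0.7),
]
shaded, discarded = late_z_pipeline(stream)
```

For this particular stream, six pixels are shaded and three of them are discarded by z testing, so half of the shader work in the model produces no visible result.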
As the foregoing illustrates, what is needed in the art is a technique for associating z testing results with multiple sets of color data, using hardware that can be deployed with or without early z culling.