1. Field of the Invention
Embodiments of the present invention relate generally to graphics processing and, more specifically, to techniques for optimizing stencil buffers.
2. Description of the Related Art
Some conventional graphics processing units (GPUs) include different processing engines configured to operate in parallel with one another to implement a graphics processing pipeline. A graphics processing pipeline is the collection of processing steps performed to transform 3-D images into rendered 2-D images. When a given processing engine finishes processing data, that processing engine may copy the processed data from local memory to a memory that is shared between the different processing engines within the GPU. Other processing engines may then access the processed data and perform additional processing operations with that data. One type of data structure used in a graphics processing pipeline to give different processing engines access to such data is a stencil buffer.
Stencil buffers include stencil values associated with each pixel or sample included in an image surface. Typically, each stencil value is an unsigned integer represented by 8 bits. The meaning and use of the stencil values vary by application. In general, however, stencil values are compared with reference values as part of stencil tests. The outcome of a particular stencil test is often coupled with the outcome of a depth test, and the combined result determines whether a sample is discarded. This result may also be used to control the updating of the stencil value. In operation, the stencil buffer is often used to identify a set of samples in one render pass and then to control the fate of the identified samples and the updating of the associated stencil values in subsequent render passes.
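By way of illustration only, the stencil test described above may be sketched as follows. The sketch assumes 8-bit stencil values; the function name `process_sample`, the comparison names, and the update operations ("keep", "replace", "increment", "zero") are hypothetical and are chosen only to resemble options commonly exposed by graphics APIs. The sketch is not intended to represent any particular GPU or API.

```python
import operator

# Hypothetical comparison functions: each compares the reference value
# against the stored stencil value.
COMPARE = {
    "never": lambda ref, stored: False,
    "always": lambda ref, stored: True,
    "equal": operator.eq,
    "not_equal": operator.ne,
    "less": operator.lt,
    "greater": operator.gt,
}

STENCIL_MAX = 255  # largest value of an 8-bit unsigned stencil value


def process_sample(stored, reference, depth_passes, func="equal", on_pass="keep"):
    """Return (sample_survives, new_stencil_value) for one sample.

    The stencil test outcome is coupled with the depth test outcome; the
    sample is discarded unless both pass. In this sketch the stencil value
    is updated only when the sample survives.
    """
    stencil_passes = COMPARE[func](reference, stored)
    survives = stencil_passes and depth_passes
    if not survives or on_pass == "keep":
        return survives, stored
    if on_pass == "replace":
        return survives, reference
    if on_pass == "increment":
        return survives, min(stored + 1, STENCIL_MAX)  # clamp at 8-bit maximum
    if on_pass == "zero":
        return survives, 0
    raise ValueError(f"unknown stencil operation: {on_pass}")
```

For example, under these assumptions `process_sample(1, 1, True, on_pass="zero")` keeps the sample and clears its stencil value, whereas `process_sample(0, 1, True)` discards the sample and leaves the stencil value unchanged.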
In addition to well-known operations, such as limiting the rendering area, the stencil buffer may be used in a variety of different algorithms. For some classes of algorithms, the stencil values included in the stencil buffer are used as binary switches: the stencil value associated with each sample is either on or off. In subsequent rendering passes, the samples associated with stencil values that are on are processed in an application-specific manner, and the samples associated with stencil values that are off are typically discarded. For instance, even-odd path rendering may be implemented using a stencil-then-cover algorithm where stencil values are assigned either a ‘1’ or a ‘0.’ In even-odd path rendering, if an odd number of path edges lie between the sample and the outside of the shape, then the sample is considered to be inside the shape. Conversely, if an even number of path edges lie between the sample and the outside of the shape, then the sample is considered to be outside the shape. In a first rendering pass, the stencil values corresponding to all of the samples within the path are assigned a ‘1.’ In a second rendering pass, the samples associated with stencil values equal to ‘1’ are colored, and the samples associated with stencil values equal to ‘0’ are discarded.
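The two passes of even-odd path rendering described above may be illustrated with a minimal CPU-side sketch. The sketch assumes one sample per pixel located at the pixel center and a path given as a closed polygon; it counts edge crossings along a horizontal ray from each sample, which is one common way to evaluate the even-odd rule and is not necessarily how a GPU performs the stencil pass. The names `count_crossings`, `stencil_pass`, and `cover_pass` are hypothetical.

```python
def count_crossings(px, py, path):
    """Count path edges crossed by a ray from (px, py) toward +x."""
    crossings = 0
    n = len(path)
    for i in range(n):
        x0, y0 = path[i]
        x1, y1 = path[(i + 1) % n]
        # The edge can cross the ray only if it straddles the ray's y value.
        if (y0 > py) != (y1 > py):
            x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if x_at > px:
                crossings += 1
    return crossings


def stencil_pass(width, height, path):
    """First pass: stencil value is 1 for an odd crossing count, else 0."""
    return [
        [count_crossings(x + 0.5, y + 0.5, path) & 1 for x in range(width)]
        for y in range(height)
    ]


def cover_pass(stencil, color):
    """Second pass: color samples whose stencil is 1; discard the rest (None)."""
    return [[color if s == 1 else None for s in row] for row in stencil]


# Usage: a square path covering the four central pixels of a 4x4 surface.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
stencil = stencil_pass(4, 4, square)
image = cover_pass(stencil, "red")
```

In this usage example only the four samples inside the square receive a stencil value of ‘1’ and are colored; every other sample is discarded.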
One limitation to using stencil buffers in a binary fashion is that the stencil buffer requires a relatively large amount of sparsely accessed memory. For example, suppose that each pixel included 16 samples and an algorithm were to use a stencil buffer in a binary fashion. In such a scenario, the stencil buffer corresponding to each surface would include 128 bits per pixel (16 samples at 8 bits each), but the algorithm would only utilize 16 bits per pixel (one bit per sample). Consequently, 112 bits per pixel of memory would be wasted. Because the local memory capacity of GPUs is limited, such memory wastage is undesirable. The negative impact of wasted memory is particularly noticeable for systems that include relatively small local memories, such as those in low-cost mobile environments.
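The per-pixel arithmetic for this scenario can be checked with a short calculation. The function name `stencil_bits_per_pixel` and its parameters are hypothetical and are introduced here only for illustration.

```python
def stencil_bits_per_pixel(samples_per_pixel, bits_per_stencil=8, bits_used_per_sample=1):
    """Return (allocated, used, wasted) stencil bits for a single pixel."""
    allocated = samples_per_pixel * bits_per_stencil
    used = samples_per_pixel * bits_used_per_sample
    return allocated, used, allocated - used


# 16 samples per pixel, 8-bit stencil values, one bit used per sample.
allocated, used, wasted = stencil_bits_per_pixel(16)
print(allocated, used, wasted)  # 128 16 112
```

Under these assumptions, seven of every eight stencil bits are unused when the stencil buffer serves only as a binary switch.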
As the foregoing illustrates, what is needed in the art is a more effective technique for implementing stencil buffers.