1. Field of the Invention
Embodiments of the present invention relate generally to graphics processing and, more specifically, to stencil buffer data compression.
2. Description of the Related Art
Some conventional graphics processing units (GPUs) include different processing engines configured to operate in parallel with one another to implement a graphics processing pipeline. A graphics processing pipeline is the collection of processing steps performed to transform 3-D scene descriptions into rendered 2-D images. When a given processing engine finishes processing data, that processing engine may copy the processed data from local memory to a memory that is shared between the different processing engines within the GPU. Other processing engines may then access the processed data and perform additional processing operations with that data. One type of data structure that is shared in this manner among the different processing engines in a graphics processing pipeline is a stencil buffer.
Stencil buffers include a stencil value for each pixel or sample included in an image surface. Typically, each stencil value is an unsigned integer represented by 8 bits. The meaning and use of the stencil values vary by application, but, in general, stencil values are compared with reference values as part of stencil tests. The outcome of a particular stencil test is often coupled with the outcome of a depth test, and the combined result determines whether a sample is discarded. In operation, an advanced GPU typically performs many memory access operations and computation operations on the stencil values. Performing this quantity of operations consumes memory bandwidth, increases power consumption, and reduces the processing speed of the GPU. Because the memory bandwidth, acceptable power consumption, and processing capability of GPUs are limited, any increase in memory bandwidth use, power consumption, or number of processing operations is generally undesirable.
To reduce the number of operations performed on the stencil values, some advanced GPUs implement a one-bit delta stencil buffer compression algorithm. In particular, the stencil buffer compression algorithm enables the GPU to perform memory operations and computation operations on “compressible” groups of stencil values without individually accessing each value in the stencil group. Such a technique reduces the number of memory access operations and computation operations associated with compressible stencil groups. Each stencil group represents the stencil values of a group of proximally-located samples. In general, to determine whether to compress the stencil group, the GPU evaluates the stencil values included in the stencil group. If the stencil values in the stencil group differ from one another by at most one (e.g., stencil values 64 and 63, stencil values 98 and 99, etc.), then the GPU compresses the stencil group. However, if the stencil values in the stencil group differ by more than one (e.g., stencil values of 64 and 62, stencil values of 98, 99, and 100, etc.), then the GPU does not compress the stencil group.
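The compressibility test described above can be sketched as follows. This is a minimal illustration only, not the method of any particular GPU: the function name is hypothetical, the group is modeled as a plain list of 8-bit values, and wraparound between 255 and 0 is assumed not to count as a difference of one.

```python
from typing import Sequence


def is_one_bit_delta_compressible(stencil_group: Sequence[int]) -> bool:
    """Return True if the stencil values in the group differ by at most one.

    Hypothetical helper: models the check only, not the compressed format.
    """
    if not stencil_group:
        raise ValueError("stencil group must contain at least one value")
    if any(not 0 <= value <= 255 for value in stencil_group):
        raise ValueError("stencil values must fit in 8 unsigned bits")
    # Assumption: values are compared without wraparound (255 vs. 0 is a
    # difference of 255, not 1).
    return max(stencil_group) - min(stencil_group) <= 1
```

Applied to the examples in the preceding paragraph, the groups (64, 63) and (98, 99) pass this check, whereas the groups (64, 62) and (98, 99, 100) do not.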
One limitation of the one-bit delta stencil buffer compression technique is that the number of compressible stencil groups may be limited. For example, many GPUs support rendering paths through a two-pass rendering process known as “stencil-then-cover.” First, in a path stenciling pass, the GPU generates a stencil buffer that indicates which samples (i.e., positions within each pixel) are covered by the path. Second, in a path covering pass, the GPU generates cover geometry for the path and shades the cover geometry with stencil testing enabled. As part of the path stenciling pass, many GPUs implement a winding algorithm in which the stencil value for a particular sample is based on the triangles included in the path that cover the sample. The triangles that have a counterclockwise winding are referred to as “front-facing” triangles, whereas the triangles that have a clockwise winding are referred to as “back-facing” triangles. For each front-facing triangle included in a path, the GPU increments the stencil value corresponding to each covered sample. By contrast, for each back-facing triangle included in a path, the GPU decrements the stencil value corresponding to each covered sample.
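The winding-based path stenciling pass described above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not a description of GPU hardware: the function names are hypothetical, coordinates are assumed to use a y-up convention so that counterclockwise triangles have positive signed area, samples on a triangle edge are treated as covered, and stencil updates are assumed to wrap modulo 256.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of (a, b, p); positive when p lies left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def stencil_path(triangles: Sequence[Triangle], samples: Sequence[Point]) -> List[int]:
    """Accumulate one 8-bit stencil value per sample over the path's triangles."""
    stencil = [0] * len(samples)
    for a, b, c in triangles:
        area2 = edge(a, b, c)
        if area2 == 0:
            continue  # degenerate triangle covers no samples
        front_facing = area2 > 0  # counterclockwise winding (y-up assumption)
        step = 1 if front_facing else -1
        for i, p in enumerate(samples):
            w = (edge(a, b, p), edge(b, c, p), edge(c, a, p))
            if front_facing:
                covered = all(x >= 0 for x in w)
            else:
                covered = all(x <= 0 for x in w)
            if covered:
                # Assumption: increments and decrements wrap within 8 bits.
                stencil[i] = (stencil[i] + step) % 256
    return stencil
```

In this sketch, a sample covered by one front-facing and one back-facing triangle ends the pass with a stencil value of zero, while a sample covered only by a back-facing triangle wraps to 255.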
Many complex paths include concave geometry with multiple front-facing and back-facing triangles, and the corresponding stencil values in a localized region will often vary by more than one. For example, suppose that two samples were to be represented in a particular stencil group. Further, suppose that one sample were to be covered by one front-facing triangle and three back-facing triangles and the other sample were to be covered by two front-facing triangles and two back-facing triangles. In such a case, the first sample would have a net winding count of -2 (one increment and three decrements), the second sample would have a net winding count of 0 (two increments and two decrements), and the delta between the two corresponding stencil values would be 2. Since the one-bit delta stencil buffer compression technique does not support a delta of more than 1 across a stencil group, a GPU implementing such a technique would not compress the stencil group. Consequently, the GPU that implemented the one-bit delta stencil buffer compression technique would not provide any memory bandwidth, power, or performance advantage for the group compared to a GPU that did not implement any stencil buffer compression.
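The arithmetic in the preceding example can be checked with a few lines of code. The helper name below is hypothetical, and the sketch keeps signed net winding counts rather than modeling how a negative count would be stored in an unsigned 8-bit stencil value.

```python
def net_winding(front_facing: int, back_facing: int) -> int:
    """Net winding count: one increment per front-facing triangle,
    one decrement per back-facing triangle (hypothetical helper)."""
    return front_facing - back_facing


# First sample: one front-facing triangle and three back-facing triangles.
sample_a = net_winding(front_facing=1, back_facing=3)
# Second sample: two front-facing triangles and two back-facing triangles.
sample_b = net_winding(front_facing=2, back_facing=2)

delta = abs(sample_a - sample_b)
# A one-bit delta scheme tolerates a delta of at most 1 across the group.
group_is_compressible = delta <= 1
```

Here the delta evaluates to 2, so the group fails the one-bit delta criterion and would be left uncompressed.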
As the foregoing illustrates, what is needed in the art is a more effective approach to stencil buffer compression.