1. Field of the Invention
Embodiments of the present invention relate generally to computer graphics and more specifically to color-compression using automatic reduction of multi-sampled pixels.
2. Description of the Related Art
A graphics rendering engine used to generate computer graphics images commonly includes a set of processing engines organized in a dataflow-style pipeline. Such images are conventionally composed of geometric primitives, such as triangles.
To render a computer graphics image, each triangle is transformed into a screen-aligned coordinate system, referred to as “screen space.” Manipulation of the geometric primitives, up to and including the transformation into screen space, is typically performed in the graphics rendering engine by a geometry processing unit, which passes the results to a rasterization unit. The rasterization unit decomposes each geometric primitive into fragments for further processing, producing one fragment for each screen-space pixel that is either fully or partially covered by the geometric primitive. The coverage of a particular fragment (referred to herein as the “fragment coverage”) indicates the portion of the screen-space pixel corresponding to the fragment that is covered by the geometric primitive. Each fragment may also have associated data, including, without limitation, depth and color values. The depth value of a fragment is compared with the depth value previously stored for the corresponding pixel to determine the visibility of that fragment. If the fragment is visible, the color value of the fragment either contributes to or uniquely determines the color of the corresponding pixel. When a fragment is found to be visible, its corresponding fragment data, including, without limitation, depth and color values, are written to a frame buffer memory.
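The per-fragment visibility step described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the engine described here: it assumes one stored depth and one stored color per pixel, a “less-than” depth comparison in which smaller depth values are closer to the viewer, and that a visible fragment's color simply replaces the stored color (blending is omitted). The names `Fragment`, `FrameBuffer`, and `process_fragment` are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

Color = Tuple[float, float, float, float]  # (r, g, b, a)


@dataclass
class Fragment:
    x: int          # screen-space pixel column
    y: int          # screen-space pixel row
    depth: float    # assumption: smaller means closer
    color: Color


class FrameBuffer:
    """Hypothetical frame buffer holding one depth and one color per pixel."""

    def __init__(self, width: int, height: int,
                 clear_depth: float = 1.0,
                 clear_color: Color = (0, 0, 0, 0)):
        self.width = width
        self.height = height
        self.depth = [[clear_depth] * width for _ in range(height)]
        self.color = [[clear_color] * width for _ in range(height)]


def process_fragment(fb: FrameBuffer, frag: Fragment) -> bool:
    """Depth-test a fragment; if visible, write its depth and color.

    Assumption: a 'less-than' depth test and color replacement (no blending).
    Returns True if the fragment was found visible and written.
    """
    if frag.depth < fb.depth[frag.y][frag.x]:
        fb.depth[frag.y][frag.x] = frag.depth
        fb.color[frag.y][frag.x] = frag.color
        return True
    return False
```

In this sketch, a fragment at depth 0.5 written into a freshly cleared buffer passes the test, while a later fragment at depth 0.7 for the same pixel is rejected and leaves the stored data unchanged.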
Depth values and color values may each undergo read, write and read-modify-write operations with respect to the frame buffer memory. The graphics rendering engine and the frame buffer memory are commonly in different chips, requiring all frame buffer accesses to be conducted over a chip-to-chip interconnect. The data bandwidth between the graphics rendering engine and the external memory devices making up the frame buffer is called memory bandwidth, and is commonly one of the most significant factors limiting system performance.
As is well known, the quality of a rendered image is significantly improved with anti-aliasing. Super-sampling and multi-sampling are two common anti-aliasing techniques known in the art. Super-sampling involves generating multiple samples within a pixel, where each sample is independently computed for coverage and shading. The shaded samples are stored within a frame buffer and blended together for display. While super-sampling produces a very accurate and high quality image, super-sampling is quite expensive because each pixel within a rendered image requires the computational processing of multiple fully shaded samples, and shading is typically the most expensive operation within the graphics rendering engine.
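To make the cost of super-sampling concrete, the following sketch models it for a single pixel under stated assumptions: a hypothetical four-sample pattern, caller-supplied `covered` and `shade` functions standing in for rasterization and shading, and a plain average as the blending filter. The names `SAMPLE_OFFSETS`, `supersample_pixel`, and `resolve` are illustrative, not taken from any particular implementation. The point it shows is that every covered sample triggers its own shader invocation.

```python
from typing import Callable, List, Tuple

Color = Tuple[float, float, float]

# Hypothetical 4x sample pattern: sample positions as offsets inside a pixel.
SAMPLE_OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def supersample_pixel(px: int, py: int,
                      covered: Callable[[float, float], bool],
                      shade: Callable[[float, float], Color],
                      background: Color) -> List[Color]:
    """Super-sampling: coverage and shading are evaluated independently per sample."""
    samples = []
    for dx, dy in SAMPLE_OFFSETS:
        sx, sy = px + dx, py + dy
        # One full shader invocation for every covered sample.
        samples.append(shade(sx, sy) if covered(sx, sy) else background)
    return samples


def resolve(samples: List[Color]) -> Color:
    """Blend the stored samples into one displayed color (box filter: plain average)."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(3))
```

For an edge covering only the two left-hand samples of pixel (0, 0), `shade` runs twice and `resolve` averages two shaded and two background samples; a fully covered pixel would require four shader invocations, one per sample.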
Multi-sampling is a less expensive technique that uses one fully shaded color value and a coverage mask, rather than multiple fully shaded samples, to generate the multiple samples stored in the frame buffer that are ultimately blended together to produce a pixel within a rendered image. Multi-sampling is commonly used because of the substantial cost-versus-performance benefit that is typically achieved without a significant loss in overall image quality. Although multi-sampling saves shader processing relative to super-sampling, multi-sampling still requires a frame buffer that stores multiple samples per pixel, along with the attendant bandwidth, which can limit application performance. Some techniques exist for compressing multi-sampled color data by identifying situations in which all samples for a pixel have identical color values and can therefore be represented by a single “reduced” color value per pixel. By storing reduced color values per pixel, rather than independent color values per sample, off-chip frame-buffer bandwidth can be substantially reduced. However, as screen resolutions and sample rates increase, it becomes expensive and impractical to expand fragment colors into individual samples within the graphics rendering pipeline when many of those fragments will ultimately be reduced, as described above.
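The multi-sampling and reduction steps above can be sketched as follows. This is a simplified, hypothetical model rather than the scheme of any particular hardware: it assumes 4x multi-sampling, a per-pixel bit mask in which bit i marks sample i as covered, an assumed fixed storage size per color in the frame buffer, and that a pixel is stored as a single reduced color only when all of its samples are exactly equal. The names `multisample_write`, `try_reduce`, and `stored_size` are illustrative only.

```python
from typing import List, Optional, Tuple

Color = Tuple[float, float, float, float]  # (r, g, b, a)
NUM_SAMPLES = 4  # assumption: 4x multi-sampling


def multisample_write(pixel_samples: List[Color], coverage_mask: int,
                      color: Color) -> None:
    """Multi-sampling: one shaded color is replicated into every covered sample."""
    for i in range(NUM_SAMPLES):
        if coverage_mask & (1 << i):
            pixel_samples[i] = color


def try_reduce(pixel_samples: List[Color]) -> Optional[Color]:
    """Return a single 'reduced' color if all samples are identical, else None."""
    first = pixel_samples[0]
    return first if all(s == first for s in pixel_samples) else None


def stored_size(pixel_samples: List[Color], bytes_per_color: int = 4) -> int:
    """Frame-buffer bytes for this pixel under the simple scheme above.

    bytes_per_color is an assumed storage format size (e.g., 8-bit RGBA),
    not the size of the Python objects.
    """
    if try_reduce(pixel_samples) is not None:
        return bytes_per_color                      # one reduced color
    return bytes_per_color * len(pixel_samples)     # one color per sample
```

In this sketch, a fully covered pixel reduces to one color, while a pixel straddling a primitive edge keeps all four samples. Note that it expands the fragment color into individual samples before checking for reduction, which is precisely the per-sample expansion cost the passage identifies.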
As the foregoing illustrates, what is needed in the art is a technique that achieves the processing and bandwidth advantages of reduction throughout the entire graphics rendering pipeline.