Accessing external memory from a graphics processing unit is costly in terms of both power and performance. Thus, most modern graphics processing units employ a compression scheme to reduce memory bandwidth for improved power and performance.
Typical compression algorithms achieve compression ratios from 2:1 to 8:1 on a given cacheline. Alternatively, graphics processing unit architectures can be tile-based to further reduce the memory bandwidth of graphics workloads.
Typically, three-dimensional graphics applications render one frame at a time using multiple render passes. Each pass updates a render target that can be used as either a texture sampling surface or a blend destination in subsequent passes. These render targets may be color buffers or Unordered Access View (UAV) buffers. Compressing these surfaces reduces not only the write-back bandwidth to memory but also the read bandwidth when the surfaces are later used as textures or blend destinations.
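The effect of compression on both write and read traffic can be illustrated with a small back-of-the-envelope calculation. The sketch below is only an idealized model: the function name `surface_traffic_bytes`, the surface dimensions, and the assumption that every cacheline of the surface achieves the same compression ratio are all hypothetical choices made for illustration. Real schemes achieve different ratios per cacheline and carry metadata overhead that is ignored here.

```python
def surface_traffic_bytes(width, height, bytes_per_pixel, reads, compression_ratio=1.0):
    """Estimate memory traffic for one render target.

    The surface is written once by the pass that produces it and then read
    `reads` times by later passes (as a texture or blend destination).
    Assumption: a uniform compression ratio applies to the whole surface.
    """
    raw_bytes = width * height * bytes_per_pixel
    stored_bytes = raw_bytes / compression_ratio
    # One write-back plus `reads` read-backs of the stored representation.
    return stored_bytes * (1 + reads)


# A 1920x1080 surface at 4 bytes per pixel, consumed by two later passes.
uncompressed = surface_traffic_bytes(1920, 1080, 4, reads=2)
compressed = surface_traffic_bytes(1920, 1080, 4, reads=2, compression_ratio=2.0)

print(f"uncompressed: {uncompressed / 1e6:.1f} MB")
print(f"compressed:   {compressed / 1e6:.1f} MB")
```

Under these assumptions, a 2:1 ratio halves the total traffic for the surface, because the saving applies to the write-back and to each subsequent read.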
Current parallel graphics data processing includes systems and methods developed to perform specific operations on graphics data, such as linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed-function computational units to process graphics data; more recently, however, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process, in parallel, as much graphics data as possible throughout the different parts of the graphics pipeline. Parallel graphics processors with single instruction, multiple thread (SIMT) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In an SIMT architecture, groups of parallel threads attempt to execute program instructions synchronously together as often as possible to increase processing efficiency.
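The phrase "as often as possible" matters because threads in a group can only proceed together while they follow the same control path. The toy model below sketches this behavior in software. It is not a description of any particular hardware: the function `simt_execute`, the per-lane mask representation, and the example branch are hypothetical and chosen only to show how a divergent branch costs extra instruction steps.

```python
def simt_execute(values):
    """Run 'if v is even: v // 2 else: 3 * v + 1' across lanes in lockstep.

    Each element of `values` belongs to one lane (thread) of the group.
    Returns the per-lane results and the number of instruction steps issued
    for the whole group.
    """
    lanes = len(values)
    results = list(values)
    steps = 0

    # All lanes evaluate the branch condition in the same step.
    take_even = [v % 2 == 0 for v in values]
    steps += 1

    # "Even" path: issued once for the group; only masked-in lanes commit.
    if any(take_even):
        for lane in range(lanes):
            if take_even[lane]:
                results[lane] = values[lane] // 2
        steps += 1

    # "Odd" path: issued once with the inverted mask, only if some lane needs it.
    if not all(take_even):
        for lane in range(lanes):
            if not take_even[lane]:
                results[lane] = 3 * values[lane] + 1
        steps += 1

    return results, steps


uniform_results, uniform_steps = simt_execute([2, 4, 6, 8])
divergent_results, divergent_steps = simt_execute([1, 2, 3, 4])
print(uniform_results, uniform_steps)
print(divergent_results, divergent_steps)
```

In this model, a group whose lanes all take the same branch issues one fewer step than a group whose lanes diverge, which is why keeping threads on a common path improves processing efficiency.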