Field of the Invention
Embodiments of the present invention relate generally to graphics processing and, more specifically, to hierarchical tiled caching.
Description of the Related Art
Some graphics subsystems implement a tiling architecture in which a render target is divided into partitions referred to as “tiles.” Some tile-based systems also store data in an on-chip cache memory during rendering, which increases performance and reduces memory bandwidth consumption. In such systems, primitives are rearranged based on which tile or tiles each primitive overlaps. For improved performance, multiple processing entities may also be implemented to process the tiles in parallel.
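The rearrangement of primitives by tile can be illustrated with a minimal sketch. The 16x16 tile size, the use of inclusive pixel bounding boxes as the overlap test, and the names tiles_overlapped and bin_primitives are all assumptions made for illustration; none of them is taken from any particular graphics subsystem.

```python
from collections import defaultdict

TILE_W, TILE_H = 16, 16  # hypothetical tile size in pixels


def tiles_overlapped(bbox, tile_w=TILE_W, tile_h=TILE_H):
    """Yield (tx, ty) for every tile touched by an inclusive pixel bounding box."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile_h, y1 // tile_h + 1):
        for tx in range(x0 // tile_w, x1 // tile_w + 1):
            yield (tx, ty)


def bin_primitives(primitives):
    """Group primitive ids by tile, keeping submission order within each tile."""
    bins = defaultdict(list)
    for prim_id, bbox in primitives:
        for tile in tiles_overlapped(bbox):
            bins[tile].append(prim_id)
    return dict(bins)


# Hypothetical primitives, each given as (id, (x0, y0, x1, y1)).
primitives = [
    ("A", (0, 0, 10, 10)),    # lies inside tile (0, 0)
    ("B", (12, 4, 20, 8)),    # straddles tiles (0, 0) and (1, 0)
    ("C", (18, 18, 30, 30)),  # lies inside tile (1, 1)
]

bins = bin_primitives(primitives)
for tile in sorted(bins):
    print(tile, bins[tile])
```

In this sketch, a primitive that straddles a tile boundary appears in the list for each tile that it touches, so each tile can be processed independently of the others.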
Generally speaking, in the above approach, the different tiles are processed in order, which results in data being loaded into the on-chip cache memory in a tile-by-tile order. Each cache tile is associated with a subset of screen-space and thus with a subset of the memory addresses that store pixel data for that screen-space. By processing data in the tile-by-tile order, only a relatively small subset of data is resident in the cache memory at any one time. Further, most accesses to any particular subset of data occur while that data is resident in the cache memory, since the data for each tile is processed together. As a result, fewer cache misses occur than in a system that does not process data in cache tile order.
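The effect of tile-by-tile ordering on cache behavior can be seen in a deliberately simplified model. The sketch below assumes a cache that holds the data for exactly one tile at a time and a hypothetical workload given as a list of tile indices; real caches and workloads are considerably more complex.

```python
def count_tile_loads(access_order):
    """Count tile loads for a cache that holds exactly one tile's data at a time."""
    resident = None
    loads = 0
    for tile in access_order:
        if tile != resident:
            loads += 1       # the needed tile is not resident, so it is loaded
            resident = tile
    return loads


# Hypothetical workload: the tile touched by each piece of work, in submission order.
submission_order = [0, 1, 0, 2, 1, 0, 2, 1]

# Tile-by-tile order: all work for tile 0, then tile 1, then tile 2.
tile_order = sorted(submission_order)

loads_submission = count_tile_loads(submission_order)
loads_tiled = count_tile_loads(tile_order)
print(loads_submission, loads_tiled)
```

Under these assumptions, the submission order loads a tile for every access, while the tile-by-tile order loads each of the three tiles only once.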
One drawback to the tiling techniques described above, however, is that such techniques do not reduce the number of cache misses when the set of rearranged primitives does not include primitives that overlap in screen-space. In some instances, the data that is rearranged contains few or no primitives that overlap in screen-space. One such instance is when an application program transmits instructions to draw primitives on an object-by-object basis, where objects are application-specified models (such as cars, people, and the like). In such instances, the data that is rearranged may include only a part of such a model, especially if the model includes a very large number of primitives and the rearrangement process is limited in the number of primitives that it can handle. When the number of overlapping primitives is low, the benefits associated with performing multiple operations on a single pixel without causing a cache miss do not exist. These benefits do not exist because at most a single operation is performed on any given pixel while the data for its tile is loaded into the cache.
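The drawback can be made concrete with a small sketch of a single tile pass. The model assumes that the first access to a pixel during the pass is a miss and that every later access to the same pixel is a hit for as long as the tile stays resident; the pixel coordinates and the two workloads are hypothetical.

```python
def tile_pass_misses_and_hits(pixel_accesses):
    """Return (misses, hits) for one tile pass.

    Assumption: a pixel misses on its first access and hits on every
    later access while the tile remains resident in the cache.
    """
    seen = set()
    hits = 0
    for pixel in pixel_accesses:
        if pixel in seen:
            hits += 1
        else:
            seen.add(pixel)
    return len(seen), hits


# Hypothetical case 1: primitives layered over the same two pixels.
overlapping = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)]

# Hypothetical case 2: primitives that never touch the same pixel twice.
disjoint = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]

overlap_result = tile_pass_misses_and_hits(overlapping)
disjoint_result = tile_pass_misses_and_hits(disjoint)
print(overlap_result, disjoint_result)
```

In the second case every access is a first access, so in this model keeping the tile resident saves nothing, whichever order the tiles are processed in.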
As the foregoing illustrates, what is needed in the art is an improved technique for tiled caching.