1. Field of the Invention
Embodiments of the present invention relate generally to caching and, more specifically, to efficient cache management in a tiled architecture.
2. Description of the Related Art
A conventional cache unit typically stores data that is to be processed by a hardware element or collection of hardware elements. Because the cache unit resides physically close to the hardware element(s), memory bandwidth may be improved, as memory requests for data already stored in the cache need not be transmitted to global memory. This basic caching technique is ubiquitous in modern computer systems.
In a computer system configured for graphics processing, cache units often store graphics data to be processed by different types of graphics processing pipelines. For example, a pixel cache unit that is coupled to a pixel processing pipeline could store pixel data to be processed for display on a display screen. In such a case, the pixel cache may improve memory bandwidth because the pixel processing pipeline may need to access the same portion of pixel data multiple times when rendering an image for display.
In a tiled architecture, the pixel processing pipeline may process pixel data associated with neighboring screen tiles that could potentially overlap with one another, and the pixel cache could store pixel data associated with the overlapping region. With this approach, pixel data associated with an overlapping region shared between first and second screen tiles would only need to be accessed from global memory once (e.g., when the first screen tile is processed), and would be cached for later use when needed again (e.g., when the second screen tile is processed). This approach provides reasonable benefits in simple situations such as that described herein.
However, other situations may arise where the above approach causes significant thrashing of the pixel cache. Returning to the previous example, the pixel data associated with the overlapping region between the first and second screen tiles may also be needed by a tenth screen tile that overlaps the first screen tile. By the time the pixel processing pipeline has processed the third through ninth screen tiles, the pixel data shared between the first and tenth screen tiles may have already been evicted from the pixel cache to create room for pixel data associated with the intervening screen tiles. The pixel processing pipeline would then need to re-request the evicted pixel data from global memory, thereby thrashing the pixel cache to a certain degree. Depending on the size of the screen tiles and the order in which those tiles are processed, such thrashing may have a significant impact on the efficiency of the pixel cache.
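The reuse and thrashing behavior described above can be illustrated with a small simulation. The following is a minimal sketch only, under several assumptions that are not stated in this description: the pixel cache is modeled with a least-recently-used (LRU) replacement policy, the cache holds eight lines, each screen tile touches four cache lines, and the first, second, and tenth screen tiles share two of those lines. The names LRUCache, build_tiles, and process_tiles are hypothetical and chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy pixel-cache model (assumed LRU policy) that counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, newest entry last
        self.misses = 0             # each miss models one global-memory fetch

    def access(self, address):
        """Touch one cache line; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        self.misses += 1
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        return False


def build_tiles():
    """Ten hypothetical screen tiles; tiles 1, 2 and 10 share two lines."""
    shared = ["shared0", "shared1"]
    tiles = []
    for n in range(1, 11):
        if n in (1, 2, 10):
            tiles.append(shared + [f"t{n}_{i}" for i in range(2)])
        else:
            tiles.append([f"t{n}_{i}" for i in range(4)])
    return tiles


def process_tiles(tiles, capacity):
    """Process tiles in order; return the number of misses per tile."""
    cache = LRUCache(capacity)
    per_tile_misses = []
    for tile in tiles:
        before = cache.misses
        for address in tile:
            cache.access(address)
        per_tile_misses.append(cache.misses - before)
    return per_tile_misses


if __name__ == "__main__":
    tiles = build_tiles()
    print("8-line cache: ", process_tiles(tiles, 8))
    print("64-line cache:", process_tiles(tiles, 64))
```

Under these assumptions, the second screen tile incurs only two misses with the eight-line cache because the shared lines are still resident, whereas the tenth screen tile incurs four misses because the shared lines were evicted while the intervening tiles were processed. With a 64-line cache, which holds every line in this toy workload, the tenth screen tile incurs only two misses.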
Accordingly, what is needed in the art is a more efficient technique for caching pixel data in a tiled architecture.