Field of the Invention
Embodiments of the present invention relate generally to processing memory transactions and, more specifically, to a unified cache for diverse memory traffic.
Description of the Related Art
A conventional graphics processor includes a collection of processing cores. Each processing core may perform memory transactions with main memory and may cache the data associated with those transactions in a level one (L1) cache. Typically, a tag store manages the data stored in the L1 cache.
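For illustration only, the role of a tag store can be sketched in software as follows. This is a minimal, hypothetical model of a direct-mapped tag store and is not taken from any particular processor. The names `TagStore`, `LINE_BYTES`, and `NUM_LINES`, the line size, the number of lines, and the direct-mapped organization are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LINE_BYTES = 128  # assumed cache line size in bytes (illustrative)
NUM_LINES = 4     # assumed number of cache lines (kept tiny for the example)


@dataclass
class TagStore:
    """Hypothetical direct-mapped tag store: one tag per cache line slot."""
    tags: List[Optional[int]] = field(
        default_factory=lambda: [None] * NUM_LINES
    )

    def lookup(self, address: int) -> bool:
        """Return True on a hit. On a miss, record the new tag,
        replacing whatever tag previously occupied the slot."""
        line_address = address // LINE_BYTES  # which memory line is addressed
        index = line_address % NUM_LINES      # which cache slot that line maps to
        tag = line_address // NUM_LINES       # identifies the line within the slot
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False
```

In this sketch, two addresses that fall within the same line hit after the first access, while two lines that map to the same slot evict one another.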
Each processing core in a conventional graphics processor may include a texture processing pipeline. The texture processing pipeline reads textures from memory, performs various operations with those textures, and then maps the processed textures to graphics objects. Texture processing pipelines may cache texture data in a texture cache. The texture cache is usually coupled to some form of tag store for managing the texture data, similar to the tag store mentioned above in relation to the L1 cache.
In addition to the L1 cache and the texture cache described above, a processing core within a conventional graphics processor may also include a scratchpad memory that allows parallel threads executing on the processing core to temporarily store data. The scratchpad memory may also provide a communication channel that allows threads to exchange data with one another.
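As a rough illustration of this communication channel, the sketch below simulates, sequentially, a group of threads that each write one value to a scratchpad and then read a value written by a different thread. It is a software analogy under stated assumptions, not a description of actual hardware. The function name `reverse_through_scratchpad` is hypothetical, and the boundary between the two loops stands in for the synchronization barrier that real parallel threads would need between the write phase and the read phase.

```python
from typing import List


def reverse_through_scratchpad(values: List[int]) -> List[int]:
    """Simulate threads exchanging data through a scratchpad.

    Thread `tid` holds values[tid]. After the exchange, thread `tid`
    holds the value originally held by thread (n - 1 - tid).
    """
    n = len(values)
    scratchpad: List[int] = [0] * n  # one slot per simulated thread

    # Phase 1: every thread stores its own value in its own slot.
    for tid in range(n):
        scratchpad[tid] = values[tid]

    # A real parallel implementation would place a barrier here so that
    # all writes complete before any thread reads another thread's slot.

    # Phase 2: every thread reads the slot written by a different thread.
    return [scratchpad[n - 1 - tid] for tid in range(n)]
```

Calling `reverse_through_scratchpad([1, 2, 3, 4])` returns the values in reverse order, showing that each simulated thread ends up with data produced by another thread.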
As a general matter, conventional parallel processors include a number of different memory areas designed for different purposes, as described above. This distributed architecture is inefficient because each memory area requires similar logic, resulting in duplicate tag stores and/or duplicate crossbars. One approach to solving this problem involves combining the scratchpad memory (also referred to as shared memory) and the L1 cache. Another approach involves combining the L1 cache and the texture cache. However, each of these approaches merges only two of the memory areas, so both still require distributed memory areas and duplicate logic.
As the foregoing illustrates, what is needed in the art is a more effective cache subsystem architecture.