Field of the Invention
Embodiments of the present invention relate generally to memory access operations and, more specifically, to load/store operations in texture processing hardware.
Description of the Related Art
A modern graphics processing unit (GPU) includes texture processing hardware configured to perform a variety of texture-related operations, including texture load operations and texture cache operations. The texture processing hardware accesses surface texture information from the texture cache under varying circumstances, such as while rendering object surfaces in a three-dimensional (3D) graphics scene for display on a display device, while rendering a two-dimensional (2D) graphics scene, or during compute operations. Surface texture information includes texture elements (texels) used to texture or shade object surfaces in a 3D graphics scene. Typically, the texture processing hardware and associated texture cache are optimized for efficient, high-throughput, read-only access to support the high demand for texture information during graphics rendering, with little or no support for write operations. Further, the texture processing hardware includes specialized functional units to perform various texture operations, such as level of detail (LOD) computation, texture sampling, and texture filtering.
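By way of illustration only, the kind of work performed by such specialized functional units can be modeled in software. The following Python sketch computes a level of detail from texture coordinate derivatives and performs bilinear filtering over a small texel array. It is a simplified model under stated assumptions (clamp-to-edge addressing, texel centers at half-integer coordinates, and an LOD rule based on the base-2 logarithm of the larger footprint axis) and does not describe any particular hardware. All function names are hypothetical.

```python
import math


def compute_lod(dudx, dvdx, dudy, dvdy, num_levels):
    """Select a mip level from texel-space derivatives.

    Assumed rule: LOD = log2 of the longer screen-space footprint axis,
    clamped to the range [0, num_levels - 1].
    """
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    if rho <= 1.0:
        return 0.0  # magnification: use the most detailed level
    return min(math.log2(rho), float(num_levels - 1))


def fetch_texel(texture, x, y):
    """Read one texel with clamp-to-edge addressing (an assumption)."""
    height = len(texture)
    width = len(texture[0])
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return texture[y][x]


def sample_bilinear(texture, u, v):
    """Bilinearly filter a 2D texture at normalized coordinates (u, v)."""
    height = len(texture)
    width = len(texture[0])
    # Texel centers are assumed to sit at half-integer coordinates.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0
    fy = y - y0
    t00 = fetch_texel(texture, x0, y0)
    t10 = fetch_texel(texture, x0 + 1, y0)
    t01 = fetch_texel(texture, x0, y0 + 1)
    t11 = fetch_texel(texture, x0 + 1, y0 + 1)
    # Blend horizontally along each row, then vertically between rows.
    top = t00 * (1.0 - fx) + t10 * fx
    bottom = t01 * (1.0 - fx) + t11 * fx
    return top * (1.0 - fy) + bottom * fy
```

In this sketch, sampling the center of a 2x2 texture returns the average of its four texels, and each filtered sample reads four texels without writing any, consistent with the read-only access pattern described above.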
Such a GPU typically includes a separate level 1 (L1) cache used to load and store variable and constant data associated with local and global memory. This variable and constant data may be pre-fetched and loaded into the L1 cache from a higher-level cache or from system memory. The GPU may read from and write to the L1 cache to access this variable and constant data. Data altered by the GPU may later be stored back into a higher-level cache or to system memory. The L1 cache does not typically include the specialized functionality of the texture processing hardware and texture cache, such as LOD computation and texture filtering.
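The load, store, and write-back behavior described above can be sketched, for illustration only, as a small software model. The following Python class is a minimal sketch that assumes unbounded capacity, no eviction policy, and a dictionary standing in for the higher-level cache or system memory. The class and method names are hypothetical and do not correspond to any actual hardware interface.

```python
class WriteBackL1Cache:
    """Toy model of an L1 cache with read/write access and write-back."""

    def __init__(self, backing_store):
        # backing_store: dict mapping address -> value; it stands in for a
        # higher-level cache or system memory (an assumption of this sketch).
        self.backing = backing_store
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses modified since the last write-back

    def prefetch(self, address):
        """Bring data into the cache ahead of use, if not already present."""
        if address not in self.lines:
            self.lines[address] = self.backing.get(address, 0)

    def load(self, address):
        """Read a value, filling the cache from the backing store on a miss."""
        if address not in self.lines:
            self.lines[address] = self.backing.get(address, 0)
        return self.lines[address]

    def store(self, address, value):
        """Write a value into the cache and mark the address as dirty."""
        self.lines[address] = value
        self.dirty.add(address)

    def write_back(self):
        """Copy all altered data back to the backing store."""
        for address in sorted(self.dirty):
            self.backing[address] = self.lines[address]
        self.dirty.clear()
```

In this model, a stored value is visible to later loads immediately, while the backing store retains the old value until `write_back` is called. The model contains no LOD or filtering logic, mirroring the absence of that functionality in the L1 cache.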
One drawback with this approach is that the amount of die area and power consumption needed to support two separate cache controllers and associated memory typically exceeds the die area and power consumption of a single cache memory. Another drawback with this approach is that the resources needed to support texture operations directed to the texture cache and load/store operations directed to the L1 cache may vary significantly during rendering of a given 3D graphics scene. Accordingly, the texture cache and the L1 cache are each designed to support varying levels of memory consumption and load/store bandwidth, leading to potentially inefficient usage of the texture cache and the L1 cache.
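The second drawback can be illustrated with a short worked calculation. The following Python sketch assumes that each cache must be sized for its own peak demand and compares the combined capacity of two separately sized caches against the peak combined demand over the same workload. The function name and the demand figures are hypothetical and chosen only to show the arithmetic; they are not measurements.

```python
def required_capacity(demand_trace):
    """Compare capacity needs for separate caches and for combined demand.

    demand_trace: list of (texture_kb, load_store_kb) tuples, one per
    rendering phase. Returns (separate_kb, combined_peak_kb).
    """
    # Separate caches: each is sized for its own worst-case phase.
    peak_texture = max(texture_kb for texture_kb, _ in demand_trace)
    peak_load_store = max(load_store_kb for _, load_store_kb in demand_trace)
    separate_kb = peak_texture + peak_load_store
    # Combined demand: the worst-case sum within any single phase.
    combined_peak_kb = max(t + l for t, l in demand_trace)
    return separate_kb, combined_peak_kb


# Hypothetical demand per phase, in kilobytes: (texture, load/store).
trace = [(48, 16), (16, 48), (32, 32)]
separate_kb, combined_peak_kb = required_capacity(trace)
```

With these illustrative figures, the two separately sized caches total 96 KB, while demand in any single phase never exceeds 64 KB, so a portion of the provisioned capacity sits idle in every phase.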
As the foregoing illustrates, what is needed in the art is a more efficient technique for implementing cache memory in a graphics processing unit.