Typical cache designs provide a dedicated area of memory to separately store a subset of a larger body of data held in memory. By storing data that is likely to be accessed again in the dedicated area of memory, which may be more quickly or otherwise more efficiently accessed, overall efficiency of data access may be greatly improved. An underlying assumption of typical cache designs is locality of access, which refers to the likelihood that data accessed at one point in time will be accessed again. If the subset of data stored in the dedicated memory area is likely to be accessed again in the future, the cache may be capable of achieving high levels of efficiency. On the other hand, if the subset of data stored in the dedicated memory area is not likely to be accessed again in the future, the cache is unlikely to achieve an acceptable measure of efficiency.
In many computer systems, only a limited amount of dedicated memory area may be available for implementing a cache system. Depending on the nature of the data to be accessed, the limited amount of dedicated memory area may be insufficient to provide an efficient cache system following traditional cache designs. For example, one type of data that could benefit from efficient caching is graphics data, such as texel data to be accessed from memory by a graphics processing system and rendered on a display. From one frame to the next, there may be a high degree of locality of access. In other words, a large number of the memory locations accessed to retrieve texel data for the current frame rendered on the display may be accessed again to retrieve the same texel data for the next frame rendered on the display. This may often be the case, for instance, in situations where the rendered image remains largely unchanged from one frame to the next. Such locality of access from one frame to the next frame presents a potential for implementation of an efficient cache system.
However, a prohibitively large amount of dedicated memory area may be required to exploit such locality of access when traditional cache designs are utilized. In this example, the locality of access exists across frames. That is, a piece of texel data that is currently accessed is likely to be accessed again, but not until the next frame. Here, a traditional cache design that updates cache memory with the most recently accessed data may require enough dedicated memory area to provide caching for a full frame's worth of texel data accesses, in order for the cache to perform properly. Otherwise, the cache may run out of memory space and begin overwriting useful cache entries stored from the current frame, before those cache entries are ever accessed in the next frame. Thus, cache entries that would have produced “hits” (data access requests that result in a match in the cache) in such a system may be destroyed prematurely, leading to an extremely low “hit rate” (the ratio of data access requests that result in a match in the cache to the total number of data access requests).
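The premature overwriting described above may be illustrated with a minimal sketch. The sketch assumes a least-recently-used replacement policy as one example of a traditional design that favors the most recently accessed data, and assumes that every frame accesses the same addresses in the same order. The function name simulate_lru_hit_rate and its parameters are hypothetical and are chosen for illustration only; they do not describe any particular hardware.

```python
from collections import OrderedDict


def simulate_lru_hit_rate(capacity, frame_size, num_frames):
    """Replay the same frame_size addresses once per frame through an
    LRU cache holding at most `capacity` entries; return the hit rate.

    Assumption (illustrative only): each frame touches addresses
    0..frame_size-1 in the same order, modeling an unchanged image.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    total = 0
    for _ in range(num_frames):
        for address in range(frame_size):
            total += 1
            if address in cache:
                hits += 1
                cache.move_to_end(address)  # mark as most recently used
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[address] = True
    return hits / total


if __name__ == "__main__":
    # Cache smaller than one frame of accesses: entries are evicted
    # before the next frame can reuse them.
    print(simulate_lru_hit_rate(capacity=4, frame_size=8, num_frames=4))
    # Cache large enough for a full frame: only the first frame misses.
    print(simulate_lru_hit_rate(capacity=8, frame_size=8, num_frames=4))
```

Under these assumptions, a cache even one entry smaller than a frame's worth of accesses yields no hits at all, whereas a cache sized for a full frame misses only during the first frame. Real access patterns are less regular, so the sketch indicates the trend rather than a measured result.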
FIG. 1 is a block diagram of an illustrative computer system 100 containing memory components for which efficient data caching may be employed. As shown, computer system 100 includes a graphics card 102, a central processing unit (CPU) 104, a chipset comprising a northbridge chip 106 and a southbridge chip 108, system memory 110, PCI slots 112, disk drive controller 114, universal serial bus (USB) connectors 116, audio CODEC 118, a super I/O controller 120, and keyboard controller 122. As shown in FIG. 1, graphics card 102 includes a graphics processing unit (GPU) 124 and local memory 126. Also, graphics card 102 is connected to a display 128 that may be part of computer system 100. Here, GPU 124 is a semiconductor chip designed to perform graphics processing operations associated with rendering an image that may be presented on display 128.
Data residing in local memory 126 may be used as input data in the graphics rendering process, which produces a final image for presentation on display 128. Alternatively or additionally, data residing in system memory 110 may also be used as input data in the graphics rendering process. These accesses to memory performed by GPU 124 may be associated with significant latencies that impact the performance of the system. It may thus be desirable to provide a data caching system so that GPU 124 may access such data in a more efficient manner.
However, as discussed above, usage of the large amount of dedicated memory area required for caching data using traditional cache designs may simply be impracticable. For example, a typical graphics processing unit implemented as a semiconductor chip, such as GPU 124, may have a limited amount of on-chip memory. This may be the case due to a variety of factors, such as manufacturing cost. The amount of dedicated memory area required to provide caching for a full frame's worth of texel data accesses, for instance, may simply be too large to fit within the limited on-chip memory associated with the graphics processing unit. One alternative may be to forgo the advantages of caching and design the system to accommodate deficiencies such as higher latencies associated with memory accesses without caching. Such a system is likely to incur high area costs associated with the accommodation of high access latency. Another alternative may be to simply implement a traditional cache design using the limited amount of memory area available, even though it may be insufficient to fully exploit the locality of access of the underlying data. As previously mentioned, this is likely to lead to an inefficient cache characterized by an extremely low hit rate. Such a system is also likely to have inferior memory access performance.
Thus, there is an urgent need for an improved cache design capable of utilizing a limited amount of memory area to achieve efficient data caching.