1. Field of the Invention
This invention is related to the field of caches and, more particularly, to replacement mechanisms in caches.
2. Description of the Related Art
Processors typically provide a set of registers which may be used by programs as high-speed, private storage for operands. The operands stored in registers may be frequently accessed variables, or may be intermediate results in a complex, multi-instruction calculation. Unfortunately, for many tasks, the number of registers provided in the processor may be too few to hold all of the operands of interest. In such cases, many of the frequently accessed variables and/or intermediate results are written to and read from memory locations during execution of the program.
The memory may have a significantly higher latency than the registers, limiting the speed at which the program may be executed as compared to the speed that may be achieved if all operands were in registers. Processors and/or the computer systems including the processors may provide caches to alleviate the memory latency. Generally, a cache is a relatively small, high-speed memory which may store copies of data corresponding to various recently accessed memory locations. Cache storage is typically allocated and deallocated in units of cache blocks (a cache block being a group of bytes from contiguous memory locations). In other words, the cache may include multiple entries, and each entry may include storage for a cache block of bytes. If requested data for an access is not in the cache (a “miss”), an entry is allocated for the cache block including the requested data and the cache block is filled into the allocated entry.
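The allocate-on-miss behavior described above may be illustrated with a minimal sketch. The direct-mapped organization, the class name `DirectMappedCache`, the entry count, and the block size are all assumptions chosen for illustration; none of them is taken from the description above.

```python
class DirectMappedCache:
    """Toy model: each cache block maps to exactly one entry (an assumption)."""

    def __init__(self, num_entries=4, block_size=16):
        self.num_entries = num_entries
        self.block_size = block_size
        # One tag per entry; None means the entry holds no cache block yet.
        self.tags = [None] * num_entries
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, allocate the entry and fill it."""
        # All bytes in the same cache block share one block number.
        block_number = address // self.block_size
        index = block_number % self.num_entries
        if self.tags[index] == block_number:
            self.hits += 1
            return True
        # Miss: the entry is allocated for the block containing the address.
        # The full block number is stored as the tag to keep the sketch short.
        self.misses += 1
        self.tags[index] = block_number
        return False


cache = DirectMappedCache()
# Addresses 0, 4 and 8 fall in the same 16-byte block; 16 starts the next one.
results = [cache.access(a) for a in (0, 4, 8, 16, 0)]
print(results, cache.hits, cache.misses)
```

In this sketch, the first access to each cache block misses and later accesses to bytes within the same block hit, which is the sense in which storage is allocated in units of cache blocks.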
Caches are a finite resource, and thus may be susceptible to “cache thrashing” or “cache pollution” effects. These effects generally refer to the same data being repeatedly replaced in the cache due to the access patterns to the cache. For example, if the amount of data being accessed by a given process exceeds the size of the cache (or maps to a subset of the cache that is smaller in size than the amount of data), accessing a first portion of the data may cause a second portion to be replaced in the cache. Subsequently accessing the second portion may cause the first portion to be replaced, and so on.
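The thrashing pattern described above may be reproduced with a small sketch. It assumes a direct-mapped cache with four entries and 16-byte cache blocks; the function name `count_misses` and all sizes are hypothetical and serve only to illustrate the effect.

```python
def count_misses(addresses, num_entries=4, block_size=16):
    """Count misses in a toy direct-mapped cache (sizes are assumptions)."""
    tags = [None] * num_entries
    misses = 0
    for address in addresses:
        block = address // block_size
        index = block % num_entries
        if tags[index] != block:
            # The block currently in this entry is replaced by the new one.
            misses += 1
            tags[index] = block
    return misses


# Addresses 0 and 64 map to the same entry, so each access replaces the other.
conflicting = [0, 64] * 8
# Addresses 0 and 16 map to different entries and can both stay resident.
friendly = [0, 16] * 8

thrash_misses = count_misses(conflicting)
friendly_misses = count_misses(friendly)
print(thrash_misses, friendly_misses)
```

Under these assumptions, the conflicting pattern misses on every one of its 16 accesses, while the non-conflicting pattern misses only on the first access to each block.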
If multiple processors (executing different processes) access a given cache, then cache blocks accessed by one processor may cause cache blocks accessed by another processor to be replaced. Each processor may be accessing a data set that might otherwise remain in the cache, but the interference between the processors may cause the amount of data retained in the cache to be less than optimal.
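The interference between processors may be illustrated with the following sketch. It assumes a fully associative cache with least recently used (LRU) replacement, which is one common policy and is not stated in the description above; the function name `lru_misses`, the capacity of four entries, and the three-block working sets are likewise hypothetical.

```python
from collections import OrderedDict


def lru_misses(blocks, capacity):
    """Count misses in a toy fully associative cache with LRU replacement."""
    cache = OrderedDict()  # ordered from least to most recently used
    misses = 0
    for block in blocks:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # replace the least recently used
            cache[block] = True
    return misses


# Each processor repeatedly loops over its own three cache blocks.
p0 = [("P0", i % 3) for i in range(12)]
p1 = [("P1", i % 3) for i in range(12)]
# Interleave the two access streams, as if both processors share the cache.
interleaved = [block for pair in zip(p0, p1) for block in pair]

alone_p0 = lru_misses(p0, capacity=4)
alone_p1 = lru_misses(p1, capacity=4)
shared = lru_misses(interleaved, capacity=4)
print(alone_p0, alone_p1, shared)
```

In this sketch each working set fits in the cache on its own, so each processor alone incurs only its three initial misses. When the streams are interleaved, the combined six blocks exceed the four entries and every one of the 24 accesses misses.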
In some systems, multiple nodes (each including processors and caches) may form a distributed memory system (e.g. a cache-coherent nonuniform memory access (CC-NUMA) system). Memory attached to a given node may be local to that node, while memory attached to other nodes may be remote to the given node. In general, the latency for accessing remote memory from the given node may be higher than the latency for accessing the local memory, due to the communication needed with the other nodes. If data from the remote memory is stored in the cache, that data is subject to replacement by local data that is subsequently accessed. If the remote data is subsequently accessed again in the node, the higher latency to access the remote memory may be experienced again.
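The cost of replacing remote data with local data may be illustrated with the following sketch. The cycle counts, the two-entry capacity, the LRU replacement policy, and the function name `total_latency` are all hypothetical values and names chosen for illustration; real latencies depend on the system.

```python
# Hypothetical latencies in cycles; not taken from any particular system.
HIT, LOCAL_MISS, REMOTE_MISS = 2, 100, 300


def total_latency(trace, capacity):
    """Sum access latency over a trace of (block, is_remote) pairs.

    Models a toy fully associative cache with LRU replacement (an assumption).
    """
    cache = []  # block names, least recently used first
    cycles = 0
    for block, is_remote in trace:
        if block in cache:
            cache.remove(block)
            cycles += HIT
        else:
            if len(cache) >= capacity:
                cache.pop(0)  # replace the least recently used block
            cycles += REMOTE_MISS if is_remote else LOCAL_MISS
        cache.append(block)  # block is now the most recently used
    return cycles


remote = ("R", True)
local_a = ("L0", False)
local_b = ("L1", False)

# The remote block survives one intervening local access in a 2-entry cache.
retained = total_latency([remote, local_a, remote], capacity=2)
# A second local access replaces the remote block before it is reused.
evicted = total_latency([remote, local_a, local_b, remote], capacity=2)
print(retained, evicted)
```

Under these assumed numbers, the trace in which the remote block is retained costs 402 cycles, while the trace in which local data replaces it costs 800 cycles, because the remote latency is paid a second time.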