A simple way to increase the speed of a computer system is to increase the clock speed of its processor. However, when the clock speed is increased, the processor may stall and wait for data from main memory to continue processing.
To reduce memory access time in a typical computer system, a special-purpose, high-speed memory of static random access memory (SRAM) called a “cache” is used to temporarily store data that are currently in use. For example, the cached data can include a copy of instructions and/or data obtained from main memory for quick access by a processor. A processor cache is typically positioned near, or is integral with, the processor. Data stored in the cache may advantageously be accessed by the processor in a single processor cycle, so that the processor retrieves the data necessary to continue processing rather than having to stall and wait for the retrieval of data from main memory.
When the processor executes a memory access instruction that requests a data item from main memory, the cache is accessed first. If the desired item, for example, data or a program instruction, resides in the processor cache, this is called a cache “HIT” and the desired cache line is supplied to the processor immediately. If the desired data or program instruction is not found in the cache, this is called a cache “MISS”. With a cache MISS, secondary memory (i.e., main memory storage) is accessed to read that item, and the requested data item is transferred from the main memory to the cache and the processor. A cache MISS causes the processor to wait, or stall, degrading system performance.
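The HIT and MISS behavior described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization purely for simplicity; the class name `DirectMappedCache`, its parameters, and the toy `memory` list are hypothetical and do not come from any particular processor design.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory line maps to exactly one slot."""

    def __init__(self, num_lines, line_size, memory):
        self.num_lines = num_lines
        self.line_size = line_size
        self.memory = memory              # stands in for main memory
        self.tags = [None] * num_lines    # tag currently held in each slot
        self.lines = [None] * num_lines   # cached copy of each line's data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_addr = address // self.line_size
        index = line_addr % self.num_lines   # slot this line must occupy
        tag = line_addr // self.num_lines    # identifies which line is in the slot
        if self.tags[index] == tag:
            self.hits += 1                   # cache HIT: data supplied from the cache
        else:
            self.misses += 1                 # cache MISS: fetch the line from memory
            start = line_addr * self.line_size
            self.lines[index] = self.memory[start:start + self.line_size]
            self.tags[index] = tag
        return self.lines[index][address % self.line_size]


memory = list(range(100, 164))               # 64 words of illustrative data
cache = DirectMappedCache(num_lines=4, line_size=4, memory=memory)
values = [cache.read(a) for a in (0, 1, 2, 16, 0)]
```

In this sketch, addresses 0 and 16 compete for the same slot, so the final read of address 0 is a MISS even though that line was cached earlier.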
Various techniques are known for mapping physical main memory addresses into the processor cache memory locations, including a direct mapping cache configuration, a set-associative cache configuration, and a fully associative cache configuration. In addition, several cache line replacement algorithms are also known to replace or discard data from the processor cache when making room for new data. Examples include Round-Robin, First-in First-out (FIFO), and Least-Recently-Used (LRU) algorithms. The Round-Robin mechanism simply replaces cache lines in a sequential order. The FIFO mechanism determines which cache line is the first one saved, and that cache line is to be overwritten. The LRU algorithm attempts to identify which cache line is the least recently used, and that cache line is to be overwritten.
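The difference between these replacement mechanisms can be seen in a small sketch that tracks which line is evicted from a single cache set. The function names and the two-way set used here are illustrative assumptions, not a description of any specific hardware implementation.

```python
from collections import OrderedDict, deque


def run_round_robin(ways, accesses):
    """Return the tags evicted when slots are overwritten in sequential order."""
    slots = [None] * ways
    pointer = 0
    evicted = []
    for tag in accesses:
        if tag in slots:
            continue                          # hit: pointer does not move
        if slots[pointer] is not None:
            evicted.append(slots[pointer])
        slots[pointer] = tag
        pointer = (pointer + 1) % ways        # advance to the next slot
    return evicted


def run_fifo(ways, accesses):
    """Return the tags evicted when the first line saved is overwritten first."""
    resident = deque()
    evicted = []
    for tag in accesses:
        if tag in resident:
            continue                          # hit: insertion order is unchanged
        if len(resident) == ways:
            evicted.append(resident.popleft())
        resident.append(tag)
    return evicted


def run_lru(ways, accesses):
    """Return the tags evicted when the least recently used line is overwritten."""
    resident = OrderedDict()
    evicted = []
    for tag in accesses:
        if tag in resident:
            resident.move_to_end(tag)         # hit: mark as most recently used
            continue
        if len(resident) == ways:
            oldest, _ = resident.popitem(last=False)
            evicted.append(oldest)
        resident[tag] = True
    return evicted


accesses = ["A", "B", "A", "C", "D"]
rr_evicted = run_round_robin(2, accesses)
fifo_evicted = run_fifo(2, accesses)
lru_evicted = run_lru(2, accesses)
```

For this access pattern, the re-use of line A protects it under LRU, so B is evicted first; under Round-Robin and FIFO the re-use is ignored and A is evicted first.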
In a multi-node processor system, however, private processor caches may contain multiple copies of a given data item from main memory. All of these copies must be kept consistent (coherent); otherwise, data may become stale and effective access times can be degraded.
One recent solution for keeping the private processor caches coherent in such a multi-processor system is to use a “Snoop Filter” that manages information related to each cache line for cache coherency. A “Snoop Filter” is similar to a processor cache in that both can be organized as direct-mapped, fully associative, or set-associative caches. However, where a processor cache line contains data, a “Snoop Filter” line contains information related to the cache line in the multi-processor system (its state and where the cache line is cached). In addition, where a processor cache has perfect knowledge of the memory accesses of one or more processors, the “Snoop Filter” has imprecise knowledge of the cache lines held in the various processor caches. Moreover, there are no existing replacement algorithms that can be implemented in the “Snoop Filter” to replace or update the least recently used cache lines that are not in the processor caches, so as to reflect the lines that have been replaced in the processor caches.
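The contrast between a processor cache line and a “Snoop Filter” line can be sketched as follows. This is only an illustrative model: the class name `SnoopFilter`, the method names, the `"shared"` state label, and the node numbering are all assumptions introduced here, and no particular coherency protocol is implied.

```python
class SnoopFilter:
    """Toy snoop filter: tracks state and location of cached lines, not their data."""

    def __init__(self):
        # line address -> {"state": str, "sharers": set of node ids}
        self.entries = {}

    def record_fill(self, line_addr, node, state="shared"):
        """Note that `node` has brought the line into its private cache."""
        entry = self.entries.setdefault(line_addr, {"state": state, "sharers": set()})
        entry["state"] = state
        entry["sharers"].add(node)

    def nodes_to_snoop(self, line_addr, requester):
        """Return the nodes that may hold the line and so must be snooped."""
        entry = self.entries.get(line_addr)
        if entry is None:
            return set()                      # no node is believed to cache the line
        return entry["sharers"] - {requester}


snoop_filter = SnoopFilter()
snoop_filter.record_fill(0x40, node=0)

# Assume node 0 now replaces line 0x40 in its own cache without notifying
# the filter. The filter's entry is stale, so node 0 is still snooped.
stale_targets = snoop_filter.nodes_to_snoop(0x40, requester=1)
untracked_targets = snoop_filter.nodes_to_snoop(0x80, requester=1)
```

The stale entry for line 0x40 illustrates the imprecise knowledge described above: the filter continues to occupy an entry, and to direct snoops at node 0, for a line that node 0 no longer caches.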
Therefore, there is a need for the “Snoop Filter” to implement a Pseudo-Least-Recently-Used (PLRU) replacement algorithm to effectively update and reflect invalid entries in the “Snoop Filter” cache.