1. Field of the Invention
The present invention relates to computer-based memory systems and, more particularly, to cache residence prediction and its use in memory access filtering.
2. Description of the Related Art
In modern computer systems, caches are widely used to reduce memory access latencies. A symmetric multiprocessor (“SMP”) system generally employs a snoopy mechanism to ensure cache coherence. When a cache miss occurs, the requesting cache may send a cache request to the memory and all the peer caches. The term “peer cache” generally refers to a cache that is on the same snoopy network as the requesting cache. When a peer cache receives the cache request, it snoops its cache directory and produces a cache snoop response indicating whether the requested data is found in the cache and the state of the cache line that contains the requested data. A combined snoop response can be generated based on snoop responses from all the peer caches. If the requested data is found in a peer cache, the peer cache can source the data to the requesting cache via a cache intervention. The memory is responsible for supplying the requested data if the combined snoop response shows that the data cannot be supplied by any peer cache.
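For illustration only, the combination of per-peer snoop responses into a combined snoop response, and the resulting choice between cache intervention and memory, can be modeled in a few lines of Python. The names SnoopResponse, CombinedResponse, combine and data_source are hypothetical, and selecting the first peer that can source the data is a tie-break assumed by this sketch, not a rule stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SnoopResponse:
    """One peer cache's answer to a snooped cache request (hypothetical model)."""
    peer_id: int
    hit: bool         # the requested data was found in this peer cache
    can_source: bool  # the line's state allows this peer to supply the data


@dataclass(frozen=True)
class CombinedResponse:
    hit_anywhere: bool
    supplier: Optional[int]  # id of the sourcing peer, or None if memory supplies


def combine(responses: Iterable[SnoopResponse]) -> CombinedResponse:
    """Fold the snoop responses of all peer caches into one combined response."""
    hit_anywhere = False
    supplier = None
    for r in responses:
        hit_anywhere = hit_anywhere or r.hit
        # Assumed tie-break: the first peer that holds the data and may source it.
        if supplier is None and r.hit and r.can_source:
            supplier = r.peer_id
    return CombinedResponse(hit_anywhere, supplier)


def data_source(combined: CombinedResponse) -> str:
    """Memory supplies the data only if no peer cache can."""
    if combined.supplier is not None:
        return f"cache {combined.supplier}"
    return "memory"
```

For example, if peer 1 holds the line in a state that cannot be sourced and peer 2 holds it in a state that can, combine reports a hit and data_source returns "cache 2"; if no peer can source the line, data_source returns "memory".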
There are many protocols and techniques for achieving cache coherence that are known to those skilled in the art. A number of snoopy cache coherence protocols have been proposed. The MESI cache coherence protocol and its variations have been widely used in SMP systems. As the name suggests, MESI has four cache states: modified (M), exclusive (E), shared (S) and invalid (I).
    I (invalid): The data is not valid. This is the initial state or the state after a snoop invalidate hit.
    S (shared): The data is valid, and can also be valid in other caches. This state is entered when the data is sourced from the memory or another cache in the modified state, and the corresponding snoop response shows that the data is valid in at least one of the other caches.
    E (exclusive): The data is valid, and has not been modified. The data is exclusively owned, and cannot be valid in another cache. This state is entered when the data is sourced from the memory or another cache in the modified state, and the corresponding snoop response shows that the data is not valid in another cache.
    M (modified): The data is valid and has been modified. The data is exclusively owned, and cannot be valid in another cache. This state is entered when a store operation is performed on the cache line.
With the MESI protocol, when a cache miss occurs, if the requested data is found in another cache and the cache line is in the modified state, the cache with the modified data supplies the data via a cache intervention (and writes the most up-to-date data back to the memory). However, if the requested data is found in another cache and the cache line is in the shared state, the cache with the shared data does not supply the requested data, because the shared state does not guarantee that it is the only cache that would source the data. In this case, the memory needs to source the data to the requesting cache.
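The MESI rules above can be summarized in a short Python sketch. It is illustrative only: the helper names state_after_fill and mesi_supplier are hypothetical, and the sketch assumes, as in the basic MESI behavior described above, that only a peer holding the line in the modified state sources data by cache intervention, so a peer in the exclusive state is also treated as not supplying the data.

```python
from enum import Enum
from typing import List, Tuple


class MESI(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def state_after_fill(valid_elsewhere: bool) -> MESI:
    """State the requesting cache assigns to a newly filled line.

    S if the snoop response shows the data is valid in at least one
    other cache, E otherwise.
    """
    return MESI.S if valid_elsewhere else MESI.E


def mesi_supplier(peer_states: List[MESI]) -> Tuple[str, bool]:
    """Return (supplier, memory_written_back) for a read miss.

    Only a peer holding the line in M sources it by cache intervention,
    and it also writes the up-to-date data back to memory. A peer in S
    (or, in this sketch, E) does not supply the data, so memory does.
    """
    for i, state in enumerate(peer_states):
        if state is MESI.M:
            return f"cache {i}", True  # intervention plus write-back
    return "memory", False
```

For instance, mesi_supplier([MESI.I, MESI.S, MESI.M]) returns ("cache 2", True), whereas mesi_supplier([MESI.S, MESI.S]) returns ("memory", False) even though two peers hold valid shared copies.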
The IBM® Power 4 system, for example, enhances the MESI protocol to allow more cache interventions. Compared with MESI, the enhanced coherence protocol allows data of a shared cache line to be sourced via a cache intervention. In addition, if data of a modified cache line is sourced from one cache to another, the modified data does not have to be written back to the memory immediately. Instead, a cache with the most up-to-date data can be held responsible for the necessary memory update when the data is eventually replaced from the cache. An exemplary enhanced MESI protocol employing seven cache states is as follows.
    I (invalid): The data is invalid. This is the initial state or the state after a snoop invalidate hit.
    SL (shared, can be sourced): The data is valid, and may also be valid in other caches. The data can be sourced to another cache via a cache intervention. This state is entered when the data is sourced from another cache or from the memory.
    S (shared): The data is valid, and may also be valid in other caches. The data cannot be sourced to another cache. This state is entered when a snoop read hit occurs on a cache line in the SL state.
    M (modified): The data is valid, and has been modified. The data is exclusively owned, and cannot be valid in another cache. The data can be sourced to another cache. This state is entered when a store operation is performed on the cache line.
    Me (exclusive): The data is valid, and has not been modified. The data is exclusively owned, and cannot be valid in another cache.
    Mu (unsolicited modified): The data is valid, and is considered to have been modified. The data is exclusively owned, and cannot be valid in another cache.
    T (tagged): The data is valid, and has been modified. The modified data has been sourced to another cache. This state is entered when a snoop read hit occurs on a cache line in the M state.
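The read-miss transitions that the enhanced protocol spells out (SL to S and M to T on a snoop read hit, with the requester entering SL) can be traced with the hypothetical Python sketch below. It encodes only what the state list states; it leaves Me, Mu and T unchanged on a snoop read hit and always gives the requester SL, because the list gives no further rules for those cases. It is not a model of the actual Power 4 implementation.

```python
from enum import Enum
from typing import Dict


class EState(Enum):
    I = "invalid"
    SL = "shared, can be sourced"
    S = "shared"
    M = "modified"
    Me = "exclusive"
    Mu = "unsolicited modified"
    T = "tagged"


# Only the states the list explicitly describes as able to source data.
SOURCEABLE = {EState.SL, EState.M}


def on_snoop_read_hit(state: EState) -> EState:
    """Next state of a peer line on a snoop read hit (SL -> S, M -> T).

    Other states are left unchanged because the list does not specify them.
    """
    if state is EState.SL:
        return EState.S
    if state is EState.M:
        return EState.T
    return state


def read_miss(states: Dict[str, EState], requester: str) -> Dict[str, EState]:
    """Apply one read miss by `requester` to a map of cache name -> state.

    Every other cache holding a valid copy sees a snoop read hit; the
    requester enters SL whether the data came from a peer or from memory.
    """
    new = {name: (s if s is EState.I else on_snoop_read_hit(s))
           for name, s in states.items() if name != requester}
    new[requester] = EState.SL
    return new
```

Starting from {"A": EState.SL, "B": EState.I}, a read miss by B yields {"A": EState.S, "B": EState.SL}, so the sourcing role moves to the most recent reader; starting from {"A": EState.M}, a read miss by B leaves A in T, still holding the modified data it must eventually write back.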
In modern SMP systems, when a cache miss occurs, if the requested data is found in both the memory and another cache, supplying the data via a cache intervention is often preferred because cache-to-cache communication latency is usually smaller than memory access latency. Furthermore, cache-to-cache communication may have more available bandwidth when caches are on the same die or in the same module, while memory bandwidth can be a more critical and contested resource.
When the memory controller receives a cache request for a cache miss, it cannot determine whether the requested data needs to be retrieved from memory until the corresponding cache snoop operation completes (or partially completes). In modern SMP systems, a snoop operation may take tens or hundreds of cycles, especially when caches are connected with one another via an interconnect such as a ring instead of a bus. In this situation, the memory controller generally has two alternatives for handling the cache request: lazy memory access and eager memory access.
With lazy memory access, the memory controller initiates no memory access until it determines from the corresponding snoop response that the requested data cannot be supplied by a peer cache. This can avoid unnecessary memory accesses, but may result in significant latency if it turns out that the requested data needs to be retrieved from the memory.
With eager memory access, in contrast, the memory controller initiates the memory access immediately after it receives the cache request, even though the cache snoop response is not yet available. If it turns out that the requested data can be supplied from another cache, either the cache or the memory can supply the data to the requesting cache (the data retrieved from the memory can be discarded in case of a cache intervention). Compared with lazy memory access, eager memory access can avoid unnecessary memory access latency, but may result in unnecessary memory bandwidth consumption when the requested data can be supplied from another cache. The problem becomes more serious as cache size increases, especially for applications that exhibit high cache-to-cache transfer ratios due to frequently communicated cache lines.
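The latency and bandwidth trade-off between lazy and eager memory access can be illustrated with a toy Python model. The Timing fields, the cycle counts in the example, and the assumption that under eager access memory data is forwarded only after both the memory read and the snoop have completed are hypothetical simplifications, not measurements of any particular system.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Timing:
    snoop: int         # cycles until the combined snoop response is known
    memory: int        # cycles for a memory access once it is started
    intervention: int  # cycles until data arrives from a peer cache


def serve(eager: bool, from_cache: bool, t: Timing) -> Tuple[int, int]:
    """Return (latency, memory_accesses) for one cache miss."""
    if from_cache:
        # A peer supplies the data either way; eager access also issued
        # a memory read whose data is discarded.
        return t.intervention, (1 if eager else 0)
    if eager:
        # Memory read overlaps the snoop; data is released once both finish.
        return max(t.snoop, t.memory), 1
    # Lazy: the memory read starts only after the snoop rules out a peer.
    return t.snoop + t.memory, 1


def run(eager: bool, misses: Iterable[bool], t: Timing) -> Tuple[int, int]:
    """Total latency and memory accesses over a trace of misses.

    Each trace entry is True when a peer cache supplies the data.
    """
    total_lat = total_mem = 0
    for from_cache in misses:
        lat, mem = serve(eager, from_cache, t)
        total_lat += lat
        total_mem += mem
    return total_lat, total_mem
```

With Timing(snoop=100, memory=200, intervention=120) and the trace [True, False, True, True], lazy access (run(False, ...)) yields (660, 1) and eager access (run(True, ...)) yields (560, 4): eager access saves 100 cycles on the single memory-sourced miss but issues three memory reads whose data is discarded.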
Therefore, it is desirable to have a cache residence prediction mechanism that can predict whether the requested data of a cache miss can be supplied from another cache. The memory controller can then use the prediction result to determine whether to initiate the memory access immediately upon receiving the cache request. An effective cache residence prediction mechanism allows the memory controller to avoid both unnecessary memory access latency and unnecessary memory bandwidth consumption.
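One way a memory controller might act on such a prediction can be sketched in Python, assuming the predictor itself (not modeled here) delivers one boolean per miss that is True when a peer cache is predicted to supply the data. The controller then uses lazy access for predicted cache hits and eager access otherwise. The Timing values and the names serve and run_predicted are hypothetical, and the timing model is a simplification chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Timing:
    snoop: int         # cycles until the combined snoop response is known
    memory: int        # cycles for a memory access once it is started
    intervention: int  # cycles until data arrives from a peer cache


def serve(eager: bool, from_cache: bool, t: Timing) -> Tuple[int, int]:
    """Return (latency, memory_accesses) for one miss under a fixed policy."""
    if from_cache:
        return t.intervention, (1 if eager else 0)
    if eager:
        return max(t.snoop, t.memory), 1
    return t.snoop + t.memory, 1


def run_predicted(misses: Sequence[bool], predictions: Sequence[bool],
                  t: Timing) -> Tuple[int, int]:
    """Choose lazy access when a peer cache is predicted to supply the data
    and eager access otherwise; return total latency and memory accesses."""
    if len(misses) != len(predictions):
        raise ValueError("one prediction is needed per miss")
    total_lat = total_mem = 0
    for from_cache, predicted_in_cache in zip(misses, predictions):
        lat, mem = serve(not predicted_in_cache, from_cache, t)
        total_lat += lat
        total_mem += mem
    return total_lat, total_mem
```

With Timing(snoop=100, memory=200, intervention=120) and the trace [True, False, True, True], perfect predictions give (560, 1), that is, the latency of eager access with the single memory read of lazy access; the predictions [True, True, False, True] mispredict two misses and give (660, 2).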