1. Field of the Invention
The present invention relates to computer-based memory systems, and, more particularly, to location-aware cache-to-cache transfers.
2. Description of the Related Art
A symmetric multiprocessor (“SMP”) system generally employs a snoopy mechanism to ensure cache coherence. When a cache miss occurs, the requesting cache may send a cache request to the memory and all its peer caches. When a peer cache receives the cache request, it snoops its cache directory and produces a cache snoop response indicating whether the requested data is found and the state of the corresponding cache line. If the requested data is found in a peer cache, the peer cache can source the data to the requesting cache via a cache-to-cache transfer. The memory is responsible for supplying the requested data if the data cannot be supplied by any peer cache.
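By way of illustration only, the miss-handling flow described above can be sketched in Python as follows. The names `Cache`, `snoop`, and `handle_miss` are hypothetical and do not come from any particular system. Cache line states are omitted, and the first peer cache that reports a hit is assumed to source the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cache:
    """Hypothetical cache model: maps a memory address to its cached data."""
    name: str
    lines: Dict[int, int] = field(default_factory=dict)

    def snoop(self, address: int) -> bool:
        """Snoop the cache directory; True means the requested data is found."""
        return address in self.lines


def handle_miss(requester: Cache, peers: List[Cache],
                memory: Dict[int, int], address: int) -> str:
    """Send the request to all peer caches and return the name of the supplier."""
    for peer in peers:
        if peer is not requester and peer.snoop(address):
            # Cache-to-cache transfer: the peer cache sources the data.
            requester.lines[address] = peer.lines[address]
            return peer.name
    # No peer cache can supply the data, so the memory supplies it.
    requester.lines[address] = memory[address]
    return "memory"
```

In this sketch the memory is consulted only after every peer cache has reported a miss, which mirrors the responsibility of the memory described above.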
For purposes of this disclosure, a cache-to-cache transfer is referred to as a cache intervention. In a cache intervention, the cache that requests data is referred to as the requesting cache, and the cache that supplies data is referred to as the supplying cache or the sourcing cache. A cache is said to have intervention responsibility for a memory address to a peer cache if the cache is responsible for supplying requested data of the memory address to the peer cache.
Referring now to FIG. 1, an exemplary SMP system 100 is shown that includes multiple processing units 105 interconnected via an interconnect network 110. Each processing unit 105 includes a processor core 115 and a cache 120. Also connected to the interconnect network 110 are a memory 125 and some I/O devices 130. The memory 125 can be physically distributed into multiple memory portions, wherein each memory portion is operatively associated with a processing unit 105. The interconnect network 110 serves at least two purposes: (1) sending cache coherence requests to the caches 120 and the memory 125; and (2) transferring data among the caches 120 and the memory 125. The interconnect network 110 can employ different physical networks for different purposes. For example, an SMP system can broadcast a cache request via direct point-to-point communication channels, and transfer data via a message-passing network such as a mesh or torus network.
There are many techniques for achieving cache coherence that are known to those skilled in the art. A number of snoopy cache coherence protocols have been proposed. The MESI coherence protocol and its variations have been widely used in SMP systems. As the name suggests, MESI has four cache states, modified (M), exclusive (E), shared (S) and invalid (I).
I (invalid): The data is not valid. This is the initial state or the state after a snoop invalidate hit.
S (shared): The data is valid, and can also be valid in other caches. This state is entered when the data is sourced from the memory or another cache in the modified state, and the corresponding snoop response shows that the data is valid in at least one of the other caches.
E (exclusive): The data is valid, and has not been modified. The data is exclusively owned, and cannot be valid in another cache. This state is entered when the data is sourced from the memory or another cache in the modified state, and the corresponding snoop response shows that the data is not valid in another cache.
M (modified): The data is valid and has been modified. The data is exclusively owned, and cannot be valid in another cache. This state is entered when a store operation is performed on the cache line.
With the MESI protocol, when a cache miss occurs, if the requested data is found in another cache and the cache line is in the modified state, the cache with the modified data supplies the data via a cache intervention, and writes the most up-to-date data back to the memory. However, if the requested data is found in another cache and the cache line is in the shared state, the cache with the shared data does not supply the requested data. In this case, the memory needs to supply the data to the requesting cache.
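By way of illustration only, the handling of a read miss under the MESI protocol described above can be sketched as follows. The function name `mesi_read_miss` and its return shape are hypothetical. The downgrade of every valid peer copy to the shared state is conventional MESI behavior that is assumed here rather than stated in the text above.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def mesi_read_miss(
    peers: List[State],
) -> Tuple[Optional[int], State, List[State], bool]:
    """Resolve a read miss given the peer caches' states for the line.

    Returns (supplier, requester_state, new_peer_states, writeback), where
    supplier is the index of the supplying peer, or None if the memory
    supplies the data.
    """
    supplier = None
    for index, state in enumerate(peers):
        if state is State.M:
            supplier = index  # only a modified copy is sourced by intervention
            break
    # A modified supplier also writes the up-to-date data back to the memory.
    writeback = supplier is not None
    # The requester enters S if any peer holds valid data, otherwise E.
    found = any(state is not State.I for state in peers)
    requester_state = State.S if found else State.E
    # Assumption: every valid peer copy is downgraded to the shared state.
    new_peers = [State.I if s is State.I else State.S for s in peers]
    return supplier, requester_state, new_peers, writeback
```

A peer cache holding the line in the shared state yields a supplier of `None` in this sketch, reflecting that the memory, not the cache, supplies the data in that case.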
In modern SMP systems, when a cache miss occurs, if the requested data is found in both the memory and a cache, supplying the requested data to the requesting cache via a cache intervention is often preferred over supplying the requested data to the requesting cache from the memory, because cache-to-cache transfer latency is usually smaller than memory access latency. Furthermore, when caches are on the same die or in the same package module, there is usually more bandwidth available for cache-to-cache transfers, compared with the bandwidth available for off-chip DRAM accesses.
The IBM® Power 4 system, for example, enhances the MESI coherence protocol to allow more cache interventions. Compared with MESI, an enhanced coherence protocol allows data of a shared cache line to be sourced via a cache intervention. In addition, if data of a modified cache line is sourced from one cache to another, the modified data does not have to be written back to the memory immediately. Instead, a cache with the most up-to-date data can be held responsible for memory update when it becomes necessary to do so. An exemplary enhanced MESI protocol employing seven cache states is as follows.
I (invalid): The data is invalid. This is the initial state or the state after a snoop invalidate hit.
SL (shared, can be sourced): The data is valid, and may also be valid in other caches. The data can be sourced to another cache in the same module via a cache intervention. This state is entered when the data is sourced from another cache or from the memory.
S (shared): The data is valid, and may also be valid in other caches. The data cannot be sourced to another cache. This state is entered when a snoop read hit occurs on a cache line in the SL state.
M (modified): The data is valid, and has been modified. The data is exclusively owned, and cannot be valid in another cache. The data can be sourced to another cache. This state is entered when a store operation is performed on the cache line.
Me (exclusive): The data is valid, and has not been modified. The data is exclusively owned, and cannot be valid in another cache.
Mu (unsolicited modified): The data is valid, and is considered to have been modified. The data is exclusively owned, and cannot be valid in another cache.
T (tagged): The data is valid, and has been modified. The modified data has been sourced to another cache. This state is entered when a snoop read hit occurs on a cache line in the M state.
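By way of illustration only, the seven states above and the two snoop read hit transitions that the list states explicitly can be encoded as follows. The names `SNOOP_READ_HIT`, `CAN_SOURCE`, and `on_snoop_read_hit` are hypothetical. Transitions that the list does not specify are deliberately left undefined rather than guessed.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    I = "invalid"
    SL = "shared, can be sourced"
    S = "shared"
    M = "modified"
    ME = "exclusive"
    MU = "unsolicited modified"
    T = "tagged"


# Only the snoop read hit transitions stated in the list above.
SNOOP_READ_HIT = {
    State.SL: State.S,  # the sourcing responsibility is given up
    State.M: State.T,   # the modified data has been sourced to another cache
}

# States that the list above explicitly describes as able to source data.
CAN_SOURCE = {State.SL, State.M}


def on_snoop_read_hit(state: State) -> Optional[State]:
    """Return the next state, or None where the list above is silent."""
    return SNOOP_READ_HIT.get(state)
```

For example, `on_snoop_read_hit(State.M)` yields `State.T`, while `on_snoop_read_hit(State.S)` yields `None` because the list does not describe that case.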
When data of a memory address is shared in multiple caches in a single module, the module can include at most one cache in the SL state. The cache in the SL state is responsible for supplying the shared data via a cache intervention when a cache miss occurs in another cache in the same module. At any time, the particular cache that can source the requested data is fixed, regardless of which cache has issued the cache request. When data of a memory address is shared in more than one module, each module can include a cache in the SL state. A cache in the SL state can source the data to another cache in the same module, but cannot source the data to a cache in a different module.
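By way of illustration only, the module-local sourcing rule described above can be sketched as follows. The function name `find_sl_supplier` and the dictionary layout (cache name mapped to a module number and a state string) are hypothetical.

```python
from typing import Dict, Optional, Tuple


def find_sl_supplier(caches: Dict[str, Tuple[int, str]],
                     requester: str) -> Optional[str]:
    """Return the cache that can source shared data to the requester.

    caches maps a cache name to (module number, state) for one memory
    address. None means no cache in the requester's module is in the
    SL state.
    """
    requester_module = caches[requester][0]
    candidates = [
        name
        for name, (module, state) in caches.items()
        if name != requester and module == requester_module and state == "SL"
    ]
    if len(candidates) > 1:
        raise ValueError("a module can include at most one cache in the SL state")
    return candidates[0] if candidates else None
```

Note that the result depends only on the module of the requesting cache, not on which cache within that module issued the request, which reflects the fixed sourcing responsibility described above.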
In systems in which a cache-to-cache transfer can take multiple message-passing hops, sourcing data from different caches can result in different communication latency and bandwidth consumption. When a cache miss occurs in a requesting cache and the requested data is found in more than one peer cache, it is preferable for the peer cache closest to the requesting cache to supply the requested data, because doing so reduces the communication latency and bandwidth consumption of the cache intervention.
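By way of illustration only, selecting the closest supplying cache can be sketched as follows. The sketch assumes a two-dimensional mesh in which the cost of a transfer is the number of hops between mesh coordinates; the names `mesh_hops` and `closest_supplier`, the coordinate layout, and the tie-breaking rule are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]


def mesh_hops(a: Coord, b: Coord) -> int:
    """Hop count between two nodes of a 2D mesh (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_supplier(requester: Coord,
                     holders: Dict[str, Coord]) -> Optional[str]:
    """Pick the peer cache holding the data that is fewest hops away.

    holders maps the name of each peer cache holding the requested data
    to its mesh coordinate. Ties are broken by cache name (assumption).
    """
    if not holders:
        return None
    return min(sorted(holders),
               key=lambda name: mesh_hops(requester, holders[name]))
```

With holders at (0, 3), (2, 2), and (1, 0), a requester at (0, 0) would be served by the cache at (1, 0), whereas a requester at (2, 3) would be served by the cache at (2, 2). This contrasts with a scheme in which the sourcing cache is fixed regardless of which cache issued the request.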
Thus, it is generally desirable to enhance cache coherence mechanisms with cost-conscious cache-to-cache transfers to improve overall performance in SMP systems.