1. Technical Field
The present invention relates to cache coherence mechanisms, and more particularly, to cache coherence in network-based multiprocessor systems with ring-based snoop response collection.
2. Description of Related Art
A symmetric multiprocessor (“SMP”) system employs a cache coherence mechanism to keep the copies of data held in its caches consistent. Snoop-based cache coherence is a typical approach for SMP systems. With snoop-based cache coherence, when a cache miss occurs, the requesting cache broadcasts a cache request to its peer caches. An appropriate cache snoop filtering mechanism can be used to reduce the overhead of cache coherence messages and cache snoop operations. Traditionally, snoop-based cache coherence is implemented in a bus-based SMP system in which caches communicate with each other via a shared bus. To avoid a potential communication bottleneck, a modern SMP system typically uses a message-passing network rather than a physically shared bus. Such SMP systems are referred to as network-based SMP systems.
Referring now to FIG. 1, an exemplary cache-coherent multiprocessor system is shown that comprises multiple nodes interconnected via an inter-node interconnect network, wherein each node comprises a central processing unit (“CPU”) and a cache. Also connected to the inter-node interconnect network are a memory and input/output (“IO”) devices. Although the memory is depicted as one component, the memory can be physically distributed into multiple memory portions, wherein each memory portion is operatively associated with a node.
Referring now to FIG. 2, another exemplary cache-coherent multiprocessor system is shown that comprises multiple nodes interconnected via an inter-node interconnect, wherein each node comprises a chip multiprocessor (“CMP”) subsystem. Each CMP subsystem comprises one or more caches that can communicate with each other via an intra-node fabric. A memory portion, as well as IO devices, can also be connected to the intra-node fabric.
With snoop-based cache coherence, when a read cache miss occurs, the requesting cache typically broadcasts a cache data request to its peer caches and to the memory. When a peer cache receives the cache data request, the peer cache performs a local cache snoop operation and produces a cache snoop response indicating whether the requested data is found in the peer cache and the state of the corresponding cache line. If the requested data is found in a peer cache, the peer cache may supply the data to the requesting cache via a cache-to-cache transfer. The memory is responsible for supplying the requested data if no peer cache can supply the data.
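The read-miss handling described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any particular system: the `Cache` class, the `read_miss` function, and the representation of memory as a dictionary are all hypothetical names introduced here, and the sketch assumes that the first peer cache reporting a valid copy supplies the data.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Cache:
    """Hypothetical cache model: maps an address to (state, data) for valid lines."""
    name: str
    lines: Dict[int, Tuple[str, int]] = field(default_factory=dict)

    def snoop(self, address):
        """Local snoop operation: report (found, state) for the address."""
        entry = self.lines.get(address)
        if entry is None:
            return (False, "I")
        return (True, entry[0])


def read_miss(requester, caches, memory, address):
    """Broadcast a cache data request to all peers of the requester.

    Returns (data, supplier_name, snoop_responses). A peer cache that holds
    the line supplies it (cache-to-cache transfer); otherwise memory does.
    """
    peers = [c for c in caches if c is not requester]
    # Every peer performs a local snoop and produces a snoop response.
    responses = {peer.name: peer.snoop(address) for peer in peers}
    for peer in peers:
        found, _state = responses[peer.name]
        if found:
            return peer.lines[address][1], peer.name, responses
    # No peer cache can supply the data, so the memory is responsible.
    return memory[address], "memory", responses


c0 = Cache("c0")
c1 = Cache("c1", {0x40: ("S", 7)})
c2 = Cache("c2")
memory = {0x40: 7, 0x80: 9}
hit = read_miss(c0, [c0, c1, c2], memory, 0x40)
miss = read_miss(c0, [c0, c1, c2], memory, 0x80)
```

In this sketch, the request for address 0x40 is served by the peer cache that holds the line, while the request for address 0x80 falls back to memory because every snoop response reports the line as not found.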
In a cache-coherent SMP system, a cache request can be a cache data request that intends to obtain a shared copy of requested data, a cache data-and-ownership request that intends to obtain an exclusive copy of requested data, or an ownership request that intends to invalidate shared copies of requested data in other caches.
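The three kinds of cache request can be summarized in a short sketch. The `CacheRequest` enumeration and the `request_effects` helper are hypothetical names used only for illustration. The sketch also assumes that a data-and-ownership request invalidates copies in peer caches, which follows from the requester obtaining an exclusive copy but is not stated explicitly above.

```python
from enum import Enum


class CacheRequest(Enum):
    DATA = "data"                              # obtain a shared copy
    DATA_AND_OWNERSHIP = "data-and-ownership"  # obtain an exclusive copy
    OWNERSHIP = "ownership"                    # invalidate shared copies elsewhere


def request_effects(request):
    """Return (needs_data, invalidates_peer_copies) for a request kind."""
    needs_data = request in (CacheRequest.DATA, CacheRequest.DATA_AND_OWNERSHIP)
    # Assumption: obtaining an exclusive copy implies invalidating peer copies.
    invalidates = request in (CacheRequest.DATA_AND_OWNERSHIP, CacheRequest.OWNERSHIP)
    return needs_data, invalidates
```

An ownership request is the only kind that transfers no data, since under the description above the requesting cache already holds a shared copy and needs only to invalidate the copies held elsewhere.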
A number of techniques for achieving snoop-based cache coherence are known to those skilled in the art. For example, the MESI snoopy cache coherence protocol and its variants have been widely used in SMP systems. As the name suggests, MESI has four cache states: modified (M), exclusive (E), shared (S) and invalid (I). If a cache line is in an invalid state in a cache, the data is not valid in the cache. If a cache line is in a shared state in a cache, the data is valid in the cache and can also be valid in other caches. This state is entered, for example, when the data is retrieved from the memory or another cache, and the corresponding snoop responses indicate that the data is valid in at least one of the other caches. If a cache line is in an exclusive state in a cache, the data is valid in the cache, and cannot be valid in any other cache. Furthermore, the data has not been modified with respect to the data maintained in the memory. This state is entered, for example, when the data is retrieved from the memory or another cache, and the corresponding snoop responses indicate that the data is not valid in any other cache. If a cache line is in a modified state in a cache, the data is valid in the cache and cannot be valid in any other cache. Furthermore, the data has been modified as a result of a memory store operation, and the modified data has not been written to the memory.
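The MESI state definitions above can be illustrated with a small sketch. The names `State`, `state_after_read_fill`, and `is_coherent` are hypothetical. The first function covers only the state that a requesting cache enters after a read fill, and the second checks the exclusivity property stated above for a single cache line across caches; neither models the state changes that a full protocol applies to the peer caches.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def state_after_read_fill(peer_states):
    """State entered by the requesting cache after a read fill.

    Shared if the snoop responses show the data valid in at least one other
    cache; exclusive if the data is not valid in any other cache.
    """
    if any(s is not State.I for s in peer_states):
        return State.S
    return State.E


def is_coherent(states):
    """Check one line's states across all caches against the MESI definitions.

    A modified or exclusive copy cannot coexist with any other valid copy.
    """
    valid = [s for s in states if s is not State.I]
    if any(s in (State.M, State.E) for s in valid):
        return len(valid) == 1
    return True
```

For instance, under these definitions a line held as modified in one cache and shared in another would fail the check, while any number of shared copies would pass.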
With snoop-based cache coherence, when a cache miss occurs, if the requested data is found in both memory and another cache, supplying the data via a cache-to-cache transfer is often preferred because cache-to-cache transfer latency is typically lower than memory access latency. For example, in the IBM® POWER4 system, when data of an address is shared in one or more caches in a multi-chip module, the cache with the last received shared copy can supply the data to another cache in the same multi-chip module via a cache-to-cache transfer.
A cache is referred to as a requesting cache of a cache request, if the cache request is originally generated from the cache. A cache is referred to as a snooping cache of a cache request, if the cache needs to be snooped in servicing the cache request. A cache is referred to as a supplying cache of a cache request, if the cache supplies requested data to the requesting cache.
Likewise, a node is referred to as a requesting node of a cache request, if the cache request is originally generated from a cache in the node. A node is referred to as a snooping node of a cache request, if at least one cache in the node needs to be snooped in servicing the cache request. A node is referred to as a supplying node of a cache request, if a cache in the node supplies requested data to the requesting node.
In a bus-based SMP system, the bus behaves as a central arbiter that serializes all bus transactions to ensure a total order of bus transactions. In a network-based SMP system, messages can be received in different orders at different receiving caches. Without a guaranteed serialization of coherence messages, it is difficult to provide efficient cache coherence support. Therefore, a need exists for a mechanism that can efficiently support cache coherence in a network-based multiprocessor system.
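The ordering difference between a bus and a network can be illustrated with a small sketch. The functions `bus_delivery` and `network_delivery`, the message format of (sender, send time), and the latency values are all hypothetical and chosen only to show how two receivers can observe the same two requests in opposite orders.

```python
def bus_delivery(messages, receivers):
    """A shared bus serializes transactions: every receiver sees one total order."""
    order = [sender for sender, _t in sorted(messages, key=lambda m: m[1])]
    return {r: list(order) for r in receivers}


def network_delivery(messages, latency):
    """Each receiver orders messages by its own arrival time.

    latency[(sender, receiver)] is an assumed per-link latency.
    """
    receivers = sorted({r for (_s, r) in latency})
    order = {}
    for r in receivers:
        # Arrival time = send time + latency of the link to this receiver.
        arrivals = sorted(messages, key=lambda m: m[1] + latency[(m[0], r)])
        order[r] = [sender for sender, _t in arrivals]
    return order


# Requests from caches A and B, sent at times 0 and 1.
messages = [("A", 0), ("B", 1)]
# Assumed latencies: A is near X and far from Y; B is the reverse.
latency = {("A", "X"): 1, ("B", "X"): 5, ("A", "Y"): 5, ("B", "Y"): 1}

on_bus = bus_delivery(messages, ["X", "Y"])
on_network = network_delivery(messages, latency)
```

With these assumed latencies, receiver X observes the request from A first while receiver Y observes the request from B first, whereas on the bus both receivers observe the same order.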