Most cache coherency protocols have a shared state in which data can be shared among any number of system components (e.g., processors). The shared (S) state arises when a system component requests a read-only copy of the data and the data is already in the exclusive (E) state in another system component. The requesting system component and the system component that held the copy then each mark their copy of the data as shared. While data is in the shared state, any system component requesting a read-only copy may freely copy it.
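The read-request transition described above can be sketched as a small state function. This is a minimal illustration, not a protocol specification. The names `State` and `read_request` are hypothetical. Two rules go beyond the text and are assumptions taken from textbook MESI: a requestor with no other cached copy receives the data in E, and a remote M copy is downgraded to S (its writeback to memory is not modeled).

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def read_request(states: dict[str, State], requestor: str) -> dict[str, State]:
    """Return the new per-node states after `requestor` asks for a read-only copy.

    Hypothetical sketch: only the E -> S case comes from the text; the
    no-other-holder and M-holder rules are assumed textbook MESI behavior.
    """
    new = dict(states)
    holders = [n for n, s in states.items() if n != requestor and s is not State.I]
    if not holders:
        # Assumption: no other cached copy, so the requestor gets an exclusive copy.
        new[requestor] = State.E
        return new
    for n in holders:
        if states[n] in (State.E, State.M):
            # E (or M, after an unmodeled writeback) drops to S because the data is now shared.
            new[n] = State.S
    new[requestor] = State.S
    return new


result = read_request({"P0": State.I, "P1": State.E}, "P0")
print({n: s.name for n, s in result.items()})  # → {'P0': 'S', 'P1': 'S'}
```

The function returns a new dictionary rather than mutating its input, which keeps "before" and "after" states easy to compare.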
In bus-based multiprocessor systems, cache coherency protocols generally do not permit a system component to provide shared data to a requesting system component. Instead, the data is retrieved directly from the memory system. In directory-based cache coherency protocols, the memory system also provides the shared copy to the requesting system component: because the directory of cache line states (and thus data states) is located between the system components and memory, the data is retrieved from memory and sent to the requesting system component.
The shared state can present numerous issues in a multiprocessor system that uses a point-to-point interconnection network between the system components, especially when the system does not rely on a directory to track cache line states. To limit these issues, prior art solutions suggest routing requests from a system component directly to the memory system, which is then responsible for broadcasting the request to determine the state of the data (cache line), collecting the responses from the other system components, and determining what state the data should be in when the request is fulfilled. These protocols require four hops for the data to be returned: 1) requestor to memory, 2) memory broadcasts the request to the other system components, 3) the system components respond to the memory system, and 4) the memory system forwards the data to the requestor.
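The four-hop, memory-mediated flow can be made concrete by listing its messages. In this sketch, `Message` and `home_mediated_read` are hypothetical names, and messages that share a hop number are assumed to travel in parallel, so the critical path stays at four hops no matter how many peers are snooped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    hop: int   # position on the critical path; equal hop numbers travel in parallel
    src: str
    dst: str
    kind: str


def home_mediated_read(requestor: str, home: str, peers: list[str]) -> list[Message]:
    """Hypothetical sketch of the prior-art flow in which memory broadcasts and collects."""
    msgs = [Message(1, requestor, home, "request")]           # 1) requestor -> memory
    msgs += [Message(2, home, p, "snoop") for p in peers]     # 2) memory broadcasts the request
    msgs += [Message(3, p, home, "response") for p in peers]  # 3) peers respond to memory
    msgs.append(Message(4, home, requestor, "data"))          # 4) memory forwards the data
    return msgs


msgs = home_mediated_read("A", "H", ["B", "C", "D"])
print(max(m.hop for m in msgs))  # → 4
print(len(msgs))                 # → 8
```

The message count grows with the number of peers (2 + 2 × peers), while the hop count, which dominates latency, remains fixed at four.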
To lower the latency of fulfilling requests in a fully connected point-to-point system, the requesting system component can broadcast its request to all other system components as well as to the memory system. If another system component has the data in the shared state, it can deliver the data directly to the requestor. Complexities arise when multiple system components simultaneously request the same data and multiple other system components have the data in the shared state: the requesting system component must then deal with potentially multiple data returns. Further issues arise when one or more system components request the right to modify shared data.
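For contrast, the requestor-broadcast approach can be sketched the same way. Again `Message` and `broadcast_read` are hypothetical. Each peer's state is given as a MESI letter, every sharer is assumed to return data directly, and, as an assumption not stated in the text, memory supplies the data when no peer holds a shared copy. Conflict handling is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    hop: int   # position on the critical path; equal hop numbers travel in parallel
    src: str
    dst: str
    kind: str


def broadcast_read(requestor: str, home: str, caches: dict[str, str]) -> list[Message]:
    """Hypothetical sketch: the requestor broadcasts to memory and to every peer.

    `caches` maps each peer's name to its MESI state letter ("M", "E", "S", "I").
    """
    msgs = [Message(1, requestor, home, "request")]
    msgs += [Message(1, requestor, p, "request") for p in caches]
    # Every peer holding a shared copy may deliver the data directly.
    data = [Message(2, p, requestor, "data") for p, s in caches.items() if s == "S"]
    if not data:
        # Assumption: memory supplies the data when no peer can.
        data = [Message(2, home, requestor, "data")]
    return msgs + data


msgs = broadcast_read("A", "H", {"B": "S", "C": "S", "D": "I"})
data_returns = [m for m in msgs if m.kind == "data" and m.dst == "A"]
print(len(data_returns))         # → 2
print(max(m.hop for m in msgs))  # → 2
```

The critical path shrinks to two hops, but the requestor now receives one data message per sharer, which is the source of the multiple-data-return problem.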
When one of the system components wants to modify the data, that component must issue a “request-for-ownership” (RFO) asking permission from the rest of the system to modify the requested data. After the RFO is granted, the state of the data is changed from shared to another state (e.g., modified) that indicates that the data has been modified.
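The state change on a granted RFO can be sketched as follows. `State` and `grant_rfo` are hypothetical names. Invalidating every other cached copy is assumed from standard MESI behavior, because the text states only that the requestor's copy leaves the shared state (e.g., for modified). Data transfer and the grant handshake itself are not modeled.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def grant_rfo(states: dict[str, State], requestor: str) -> dict[str, State]:
    """Return the per-node states once `requestor`'s RFO has been granted.

    Hypothetical sketch assuming standard MESI: all other cached copies are
    invalidated and the requestor's copy is marked modified.
    """
    new = {node: State.I for node in states}
    new[requestor] = State.M
    return new


# FIG. 1a starting point: nodes 120 and 130 share the line, node 110 does not hold it.
before = {"110": State.I, "120": State.S, "130": State.S}
after = grant_rfo(before, "110")
print({n: s.name for n, s in after.items()})  # → {'110': 'M', '120': 'I', '130': 'I'}
```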
To illustrate, FIGS. 1a and 1b are conceptual diagrams of a four-node system having a prior art cache coherency protocol. In the example of FIG. 1b, dashed lines represent messages previously sent and solid lines represent messages being described. In this system, only the four traditional cache line states are used: modified (M), exclusive (E), shared (S), and invalid (I). This is known as the MESI cache coherency protocol. Nodes 110, 120 and 130 are peer nodes that store a copy of the requested data (e.g., a cache line) in cache memory. Home node 140 stores the original copy of the data in memory, or modified versions of the data when the modifications are written back to memory. That is, home node 140 is responsible for the non-cached copy of the data. In the example of FIG. 1a, nodes 120 and 130 both have copies of the requested data stored in cache memory, and the data is in the shared (S) state.
To request the right to modify the data, peer node 110 broadcasts an RFO to the other nodes of the system. As illustrated in FIG. 1b, nodes 120 and 130 both respond to the request from peer node 110 by providing a copy of the requested data. Because both nodes 120 and 130 are capable of providing copies of the requested data, peer node 110 must be capable of receiving and reconciling multiple copies of the requested data. This adds complexity to the design of peer node 110. As the number of nodes in a system increases, this requirement becomes more complex still, which increases the cost and difficulty of system design.
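A requestor such as peer node 110 needs some bookkeeping to absorb several data returns for one request. The tracker below is a hypothetical sketch, not the mechanism of any particular design. `PendingRequest` and its fields are illustrative names. It keeps the first copy, counts identical duplicates, treats a response without data as a plain acknowledgement, and raises an error on a copy that differs, since shared copies are expected to be identical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingRequest:
    """Hypothetical requestor-side tracker for one outstanding broadcast request."""
    expected: set[str]                                # nodes the request was sent to
    responded: set[str] = field(default_factory=set)
    data: Optional[bytes] = None                      # first copy received is kept
    duplicates: int = 0                               # extra identical copies discarded

    def on_response(self, node: str, payload: Optional[bytes]) -> None:
        if node not in self.expected or node in self.responded:
            raise ValueError(f"unexpected response from {node}")
        self.responded.add(node)
        if payload is None:                           # acknowledgement without data
            return
        if self.data is None:
            self.data = payload
        elif payload == self.data:
            self.duplicates += 1
        else:
            raise RuntimeError(f"conflicting copy from {node}")

    def complete(self) -> bool:
        return self.responded == self.expected


# FIG. 1b scenario: nodes 120 and 130 both return their shared copy to node 110.
req = PendingRequest(expected={"120", "130"})
req.on_response("120", b"line")
req.on_response("130", b"line")
print(req.complete(), req.duplicates)  # → True 1
```

Every additional sharer adds another response that must be tracked and reconciled, which is why this logic grows more complex as the node count rises.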
More complex situations can also arise in which multiple requesting nodes each receive copies of the requested data, for example, from three or more nodes. Thus, each node must be capable of resolving multiple conflicting copies of the requested data to ensure proper system functionality.