Multi-core (e.g. multi-processor) systems may comprise a plurality of cache nodes used to store data, such as processing instructions and frequently referenced data in main memory locations and/or other cache levels. One or more copies of data (e.g. a cache line) that reference a particular main memory location may be stored within different locations in the multi-core system. For example, a data value of “0” associated with main memory address “0” may be stored in each of the cache nodes within the multi-core system. When one copy of the data is modified in one of the cache nodes, a cache coherence mechanism may update or invalidate the other copies located in other cache nodes. The cache coherence mechanism may maintain the consistency of data stored within many different cache nodes by propagating changes in the data throughout the multi-core system.
Two types of cache coherence mechanisms that may be used within a multi-core system are snooping-based coherence and directory-based coherence. In snooping-based coherence, requests to modify data (e.g. a write instruction) may be broadcast by each of the cache nodes. Other cache nodes may monitor the broadcast requests and determine whether the received requests correspond to data stored within their own cache memory. Although snooping-based coherence may have short latencies, systems that implement snooping-based coherence may suffer from bandwidth and scalability problems caused by the constant broadcasting of requests. Alternatively, in directory-based coherence, data that is being shared amongst one or more cache nodes may be stored in a home node. The home node may maintain the coherence between cache nodes within a multi-core system using a directory. When data is changed in one cache node, the home node may update or invalidate entries in other cache nodes that store the data. Cache nodes may also send a request for permission to the home node prior to loading data from the main memory. As a result, in comparison to snooping-based coherence, directory-based coherence alleviates scalability and bandwidth concerns, but suffers from longer latencies caused by constant accesses to the home node.
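By way of illustration only, the directory-based approach described above may be sketched as follows. The sketch assumes a write-invalidate policy and a single home node; the names HomeNode, CacheNode, read, write, and messages are hypothetical and do not come from any particular implementation. The sketch shows that a write may trigger messages only to the cache nodes that the directory lists as holding a copy, rather than a broadcast to every cache node.

```python
# Illustrative sketch only; all class and method names are hypothetical.
class CacheNode:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> locally cached value

    def invalidate(self, address):
        # Drop the local copy so a later read must go back through the home node.
        self.lines.pop(address, None)


class HomeNode:
    def __init__(self, memory):
        self.memory = memory   # address -> value (stands in for main memory)
        self.directory = {}    # address -> set of CacheNode objects holding a copy
        self.messages = 0      # invalidation messages sent by the home node

    def read(self, node, address):
        # Record the requester as a sharer, then supply the data.
        self.directory.setdefault(address, set()).add(node)
        node.lines[address] = self.memory[address]
        return node.lines[address]

    def write(self, node, address, value):
        # Invalidate only the sharers listed in the directory (no broadcast).
        for sharer in self.directory.get(address, set()) - {node}:
            sharer.invalidate(address)
            self.messages += 1
        self.memory[address] = value
        node.lines[address] = value
        self.directory[address] = {node}


home = HomeNode({0: 0})
a, b, c = CacheNode("a"), CacheNode("b"), CacheNode("c")
home.read(a, 0)
home.read(b, 0)
home.write(a, 0, 1)  # only node b holds another copy, so one invalidation is sent
```

In this sketch, node c never requested the data and therefore receives no message, which reflects the bandwidth advantage over broadcasting. Every read and write passes through the home node, which reflects the added latency.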
To reduce latency within directory-based coherence, cache coherency protocols, such as the Modified Owned Exclusive Shared Invalid (MOESI) protocol or the Modified Exclusive Shared Invalid Forward (MESIF) protocol, may be used to implement peer-to-peer cache forwarding. Peer-to-peer cache forwarding occurs when one of the cache nodes is used to forward the requested data to another cache node. Instead of the home node receiving the requested data from the designated cache node and subsequently responding to the request, the designated cache node directly responds to the request. In the MOESI protocol, the cache node designated to hold the data and respond to requests for the data may be designated with an “Owned” state, while the MESIF protocol may use a “Forward” state to designate the cache node. In both the MOESI and MESIF protocols, the designated cache nodes are responsible for responding to requests from other cache nodes for particular data (e.g. a cache line).
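As a minimal sketch of peer-to-peer cache forwarding, the following example assumes a simplified MESIF-style scheme in which only the “Forward” (F) and “Shared” (S) states of a single cache line are tracked and the most recent requester takes over the “Forward” state. The names Line, read_request, states, and forwarder are hypothetical, and the other protocol states and write handling are omitted.

```python
# Illustrative sketch only; names are hypothetical and the protocol is simplified.
class Line:
    def __init__(self):
        self.states = {}       # node name -> "F" (Forward) or "S" (Shared)
        self.forwarder = None  # node currently designated to answer requests


def read_request(line, requester):
    """Serve a read and return who supplied the data ("home" or a cache node name)."""
    if requester in line.states:
        return requester              # local hit: the requester already holds a copy
    if line.forwarder is None:
        responder = "home"            # no cached copy yet: the home node supplies the data
    else:
        responder = line.forwarder    # the designated cache node responds directly
        line.states[responder] = "S"  # the previous forwarder is demoted to Shared
    line.states[requester] = "F"      # the most recent requester becomes the forwarder
    line.forwarder = requester
    return responder


line = Line()
first = read_request(line, "A")   # supplied by the home node
second = read_request(line, "B")  # supplied directly by node A
third = read_request(line, "C")   # supplied directly by node B
```

In this sketch the choice of responder depends only on which cache node requested the data last, not on any property of the requesting or responding cache node.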
Unfortunately, the MOESI and MESIF protocols lack flexibility in selecting the cache nodes designated to respond to requests from other cache nodes. For instance, the MESIF protocol designates the cache node that requested the data most recently with the “Forward” state. For the MOESI protocol, the “Owned” state is designated for the cache node that stores the most recent, correct copy of the data. Thus, neither the MOESI protocol nor the MESIF protocol dynamically selects cache nodes based on performance factors, such as the location of the requesting cache node and the current workload being processed at the designated cache node. In some instances, the designated cache node may therefore become a processing bottleneck and cause performance degradation within a multi-core system. Accordingly, a solution is needed to dynamically select a cache node to satisfy requests within the multi-core system.