Modern computer systems include distributed cache memories to speed access to memory shared among multiple components in a system. Shared memory systems that include cache memories typically utilize a cache coherency protocol such as MOESI, MESI, MESIF, or another related cache coherency protocol. As will be understood by one of ordinary skill, under these protocols, cache lines may be assigned (and transition between) various states, such as “Modified” (“M”), “Exclusive” (“E”), “Shared” (“S”), “Invalid” (“I”), “Owned” (“O”), and “Forward” (“F”). The protocols are designed to arbitrate shared memory utilization in a coherent and consistent manner among multiple components in the presence of distributed cache memories.
Shared memory can be logically organized into units called cache lines. Copies of a particular cache line may be present in the local cache memories of multiple components. In many implementations of cache coherency, to maintain coherency and consistency, the protocols require that a component intending to write to a cache line first notify all other components (or a directory) in the system of its intent to write to the cache line and then confirm that it holds the only writable copy of the cache line in question. Put differently, the component must gain “Modified” (also commonly referred to as “Dirty”) or “Exclusive” (also commonly referred to as “Valid”) state on its own local copy of the cache line. In the research literature, this technique is commonly called “invalidation.” Note that invalidation may take the form of explicit invalidation or may be implied in actions such as, but not limited to, read for exclusive control. The Modified (“M”) and Exclusive (“E”) states share a property: the writer with a local copy of a cache line in either state is the only component in the system that has permission to write to the cache line if the system's shared memory is to stay coherent and consistent.
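By way of non-limiting illustration, the invalidation requirement described above can be sketched for a single cache line tracked by a directory. The sketch is a simplified MESI-style model under stated assumptions and is not an implementation of any particular protocol: the names `State`, `Directory`, `read`, and `write` are hypothetical, the Owned and Forward states are omitted, and write-back of modified data is elided.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Directory:
    """Hypothetical directory tracking one cache line's state per component."""

    def __init__(self, components):
        # Every component starts without a valid copy of the line.
        self.states = {c: State.INVALID for c in components}

    def read(self, reader):
        """Grant the reader a readable copy of the line."""
        others = [c for c, s in self.states.items()
                  if c != reader and s != State.INVALID]
        # Any remote holder (including an M or E holder) is downgraded to Shared.
        for c in others:
            self.states[c] = State.SHARED
        if self.states[reader] == State.INVALID:
            # The first and only holder gets Exclusive; otherwise Shared.
            self.states[reader] = State.SHARED if others else State.EXCLUSIVE

    def write(self, writer):
        """Invalidate all remote copies, then grant Modified to the writer.

        Returns the number of remote copies that had to be invalidated.
        """
        invalidated = 0
        for c, s in self.states.items():
            if c != writer and s != State.INVALID:
                self.states[c] = State.INVALID
                invalidated += 1
        self.states[writer] = State.MODIFIED
        return invalidated


directory = Directory(["A", "B", "C"])
directory.read("A")                    # A holds the only copy: Exclusive
directory.read("B")                    # A and B now both hold Shared copies
invalidated = directory.write("C")     # C must invalidate A and B before writing
```

In this sketch, the write by component "C" cannot complete until the copies held by "A" and "B" have been invalidated, after which "C" holds the only copy, in the Modified state.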
When a write is initiated and the writer's local copy of the relevant cache line is not already in the M or E state, the write is delayed by “coordination overhead,” wherein the system expends time and resources granting M or E state to the writer's local copy of the cache line. This coordination overhead therefore increases “Observed Latency” (i.e., the time that elapses between when the writer initiates a write to a cache line and when the data is permitted to be read from that cache line). In the related art, a writer can pre-emptively invalidate all remote copies so as to hide the coordination overhead, thereby reducing Observed Latency. For many workflows, pre-emptive invalidation (i.e., “write prefetch”) is effective. However, if a reader requests a cache line that has been invalidated pre-emptively and the request is granted before the writer initiates its write, the pre-emptive invalidation becomes wasted work, because granting the reader's request moves the writer's copy of the cache line out of the M or E state. In this scenario, the writer must again incur the coordination overhead at a future time when it wants to initiate a write, and the resources the system spent on the unused pre-emptive invalidation are wasted. Generally, pre-emptive invalidation approaches known in the related art result in wasted work and are therefore not implemented.
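The effect of pre-emptive invalidation on Observed Latency, including the wasted-work scenario described above, can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions: the cycle counts `COORDINATION_COST` and `LOCAL_WRITE_COST` are arbitrary hypothetical values rather than measurements, the function `observed_latency` and its event names are hypothetical, and Observed Latency is modeled simply as any remaining coordination overhead plus the cost of the local write.

```python
COORDINATION_COST = 100  # hypothetical cycles to gain M or E state
LOCAL_WRITE_COST = 10    # hypothetical cycles to write once M or E is held


def observed_latency(events):
    """Model one writer and one cache line.

    `events` is an ordered list of "prefetch", "remote_read", or "write".
    Returns (latency of the first write, cycles wasted on unused prefetches).
    """
    writable = False          # True while the writer's copy is in M or E
    prefetch_unused = False   # True while a prefetch has not yet served a write
    wasted = 0
    for event in events:
        if event == "prefetch":
            # Pre-emptive invalidation: pay the coordination cost ahead of time.
            if not writable:
                writable = True
                prefetch_unused = True
        elif event == "remote_read":
            # Granting the read moves the writer's copy out of M or E.
            if writable and prefetch_unused:
                wasted += COORDINATION_COST
            writable = False
            prefetch_unused = False
        elif event == "write":
            # Coordination overhead is only observed if M or E is not held.
            overhead = 0 if writable else COORDINATION_COST
            return LOCAL_WRITE_COST + overhead, wasted
        else:
            raise ValueError(f"unknown event: {event}")
    raise ValueError("event sequence contains no write")


no_prefetch = observed_latency(["write"])
useful_prefetch = observed_latency(["prefetch", "write"])
wasted_prefetch = observed_latency(["prefetch", "remote_read", "write"])
```

Under these assumptions, a prefetch that survives until the write hides the coordination overhead entirely, whereas a prefetch followed by an intervening remote read leaves the write with the full overhead and additionally wastes the cycles spent on the prefetch.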
Embodiments of the disclosed technology address the issues mentioned above and lower the latency experienced in data transfers within a shared memory architecture in the presence of distributed cache memories.