In many multiprocessor systems, memory devices are organized in hierarchies including main memory and one or more levels of cache memory. Data can reside in one or more of the cache levels and/or main memory. Cache coherence protocols are used in multiprocessor systems to address the potential situation where not all of the processors see the same data value for a given memory location.
Memory systems are said to be coherent if they see memory accesses to a single data location in order. This means that if a write access is performed to data location X, and then a read access is performed to the same data location X, the memory hierarchy should return the value most recently written to X, regardless of which processor performs the read and write and how many copies of X are present in the memory hierarchy. Likewise, coherency also typically requires that writes be performed in a serialized manner such that each processor sees those write accesses in the same order.
There are various types of cache coherency protocols and mechanisms. For example, “explicit invalidation” refers to one mechanism used by cache coherence protocols wherein, when a processor writes to a particular data location in a cache, all of the other caches which contain a copy of that data are flagged as invalid by sending explicit invalidation messages. An alternative mechanism is updating wherein, when a processor writes to a particular data location in a cache, all of the other caches which contain a copy of that data are updated with the new value. Both of these cache coherence mechanisms thus require a significant amount of signaling, which scales with the number of cores (or threads) which are operating in a given data processing system. Accordingly, these various cache coherence protocols and mechanisms are known to have their own strengths and weaknesses, and research continues into improving cache coherency protocols with an eye toward maintaining (or improving) performance while reducing costs (e.g., energy consumption) associated with coherency traffic.
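By way of illustration only, the two mechanisms described above can be sketched as a toy model in which each write sends one coherence message per other sharer. The names `Cache`, `write`, and `make_sharers`, and the `policy` strings, are hypothetical and are not drawn from any particular protocol; the sketch merely shows that signaling grows with the number of caches holding a copy.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    # address -> value; an absent address models an invalid (or missing) line
    lines: dict = field(default_factory=dict)


def write(caches, writer, addr, value, policy):
    """Write `value` to `addr` from cache index `writer`.

    Returns the number of coherence messages sent to other sharers.
    `policy` is "invalidate" or "update" (illustrative names only).
    """
    messages = 0
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache.lines:
            continue  # only other caches holding a copy are signaled
        if policy == "invalidate":
            del cache.lines[addr]      # explicit invalidation message
        elif policy == "update":
            cache.lines[addr] = value  # update message carrying the new value
        else:
            raise ValueError(f"unknown policy: {policy}")
        messages += 1
    caches[writer].lines[addr] = value
    return messages


def make_sharers(n, addr=0x40, value=0):
    """Build `n` caches that all hold a copy of `addr`."""
    return [Cache({addr: value}) for _ in range(n)]
```

In this model, a write to a location shared by n caches sends n - 1 messages under either policy; the policies differ only in whether the other copies are discarded or refreshed.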
For example, a number of proposals have recently been set forth which aim to simplify coherence by relying on data-race-free semantics and on self-invalidation to eliminate explicit invalidation traffic and the need to track readers at the directory. The motivation for simplifying coherence has been established in numerous articles, some of which are mentioned herein. For example, with the addition of self-downgrade, the directory can be eliminated, see, e.g., A. Ros and S. Kaxiras, “Complexity-effective multicore coherence,” in 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012, and virtual cache coherence becomes feasible at low cost, without reverse translation, see, e.g., S. Kaxiras and A. Ros, “A new perspective for efficient virtual-cache coherence,” in 40th International Symposium on Computer Architecture (ISCA), 2013. Significant savings in area and energy consumption, without sacrificing performance, have also been demonstrated. Additional benefits regarding ease-of-verification, scalability, time-to-market, etc., are possible as a result of simplifying rather than complicating such fundamental architectural constructs as coherence.
In self-invalidation cache coherence protocols, writes on data are not explicitly signaled to sharers as is the case with explicit invalidation cache coherence protocols. Instead, a processor automatically invalidates its locally stored cache copy of the data. However, data races throw such self-invalidation protocols into disarray, producing executions that are not sequentially consistent, see, e.g., A. R. Lebeck and D. A. Wood, “Dynamic self-invalidation: Reducing coherence overhead in shared-memory multiprocessors,” in 22nd International Symposium on Computer Architecture (ISCA), 1995. As will be appreciated by those skilled in the art, a data race occurs when two or more threads access the same memory location concurrently, at least one of the accesses is for writing, and the threads are not using any exclusive locks to control their accesses to that memory location. All such proposals seen thus far offer sequential consistency for data-race-free (DRF) programs, see, e.g., S. V. Adve and M. D. Hill, “Weak ordering—a new definition,” in 17th International Symposium on Computer Architecture, 1990.
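The three conditions of a data race stated above can be expressed as a simple predicate. The following is a minimal sketch under stated assumptions: the `Access` record and `is_data_race` function are hypothetical, the two accesses are assumed to be concurrent (not otherwise ordered), and "not using any exclusive locks" is modeled as the two threads holding no lock in common.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: int                          # id of the accessing thread
    addr: int                            # memory location accessed
    is_write: bool                       # True for a write, False for a read
    locks_held: frozenset = frozenset()  # locks held during the access


def is_data_race(a, b):
    """Return True if concurrent accesses `a` and `b` form a data race."""
    return (
        a.thread != b.thread                   # two different threads
        and a.addr == b.addr                   # same memory location
        and (a.is_write or b.is_write)         # at least one access is a write
        and not (a.locks_held & b.locks_held)  # no common lock (assumption)
    )
```

For example, an unlocked read by one thread and an unlocked write by another thread to the same address satisfy the predicate, whereas two reads, or two accesses made while holding the same lock, do not.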
Data-race-free semantics require that conflicting accesses (e.g., a read and a write to the same address from different cores or processors) must be separated by synchronization (perhaps transitive over a set of threads). Self-invalidation is therefore initiated on synchronization. This synchronization must be exposed to the coherence mechanisms by the software, i.e., existing self-invalidation coherence protocols require cooperation with the application software running on the system. However, this requirement increases the complexity of the software and runs the risk of errors occurring if proper cooperation between the self-invalidation coherence protocols and the software is not provided.
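The dependence on software-exposed synchronization can be illustrated with a toy model. This is a sketch only, not any published protocol: the class `SelfInvalidatingCache`, its `acquire` and `release` methods, and the `demo` function are hypothetical names, and the write-back on `release` is an assumed stand-in for the self-downgrade mentioned above. A reader that is told about the synchronization self-invalidates and sees the new value; a reader that is not told keeps its stale copy.

```python
class SelfInvalidatingCache:
    """Toy private cache with no invalidation messages between caches."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # private cached copies
        self.dirty = set()    # addresses written locally, not yet written back

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value                   # no message to other caches
        self.dirty.add(addr)

    def release(self):
        # Assumed self-downgrade: write dirty data back at a release point.
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def acquire(self):
        # Self-invalidation: drop clean local copies at an acquire point.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


def demo(expose_sync):
    """Return what the reader sees after the writer stores 1 to address X."""
    X = 0x100
    memory = {}
    writer = SelfInvalidatingCache(memory)
    reader = SelfInvalidatingCache(memory)
    reader.read(X)        # reader caches the initial value 0
    writer.write(X, 1)
    writer.release()      # writer's synchronization point
    if expose_sync:
        reader.acquire()  # software exposes the synchronization to the cache
    return reader.read(X)
```

In this sketch, `demo(True)` returns the newly written value, while `demo(False)` returns the stale value, which is the kind of error that can arise when the software does not cooperate with the coherence mechanism.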
Accordingly, it would be desirable to provide systems and methods for cache coherence that do not require software to expose synchronization.