Many computers can improve access times to memory (e.g., RAM or random access memory) by using additional high-performance hardware memory caches that are local to individual compute nodes. Certain computers may be built with a hardware cache coherency domain. In such a design, multiple compute nodes can rely on hardware mechanisms to ensure that cached write-updates to a local memory of one compute node can be seen by other compute nodes on subsequent cached read-accesses to these duplicated cache memory locations.
Computers can also be built with multiple cache coherency domains. In this type of design, compute nodes in different domains cannot rely on hardware mechanisms to keep cached copies of shared data consistent. As a result, caching may be prevented altogether, which can reduce average memory access performance. Alternatively, computing applications might use additional means (e.g., explicit cache flushes) to ensure data consistency. Where data consistency is ensured in this way, sharing memory between compute nodes can become complicated, because software programs must apply these additional means and the additional means can create additional computing latency.
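By way of a non-limiting illustration, the explicit cache flushes mentioned above can be sketched as a small software simulation. The class names `SharedMemory` and `NonCoherentCache`, the methods `flush` and `invalidate`, and the address `0x10` are hypothetical and introduced only for this sketch; they do not describe any particular hardware or library. The sketch assumes two compute nodes with private write-back caches and no hardware coherency between them.

```python
class SharedMemory:
    """Backing memory visible to every compute node (hypothetical model)."""

    def __init__(self):
        self.cells = {}


class NonCoherentCache:
    """Hypothetical per-node write-back cache with no hardware coherency."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> locally cached value
        self.dirty = set()  # addresses written locally but not yet flushed

    def read(self, addr):
        # A miss fills the line from shared memory; a hit never re-checks it.
        if addr not in self.lines:
            self.lines[addr] = self.memory.cells.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # The write stays in the local cache until an explicit flush.
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Software-driven write-back of all dirty lines to shared memory.
        for addr in self.dirty:
            self.memory.cells[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        # Drop clean lines so that the next read refetches from shared memory.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


memory = SharedMemory()
node_a = NonCoherentCache(memory)
node_b = NonCoherentCache(memory)

node_b.read(0x10)                # node B caches the initial value 0
node_a.write(0x10, 42)           # node A updates only its own cache
stale = node_b.read(0x10)        # node B still sees its cached 0

node_a.flush()                   # writer side: explicit flush
still_stale = node_b.read(0x10)  # node B's cached copy is unchanged

node_b.invalidate()              # reader side: explicit invalidate
fresh = node_b.read(0x10)        # node B now refetches from shared memory
```

In this sketch, neither the flush on the writing node nor the invalidate on the reading node is sufficient alone; the software on both nodes has to cooperate, which is one source of the complexity and added latency noted above.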
If software components are being re-used, then sharing memory may become unworkable because existing software components typically cannot be extended with the measures used to ensure data consistency. Because such re-used components cannot be extended to ensure data consistency at a software level, data used by multiple compute nodes may need to be stored multiple times on a computer, which can consume more memory resources than a shared memory scenario where the data is stored just once. For example, an existing technique that can provide the abstraction of shared memory to applications across multiple computers is called distributed shared memory (DSM). However, one implementation of DSM can store multiple copies of the same data at multiple compute nodes, so that each of the compute nodes has efficient read access.
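By way of a non-limiting illustration, the memory cost of a replicating DSM implementation can be sketched as follows. The class `ReplicatedDSM` and its methods are hypothetical and introduced only for this sketch; the sketch assumes full replication with every write propagated to every node, which is only one of several possible DSM designs.

```python
class ReplicatedDSM:
    """Hypothetical DSM model in which every node holds a full replica."""

    def __init__(self, node_count):
        self.replicas = [dict() for _ in range(node_count)]
        self.replica_updates = 0  # number of per-replica writes performed

    def write(self, key, value):
        # Every write is propagated to every replica.
        for replica in self.replicas:
            replica[key] = value
        self.replica_updates += len(self.replicas)

    def read(self, node_id, key):
        # Reads are served from the local replica only.
        return self.replicas[node_id][key]

    def stored_values(self):
        # Total number of values held across all nodes.
        return sum(len(replica) for replica in self.replicas)


dsm = ReplicatedDSM(node_count=4)
for key in range(100):
    dsm.write(key, key * key)

local_value = dsm.read(3, 7)         # served from node 3's own replica
replicated_count = dsm.stored_values()
shared_count = 100                   # one copy, if the data were truly shared
```

Under these assumptions, four nodes hold four times as many stored values as a single shared copy would require, and each write is applied once per node; reads stay local, which is the benefit the replication is meant to provide.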
Such solutions often rely on additional memory being available or on hardware concurrency mechanisms that may make the hardware more expensive to manufacture. In addition, the use of additional coherence protocols can make the hardware protocols and software protocols more complicated and slower.