Shared-memory multiprocessors often pair a cache with each processor to reduce memory latency and to reduce contention on the network between the processors and main memory. In such a system, some mechanism must allow programs running on different processors to have a consistent view of the state of the shared memory, even though they may all write to the same location simultaneously. That is, it is necessary to ensure that two processors reading the same address from their caches will see the same value. Most schemes for maintaining this consistency, known as cache coherency, use snoopy caches, directories, or software techniques.
Snoopy cache methods are the most commonly used. In snoopy cache systems, each cache must observe all read and write traffic on the bus that interconnects the processors. A snoopy cache controller listens to transactions between main memory and the other caches, and updates its state based on what it hears. The nature of the update varies from one snoopy cache scheme to another. For example, on hearing that another cache has modified the value of a block, a cache could either invalidate or update its own copy. Because all caches in the system must observe the memory transactions, a shared bus is the typical medium of communication.
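The difference between the invalidate and update policies can be illustrated with a minimal sketch. The names `Bus`, `SnoopyCache`, and `snoop_write` are hypothetical, and the model assumes write-through caches and whole-word blocks purely to keep the example short; it is not a description of any particular protocol.

```python
class Bus:
    """Shared bus: every attached cache observes every write (illustrative model)."""

    def __init__(self):
        self.memory = {}   # addr -> value, stands in for main memory
        self.caches = []   # all caches attached to this bus

    def broadcast_write(self, writer, addr, value):
        # Assumption: write-through, so main memory is updated on every write.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr, value)


class SnoopyCache:
    """Cache that reacts to writes it hears on the bus."""

    def __init__(self, bus, policy):
        assert policy in ("invalidate", "update")
        self.bus = bus
        self.policy = policy
        self.lines = {}    # addr -> locally cached value
        bus.caches.append(self)

    def read(self, addr):
        # On a miss, fetch the value from main memory (0 if never written).
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr, value):
        # Only caches holding a copy of the block need to act.
        if addr not in self.lines:
            return
        if self.policy == "invalidate":
            del self.lines[addr]      # drop the stale copy; next read misses
        else:
            self.lines[addr] = value  # refresh the copy in place
```

Under the invalidate policy the next read by another processor misses and refetches the block; under the update policy the copy stays resident and is refreshed, at the cost of carrying the data on every broadcast.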
Because the caches must also satisfy read requests from other processors for which they have the most recent value, the cache memory must be dual-ported. Reads and writes must be permitted both from the processor side of the cache and from the shared bus side. For high-performance systems in which the reference rate from the processor is high, either the tag store of the cache must be duplicated, or a significant cycle-stealing penalty must be accepted as bus accesses to the cache interfere with processor accesses.
Snoopy caches provide the illusion of a truly shared global memory, but maintaining that illusion makes the method very difficult to scale beyond a few processors connected by a single shared bus. The fundamental limitation is that when a processor writes a shared datum in a snoopy bus scheme, the new value must propagate to all caches in the system in a single cycle. Otherwise, two processors could succeed in writing different values to the same datum simultaneously, violating the requirement of cache coherence.
Another class of techniques associates a directory entry with each block of main memory; the entry records the current location of that block. Memory operations query the directory to determine whether cache coherence actions are necessary.
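The directory lookup can be sketched as follows. This is a simplified illustration, assuming each directory entry is the set of caches currently holding a copy of the block; the class `Directory` and its methods `record_read` and `record_write` are hypothetical names, not taken from any specific design.

```python
class Directory:
    """One entry per memory block, recording which caches hold a copy."""

    def __init__(self):
        self.sharers = {}  # block number -> set of cache ids with a copy

    def record_read(self, block, cache_id):
        # A read adds the requesting cache to the block's sharer set.
        self.sharers.setdefault(block, set()).add(cache_id)

    def record_write(self, block, cache_id):
        """Return the caches that must invalidate their copies.

        After the write, the writer is the only recorded holder.
        """
        to_invalidate = self.sharers.get(block, set()) - {cache_id}
        self.sharers[block] = {cache_id}
        return to_invalidate
```

An empty result from `record_write` means no coherence action is needed, so only the caches that actually hold the block receive messages and no broadcast is required.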
Both snoopy cache and directory schemes involve increased hardware complexity. However, the caches are invisible at the software level, which greatly simplifies machine programming.
As an alternative, cache coherence can be enforced in software, trading software complexity for hardware complexity. Software schemes are attractive not only because they require minimal hardware support, but also because they scale beyond the limits imposed by the bus.
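One common form of software-enforced coherence has the program (or compiler) explicitly write back and discard cached data around accesses to shared locations. The sketch below illustrates that general idea only; `SoftwareManagedCache`, `flush`, and `invalidate` are hypothetical names, and the model assumes write-back caches with no hardware coherence at all.

```python
class SoftwareManagedCache:
    """Write-back cache with no hardware coherence (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory  # dict shared by all caches: addr -> value
        self.lines = {}       # addr -> locally cached value
        self.dirty = set()    # addrs written locally but not yet in memory

    def read(self, addr):
        # On a miss, fetch from shared memory (0 if never written).
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Writes stay local until software asks for a flush.
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Software-issued: push every dirty line out to shared memory.
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        # Software-issued: discard clean copies so later reads refetch.
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}
```

In this model a writer calls `flush()` before releasing shared data and a reader calls `invalidate()` before consuming it; omitting either step leaves the reader with a stale value, which is the software complexity being traded for simpler hardware.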