Large software applications, such as simulators and database servers, require cost-effective computation beyond what a single microprocessor can provide. Shared-memory multiprocessor computers have emerged as a popular solution for running such applications. Most shared-memory multiprocessor computers provide each constituent processor with a cache memory into which portions of the shared memory (“blocks”) may be loaded. The cache memory allows faster memory access.
A cache coherence protocol ensures that the contents of the cache memories accurately reflect the contents of the shared memory. Generally, such protocols invalidate the copies of a block held in all other caches when one cache writes to that block, and update the main memory before a modified block is flushed from a cache.
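The write-invalidate behavior described above can be illustrated with a minimal sketch. The class `CoherentSystem`, its methods, and the `[value, dirty]` line layout are hypothetical names chosen for illustration and do not come from any particular protocol; the model also assumes all operations happen one at a time, so it sidesteps the ordering issues real hardware must handle.

```python
class CoherentSystem:
    """Toy write-invalidate model (illustrative assumption, not a real protocol)."""

    def __init__(self, n_caches, memory):
        self.memory = dict(memory)
        # One dict per cache: block address -> [value, dirty flag]
        self.caches = [dict() for _ in range(n_caches)]

    def flush(self, cid, addr):
        """Drop a block from cache `cid`, writing it back first if it was modified."""
        line = self.caches[cid].pop(addr, None)
        if line is not None and line[1]:
            self.memory[addr] = line[0]

    def read(self, cid, addr):
        cache = self.caches[cid]
        if addr not in cache:
            # Make any modified copy held elsewhere visible in memory first.
            for other, other_cache in enumerate(self.caches):
                line = other_cache.get(addr)
                if other != cid and line is not None and line[1]:
                    self.memory[addr] = line[0]
                    line[1] = False
            cache[addr] = [self.memory[addr], False]
        return cache[addr][0]

    def write(self, cid, addr, value):
        # Invalidate every other cached copy before writing locally.
        for other in range(len(self.caches)):
            if other != cid:
                self.flush(other, addr)
        self.caches[cid][addr] = [value, True]
```

In this sketch, a write by one cache removes the block from every other cache, so a later read by another cache misses and picks up the new value rather than a stale one.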
Two important classes of protocols for maintaining cache coherence are “directory” protocols and “snooping” protocols. In directory protocols, a given “node”, typically a cache/processor combination, “unicasts” its request for a block of memory to a directory, which maintains information indicating which other nodes are using that particular memory block. The directory then “multicasts” requests for that block directly to a limited number of indicated nodes. Generally, the multicast goes to a superset of the nodes that actually have ownership or sharing privileges, because some transactions are not recorded in the directory, as is understood in the art. The “indirection” of directory protocols, which requires that messages be exchanged with the directory before processors communicate with each other, limits the speed of directory protocols.
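The indirection of a directory request can be sketched as a list of messages. The function `directory_request`, the message tuples, and the use of a plain dictionary as the directory are illustrative assumptions, not part of any specific directory protocol.

```python
def directory_request(directory, requester, block):
    """Return the (sender, receiver, kind) messages for one request.

    `directory` maps a block to the set of nodes that may hold it
    (possibly a superset of the actual holders). Hypothetical model.
    """
    # Step 1: the requester unicasts to the directory.
    messages = [(requester, "directory", "request")]
    # Step 2: the directory multicasts to the nodes it has recorded.
    targets = sorted(directory.get(block, set()) - {requester})
    messages += [("directory", target, "forward") for target in targets]
    # The directory records the requester as a possible holder.
    directory.setdefault(block, set()).add(requester)
    return messages
```

Every request in this sketch passes through the directory before any other node hears of it, which is the extra hop the text refers to as indirection.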
The problem of indirection is avoided in snooping protocols, where a given cache may “broadcast” a request for a block of memory to all other “nodes” in the system. The nodes include all other caches and the shared memory itself. The node “owning” that block responds directly to the requesting node, forwarding the desired block of memory.
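A snooping request can be sketched in the same message-list style. The function `snoop_request` and the `owner_of` mapping are hypothetical, and the sketch assumes the requester is not itself the owner of the block.

```python
def snoop_request(owner_of, nodes, requester, block):
    """Return the (sender, receiver, kind) messages for one snooping request.

    `owner_of` maps a block to the node that currently owns it;
    `nodes` lists every cache plus the shared memory. Hypothetical model
    that assumes the requester is not the owner.
    """
    # The requester broadcasts to every other node, memory included.
    messages = [(requester, node, "request") for node in nodes if node != requester]
    # The owner answers the requester directly, with no intermediary.
    messages.append((owner_of[block], requester, "data"))
    return messages
```

The data reaches the requester without first consulting a directory, at the cost of sending the request to every node.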
Snooping, however, requires that “message ordering” be preserved on the interconnection between communicating nodes. Generally, this means that each node can unambiguously determine the logical order in which all messages must be processed. This ordering has traditionally been guaranteed by a shared wire bus. Without such ordering, for example, a first node may ask for a writeable copy of a block held by memory at the same time that it sends messages to other nodes invalidating any copies of the block that they hold in cache for reading. A second node receiving the invalidation message may ignore it because the second node does not have the block, but the second node may then request the block for reading before the first node receives the block from memory for writing. When the first node finally does receive the block for writing, the second node erroneously believes it still has a valid readable copy.
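The race described above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the node names `A` and `B`, the message labels, and the functions `run` and `coherent` are hypothetical, and memory is assumed to grant a copy the instant a request is delivered to it.

```python
def run(deliveries):
    """Apply (message, receiver) deliveries in order; return final permissions."""
    perm = {"A": "none", "B": "none"}
    for message, receiver in deliveries:
        if message == "A:get_writable" and receiver == "memory":
            perm["A"] = "writable"          # memory grants A a writeable copy
        elif message == "A:get_writable" and receiver == "B":
            perm["B"] = "none"              # invalidation; a no-op if B holds nothing
        elif message == "B:get_readable" and receiver == "memory":
            perm["B"] = "readable"          # memory grants B a readable copy
        elif message == "B:get_readable" and receiver == "A":
            if perm["A"] == "writable":
                perm["A"] = "readable"      # the owner downgrades when it sees the read
    return perm


def coherent(perm):
    """A writer must be the only node holding the block."""
    writers = [n for n, p in perm.items() if p == "writable"]
    holders = [n for n, p in perm.items() if p != "none"]
    return not writers or len(holders) == 1


# Ordered (bus-like): every receiver sees A's request before B's request.
ordered = [("A:get_writable", "memory"), ("A:get_writable", "B"),
           ("B:get_readable", "memory"), ("B:get_readable", "A")]

# Unordered: B sees the invalidation early, and memory sees B's read first.
unordered = [("A:get_writable", "B"), ("B:get_readable", "memory"),
             ("B:get_readable", "A"), ("A:get_writable", "memory")]
```

With the `ordered` list, both nodes end with readable copies and `coherent` holds. With the `unordered` list, the first node ends with a writeable copy while the second still holds a readable one, which is the erroneous state described in the text.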
The “correctness” of memory access in snooping is tightly linked to this message-ordering requirement on communications between processors. This and other requirements of the snooping protocol complicate any modification of snooping to increase its performance.