The present invention relates generally to a system for coordinating cache memories in a shared-memory computer architecture, and in particular, to a system that chooses a mechanism for communicating cache coherence messages based on the bandwidth available for transmitting such messages.
Large computer software applications, such as simulators and database servers, require cost-effective computation beyond that which can be provided by a single microprocessor. Shared-memory, multiprocessor computers have emerged as a popular solution for running such applications.
Most shared-memory multiprocessor computers provide each constituent processor with a cache memory into which blocks of the shared memory may be loaded. The cache memory allows faster memory access. A coherence protocol ensures that the contents of the cache memories accurately reflect the contents of the shared memory. Generally, such protocols invalidate all other cache memories when one cache is written to, and update the main memory before a changed cache is flushed.
Two important classes of protocols for maintaining cache coherence are “snooping” and “directories”. In the snooping protocols, a given cache, before its processor reads or writes to a block of memory, “broadcasts” a request for that block of memory to all other “nodes” in the system. The nodes include all other caches and the shared memory itself. The node “owning” that block responds directly to the requesting node, forwarding the desired block of memory. A refinement of snooping is “multicast snooping”, in which the requesting node attempts to predict which of the other nodes has a copy of the desired block and, rather than broadcasting its request, performs a multicast to the predicted copy holders. This technique is described in Multicast Snooping: A New Coherence Method Using a Multicast Address Network, E. Ender Bilir, Ross M. Dickson, Ying Hu, Manoj Plakal, Daniel J. Sorin, Mark D. Hill, and David A. Wood, International Symposium on Computer Architecture (ISCA), 1999, hereby incorporated by reference.
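The difference between broadcast and multicast snooping can be illustrated with a minimal sketch. The `Node` class, the `snoop_request` function, and the block addresses are hypothetical names chosen for illustration only; the sketch counts request messages and does not model data transfer, timing, or recovery from a wrong prediction.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cache or the shared memory (hypothetical model for illustration)."""
    name: str
    owned_blocks: set = field(default_factory=set)  # block addresses this node owns


def snoop_request(requester, block, targets):
    """Send a request for `block` to every node in `targets` except the requester.

    Returns (name of the owning node or None, number of request messages sent).
    """
    messages = 0
    owner = None
    for node in targets:
        if node is requester:
            continue
        messages += 1
        if block in node.owned_blocks:
            owner = node.name
    return owner, messages


memory = Node("memory", {0x10, 0x20})
cache0 = Node("cache0")
cache1 = Node("cache1", {0x40})
cache2 = Node("cache2")
all_nodes = [memory, cache0, cache1, cache2]

# Broadcast snooping: the request goes to every other node.
broadcast_result = snoop_request(cache0, 0x40, all_nodes)

# Multicast snooping: the request goes only to the predicted copy holders.
# The prediction here is assumed to be correct.
multicast_result = snoop_request(cache0, 0x40, [cache1])

# A wrong prediction finds no owner; some retry mechanism would then be
# needed, which this sketch does not model.
mispredicted_result = snoop_request(cache0, 0x40, [cache2])
```

In this sketch the broadcast reaches three nodes while the correct multicast reaches one, and both locate the same owner.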
In the directory protocols, a given cache “unicasts” its request for a block of memory to a directory which maintains information indicating those other caches using that particular memory block. The directory then “multicasts” requests for that block directly to a limited number of indicated caches. Generally, the multicast will be to a superset of the caches that actually have ownership or sharing privileges, because some transactions are not recorded in the directory, as is understood in the art.
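The directory flow can be sketched in the same spirit. The `Directory` class and `directory_request` function are hypothetical, and the unrecorded transaction is modeled, as an assumption, by a cache that has silently dropped a block the directory still lists for it.

```python
class Directory:
    """Tracks which caches are believed to hold each block (illustrative only)."""

    def __init__(self):
        self.holders = {}  # block address -> set of cache names

    def record(self, block, cache):
        self.holders.setdefault(block, set()).add(cache)

    def targets_for(self, requester, block):
        """Caches the directory would forward a request for `block` to."""
        return sorted(self.holders.get(block, set()) - {requester})


def directory_request(directory, requester, block, actual_holders):
    """Model one request: a unicast to the directory, then a multicast.

    Returns (caches contacted, total messages, caches contacted needlessly).
    """
    forwarded = directory.targets_for(requester, block)
    messages = 1 + len(forwarded)  # one unicast plus one message per target
    needless = [c for c in forwarded if c not in actual_holders]
    return forwarded, messages, needless


directory = Directory()
directory.record(0x40, "cache1")
directory.record(0x40, "cache2")

# Assumption: cache2 dropped the block without informing the directory,
# so only cache1 actually holds it.
result = directory_request(directory, "cache0", 0x40, actual_holders={"cache1"})
```

Here the directory contacts both `cache1` and `cache2`, a superset of the single actual holder, at a cost of three messages in total.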
Snooping protocols are often used with small computers because they transmit the necessary cache messages quickly, without the delaying intermediate step of consulting the directory. For large systems with many processors, however, snooping generates large numbers of messages, which may overwhelm a communications channel. For this reason, the directory protocol, which directs communications only to a limited number of relevant caches, may be desirable in larger multiprocessor machines.
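The scaling argument can be made concrete with a simplified per-request message count. The model below is an assumption made for illustration: broadcast snooping sends one message to every other node, while a directory request costs one unicast plus one message per listed holder, with the number of holders fixed at two. Responses, data transfer, and the latency of the extra directory hop are not modeled.

```python
def snooping_messages(num_nodes):
    """Request messages for one broadcast: every node except the requester."""
    return num_nodes - 1


def directory_messages(num_holders):
    """Request messages via a directory: one unicast plus one per listed holder."""
    return 1 + num_holders


def compare(node_counts, num_holders=2):
    """Return (nodes, snooping messages, directory messages) for each system size."""
    return [
        (n, snooping_messages(n), directory_messages(num_holders))
        for n in node_counts
    ]


comparison = compare([4, 16, 64, 256])
```

Under these assumptions the snooping count grows linearly with the number of nodes (3, 15, 63, 255), while the directory count stays at 3 for every system size.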
While the above principles guide the system designer in selecting between snooping and directory protocols, the decision can be complicated. First, many multiprocessor units are designed to accommodate a range of different processor numbers. Selecting either a directory protocol or a snooping protocol will result in less than optimal performance when the same system is configured with different numbers of processors or in certain upgrade operations where more processors are added to the system.
Second, even for a fixed number of processors, the demand placed on the cache protocol communication network may differ radically with the application being executed, so that one of the snooping or directory protocols will be preferable to the other for a given application. Moreover, for any given system, the amount of memory traffic may vary significantly over time.
What is needed is a cache coherence protocol that works better with these varying real-world conditions.