1. Field
The present invention generally relates to the design of processors in computer systems. More specifically, the present invention relates to a bandwidth-efficient, directory-based coherence protocol for a shared memory multiprocessor system.
2. Related Art
As shared memory multiprocessor systems increase in size, it is becoming impractical to use broadcast-based cache-coherence techniques, because these techniques require increasing amounts of inter-processor communication bandwidth as the number of processors increases. This has led to the development of directory-based cache-coherence protocols for larger shared memory multiprocessor systems. Directory-based protocols use a directory to maintain information about the locations of readable and writable copies of cache lines. This location information allows efficient point-to-point communications, instead of less efficient broadcast communications, to be used to ensure coherence.
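The scaling argument can be illustrated with a deliberately simplified message count. The following is a minimal sketch, not part of the described system, assuming that only request-side messages are counted (responses are ignored), that the home node is distinct from the requester, and that the directory records a fixed number of copies per line; the function names are hypothetical.

```python
def broadcast_request_messages(num_procs: int) -> int:
    # Simplified model: a snooping request is delivered to every
    # processor except the requester, so the count grows with system size.
    return num_procs - 1


def directory_request_messages(num_recorded_copies: int) -> int:
    # Simplified model: one message to the home node, then one
    # point-to-point message to each processor the directory records
    # as holding a copy. The count is independent of system size.
    return 1 + num_recorded_copies


# Compare the two schemes for a line with two recorded copies.
for n in (4, 16, 64, 256):
    print(n, broadcast_request_messages(n), directory_request_messages(2))
```

Under these assumptions the broadcast count rises linearly with the number of processors, while the directory count depends only on how many copies the directory records.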
Directory information is typically partitioned between processor nodes in the multiprocessor system, so that each processor node is responsible for maintaining directory information for cache lines in a subset of the address space in the shared memory multiprocessor system. The responsible node for a specific cache line is referred to as the “home node” for the cache line. The home node for a cache line services requests for the cache line from requesting nodes, wherein a valid copy of the cache line may be located in another processor node, which is referred to as a “slave node.”
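One common way to partition directory information is to interleave cache lines across nodes, so that the home node of a line is computed directly from its address. The sketch below assumes such line interleaving and a 64-byte line size; both, along with the names `home_node`, `DirectoryEntry`, `Node`, and `directory_entry`, are hypothetical illustrations rather than details taken from this description. Actual systems may use other address-to-home mappings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

LINE_SIZE = 64  # hypothetical cache-line size in bytes


def home_node(addr: int, num_nodes: int) -> int:
    """Return the home node of the cache line containing addr (line-interleaved)."""
    return (addr // LINE_SIZE) % num_nodes


@dataclass
class DirectoryEntry:
    owner: Optional[int] = None                      # node that may hold a writable copy
    sharers: Set[int] = field(default_factory=set)   # nodes that may hold readable copies


@dataclass
class Node:
    node_id: int
    # Each node keeps directory state only for lines whose home it is.
    directory: Dict[int, DirectoryEntry] = field(default_factory=dict)


def directory_entry(nodes: List[Node], addr: int) -> Tuple[int, DirectoryEntry]:
    """Locate the home node for addr and return (home id, directory entry)."""
    home = nodes[home_node(addr, len(nodes))]
    line = addr // LINE_SIZE
    return home.node_id, home.directory.setdefault(line, DirectoryEntry())


nodes = [Node(i) for i in range(4)]
home_id, entry = directory_entry(nodes, 0x1040)  # line 65 maps to node 1
entry.owner = 3  # node 3 holds the valid copy, so it would act as the slave node
print(home_id)
```

Here a request for address `0x1040` is serviced by node 1, whose directory entry points at node 3 as the holder of the valid copy.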
In a typical directory-based system, the directory information is not perfect, because it provides a conservative representation of which processors may have a readable or writable copy of a cache line. For example, if a protocol supports silent eviction of shared cache lines, or if the directory uses a coarse bit-mask to indicate sharers, the directory could indicate that processors are potential sharers when they are not actual sharers. In another example, a directory could indicate a potential sharer or owner that is not an actual sharer or owner if there is a race in which a processor evicts the cache line (and, if the line is dirty, writes it back) while that processor is being accessed as a slave during another processor's request for the same cache line.
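The two sources of imprecision mentioned above, coarse bit-masks and silent eviction, can be shown in a few lines. This is a minimal sketch assuming a hypothetical directory entry with one presence bit per group of processors; the class name `CoarseDirectoryEntry` and the group size are illustrative assumptions, not part of the described protocol.

```python
from typing import Set


class CoarseDirectoryEntry:
    """Hypothetical directory entry with one presence bit per group of processors."""

    def __init__(self, num_procs: int, group_size: int) -> None:
        self.num_procs = num_procs
        self.group_size = group_size
        self.mask = 0  # bit g set => some processor in group g may share the line

    def add_sharer(self, proc: int) -> None:
        self.mask |= 1 << (proc // self.group_size)

    def potential_sharers(self) -> Set[int]:
        # Every processor in a marked group is reported, so the result is
        # a conservative superset of the actual sharers.
        result: Set[int] = set()
        num_groups = (self.num_procs + self.group_size - 1) // self.group_size
        for g in range(num_groups):
            if (self.mask >> g) & 1:
                start = g * self.group_size
                result.update(range(start, min(start + self.group_size, self.num_procs)))
        return result


entry = CoarseDirectoryEntry(num_procs=8, group_size=4)
actual = {1}
entry.add_sharer(1)
actual.discard(1)  # processor 1 silently evicts its shared copy; the directory is not told
print(sorted(entry.potential_sharers()))  # [0, 1, 2, 3]
```

Processors 0, 2, and 3 appear only because of the coarse mask, and processor 1 appears only because its eviction was silent: none of them is an actual sharer, yet each would be treated as a potential one.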
In many directory-based coherence protocols (such as a "blocking" protocol with unordered coherence links), each transaction is terminated by sending an acknowledgment (ACK) message from the requesting node to the home node indicating that the requesting node has received a copy of the cache line. In such protocols, the system delays processing subsequent requests for the same cache line until such an ACK or other message has been received. However, requiring an ACK message before a transaction can complete increases the bandwidth required to maintain coherence. Moreover, because the home node must wait for the ACK, resources at the home node remain allocated to the given transaction for a longer time. In order to decrease such bandwidth requirements and to improve resource utilization, it is desirable to eliminate the requirement that all transactions end with an ACK (or other message) to the home node.
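The blocking behavior described above can be sketched as a home node that marks a line busy until the requester's ACK arrives and defers any other request for that line in the meantime. This is a simplified illustration assuming the home node supplies the data itself (no slave node is involved); the class `BlockingHome` and its message log are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class BlockingHome:
    """Hypothetical home node for a blocking protocol that ends each transaction with an ACK."""

    def __init__(self) -> None:
        self.busy: Dict[int, int] = {}              # line -> requester whose transaction is open
        self.pending: Dict[int, Deque[int]] = {}    # line -> requests deferred until the ACK
        self.sent: List[Tuple[str, int, int]] = []  # log of (message, line, destination)

    def request(self, line: int, requester: int) -> None:
        if line in self.busy:
            # Another transaction for this line is still open: defer this request.
            self.pending.setdefault(line, deque()).append(requester)
            return
        self.busy[line] = requester
        self.sent.append(("DATA", line, requester))

    def ack(self, line: int, requester: int) -> None:
        # Only after the ACK may the home node release the line's resources
        # and service the next deferred request for the same line.
        assert self.busy.get(line) == requester
        del self.busy[line]
        queue = self.pending.get(line)
        if queue:
            self.request(line, queue.popleft())


home = BlockingHome()
home.request(0x40, requester=1)
home.request(0x40, requester=2)  # deferred: line 0x40 is busy
home.ack(0x40, requester=1)      # releases the line; the request from node 2 is serviced
print(home.sent)                 # [('DATA', 64, 1), ('DATA', 64, 2)]
```

In this model each transaction costs a request, a data response, and an ACK, and the `busy` entry stays allocated for the full round trip; these are the bandwidth and resource costs that motivate removing the ACK.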
Unfortunately, eliminating such ACK messages can lead to forward-progress problems. More specifically, if a processor requests access rights to a cache line and then receives a slave message that removes those access rights before it receives the requested access rights, the requesting processor may be unable to make forward progress unless ordering constraints are enforced on the coherence links or between virtual channels.
Hence, what is needed is a system which provides a directory-based cache-coherence scheme without the need for such ACK messages.