1. Field of the Invention
The present invention generally relates to a computer system with multiple processors. More particularly, the present invention relates to the sharing of data among processors in a Distributed Shared Memory (“DSM”) computer system. Still more particularly, the invention relates to a scalable, high-performance, directory-based cache coherence protocol that allows data sharing among processors in a DSM computer system.
2. Background of the Invention
Distributed computer systems typically comprise multiple computers connected to each other by a communications network. In some distributed computer systems, the networked computers can access shared data. Such systems are sometimes known as parallel computers. If a large number of computers are networked, the distributed system is considered to be “massively” parallel. One advantage of a massively parallel computer is that it can solve complex computational problems in a reasonable amount of time.
In such systems, the memories of the computers are collectively known as a Distributed Shared Memory (“DSM”). Ensuring that the data stored in a DSM is accessed in a coherent manner is a problem. Coherency, in part, means that only one processor can modify any part of the data at any one time; otherwise, the state of the system would be nondeterministic.
Recently, DSM systems have been built as a cluster of Symmetric Multiprocessors (“SMP”). In SMP systems, shared memory can be implemented efficiently in hardware since the processors are symmetric (e.g., identical in construction and in operation) and operate on a single, shared processor bus. Symmetric Multiprocessor systems have good price/performance ratios with four or eight processors. However, because the specially designed bus makes message passing between the processors a bottleneck, it is difficult to scale the size of an SMP system beyond twelve or sixteen processors.
It is desired to construct large-scale DSM systems using processors connected by a network. The goal is to allow processors to efficiently share the memories so that data fetched by one program executed on a first processor from memory attached to a second processor is immediately available to all processors.
Caches connected to each processor of the computer system permit faster access to data from the main memory of each computer system. Caches are useful because they reduce memory latencies on cache hits. However, unique to DSM multiprocessing computer systems, the copies of memory locations stored in each computer system cache allow for inconsistent copies to develop if a coherency protocol that enforces cache consistency is not implemented in the computer system. This coherency protocol must typically be designed in such a manner that it scales to very large processor configurations with maximum memory system performance. Prior art systems suffered from performance bottlenecks due to the bus based cache coherence protocols prevalent in such systems. Bus based coherence protocols limit the number of processors that can be incorporated into such a high performance system. Directory based solutions to the problem of cache and memory coherence scale much better to larger systems because they can be efficiently adapted to more arbitrary and larger numbers of processor interconnects.
The problems noted above are solved in large part by a distributed multiprocessing computer system that contains a plurality of processors, each connected to RAMbus™ Inline Memory Modules (“RIMM”) main memory. Thus, each processor preferably has an associated main memory constructed of RIMMs. Each RIMM contains data that is shared between the processors. The main memory is subdivided into logical memory blocks indexed by a physical address used by the processor to access the memory block. Each memory block has an associated directory that maintains the coherence of the data in the memory block across all processors that may contain a copy of the memory block in the distributed multiprocessing computer system. Each memory block in main memory and its associated coherence directory has a designated Home processor. The Home processor contains the original memory block; other processors needing access to the memory block only contain copies of the Home processor memory block. An Owner processor is another processor in the multiprocessing computer system that includes a copy of the Home processor memory block in a cache connected to the Owner processor main memory. Whenever an Owner processor is associated with a memory block, it is the only processor in the distributed multiprocessing computer system permitted to contain a copy of the Home processor memory block. The Owner processor has permission to modify the contents of the memory block.
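By way of illustration only, the per-block directory state described above may be sketched as follows. The names DirectoryEntry, grant_exclusive, grant_shared, and home_of are hypothetical, and the interleaved mapping of blocks to Home processors is an assumption made for the example; neither is taken from the described system.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry kept at the Home processor for one memory block."""
    home: int                                         # processor holding the original block
    owner: Optional[int] = None                       # sole holder with write permission, if any
    sharers: Set[int] = field(default_factory=set)    # holders of read-only copies

    def grant_exclusive(self, requester: int) -> Set[int]:
        """Make `requester` the Owner and return the processors whose copies must be invalidated."""
        to_invalidate = set(self.sharers)
        if self.owner is not None:
            to_invalidate.add(self.owner)
        to_invalidate.discard(requester)
        self.sharers.clear()          # an Owner is the only processor allowed to hold a copy
        self.owner = requester
        return to_invalidate

    def grant_shared(self, requester: int) -> None:
        """Record a read-only copy; refuses while an Owner holds the block."""
        if self.owner is not None:
            raise RuntimeError("block is exclusively owned")
        self.sharers.add(requester)


def home_of(physical_address: int, block_size: int, num_processors: int) -> int:
    """Assumed mapping: consecutive blocks are interleaved across Home processors."""
    return (physical_address // block_size) % num_processors
```

In this sketch, granting exclusive access empties the set of read-only holders, which mirrors the requirement that an Owner be the only processor holding a copy of the block.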
Each of the processors in the distributed multiprocessing computer system incorporates a coherence controller connected to a RIMM. A coherence controller maintains the coherence of the shared data in the memory module using the coherence directory in the Home processor. If the cache becomes full, a memory block may need to be replaced to make room for a new memory block. Thus, for the case of an Owner that contains a memory block copy, if this memory block copy is replaced from the Owner's cache memory, then the copy of the Owner memory block is written to the Home processor containing the original memory block. In addition, the corresponding directory entry in the Home processor for the memory block is updated. A read of the Home processor directory or modification of other processor cache and main memory RIMMs in the computer system is not required.
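The replacement of an Owner's copy may be sketched, under stated assumptions, as a single write-back message that both stores the data at the Home processor and overwrites the directory entry. The class names HomeNode and OwnerCache, the method handle_victim, and the directory_reads counter are hypothetical and serve only to show that no directory read and no message to any other processor is involved.

```python
from typing import Dict, Optional


class HomeNode:
    """Hypothetical Home processor: main memory plus a per-block owner record."""

    def __init__(self) -> None:
        self.memory: Dict[int, bytes] = {}
        self.owner: Dict[int, Optional[int]] = {}
        self.directory_reads = 0      # counts directory look-ups, for illustration

    def handle_victim(self, block: int, data: bytes) -> None:
        """Accept a written-back block and overwrite the directory entry without reading it."""
        self.memory[block] = data
        self.owner[block] = None      # blind update: the block no longer has an Owner


class OwnerCache:
    """Hypothetical cache of a processor that holds blocks as Owner."""

    def __init__(self, pid: int, home: HomeNode) -> None:
        self.pid = pid
        self.home = home
        self.lines: Dict[int, bytes] = {}

    def evict(self, block: int) -> int:
        """Replace `block` from the cache; returns the number of messages sent."""
        data = self.lines.pop(block)
        self.home.handle_victim(block, data)   # the only message: write-back to Home
        return 1
```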
In each processor, the coherence controller sends and receives messages out of order to maintain the coherence of the shared data in the main memory RIMMs. If an out of order message causes an incorrect next program state, the cache and directory coherence controllers restore the prior correct saved program state and resume execution. In the distributed multiprocessing computer system, a processor needing to read or write to a memory block not present in its main memory or cache must request a copy of the memory block from the Home processor. This processor is referred to as a Requester processor. After the Requester processor has determined the Owner of a cache block by consulting the Home processor, the Requester and Owner processors communicate directly to maintain cache coherency without routing through the Home processor directory.
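One possible message flow for a read miss is sketched below for illustration. The function read_miss and the message kinds "ReadRequest", "OwnerIs", "Fetch", and "Data" are hypothetical names, and the sketch assumes that the Home processor answers the Requester with the identity of the Owner; the actual message set of the protocol is not specified here.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, int, str]   # (source processor, destination processor, kind)


def read_miss(requester: int, home: int, owner_of: Dict[int, int],
              block: int, log: List[Message]) -> int:
    """Resolve a read miss and return the processor that supplied the data.

    `owner_of` stands in for the Home processor directory (block -> Owner).
    Every message sent is appended to `log`.
    """
    log.append((requester, home, "ReadRequest"))      # consult the Home processor
    owner = owner_of.get(block)
    if owner is None:
        log.append((home, requester, "Data"))         # no Owner: Home supplies the block
        return home
    log.append((home, requester, "OwnerIs"))          # Home identifies the Owner
    log.append((requester, owner, "Fetch"))           # from here on, Home is bypassed
    log.append((owner, requester, "Data"))
    return owner
```

Once the Owner is known, the remaining messages in the log travel only between the Requester and the Owner.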
A memory block in the distributed multiprocessing computer system also may be shared by multiple processors that have read only access capabilities to the memory block. These Sharer processors each contain a shared copy of the memory block in a cache connected to the Sharer processor main memory. A coherence controller in the processor maintains the coherence of the shared data in the main memory using the coherence directory for the memory block in the Home processor. Each coherence controller is capable of sending and receiving messages out of order to maintain the coherence of the shared data in the main memory. If out of order messages cause an incorrect next program state, the coherence controller is capable of restoring the prior correct saved program state and resuming execution.
In the distributed multiprocessing computer system, a Requester processor that encounters a read or write miss of a memory block in its main memory or cache can send and receive messages directly to Sharer processors to maintain cache coherency, without routing the messages through the Home processor directory. Eviction of the shared copy of the memory block in the Sharer processor cache caused by replacement of the memory block from the cache does not have to be communicated to the Home processor directory for the memory block.
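A consequence of evicting shared copies without notifying the Home processor is that the directory may list a Sharer that no longer holds the block, so a later invalidation must be harmless at such a processor. The following is a minimal sketch of that behavior; SharerCache, invalidate_sharers, and the messages_to_home counter are hypothetical names introduced for the example.

```python
from typing import Dict, Set


class SharerCache:
    """Hypothetical cache of a processor that holds read-only copies."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.blocks: Set[int] = set()
        self.messages_to_home = 0     # stays zero: evictions are silent

    def evict(self, block: int) -> None:
        """Drop a shared copy without informing the Home processor directory."""
        self.blocks.discard(block)

    def invalidate(self, block: int) -> bool:
        """Handle an invalidation; returns True only if a copy was actually held."""
        present = block in self.blocks
        self.blocks.discard(block)
        return present


def invalidate_sharers(directory_sharers: Set[int],
                       caches: Dict[int, SharerCache], block: int) -> int:
    """Invalidate every listed Sharer; returns how many still held a copy."""
    held = sum(1 for pid in sorted(directory_sharers) if caches[pid].invalidate(block))
    directory_sharers.clear()
    return held
```

In this sketch the sharer list kept in the directory is treated as a superset of the processors that actually hold copies, which is the trade-off that silent eviction implies.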