This application is related to, and hereby incorporates by reference, the following U.S. patent applications:
Multiprocessor Cache Coherence System And Method in Which Processor Nodes And Input/output Nodes Are Equal Participants, Ser. No. 09/878,984, filed Jun. 11, 2001;
Scalable Multiprocessor System And Cache Coherence Method, Ser. No. 09/878,982, filed Jun. 11, 2001;
System and Method for Daisy Chaining Cache Invalidation Requests in a Shared-memory Multiprocessor System, Ser. No. 09/878,985, filed Jun. 11, 2001;
Cache Coherence Protocol Engine And Method For Processing Memory Transaction in Distinct Address Subsets During Interleaved Time Periods in a Multiprocessor System, Ser. No. 09/878,983, filed Jun. 11, 2001;
System And Method For Generating Cache Coherence Directory Entries And Error Correction Codes in a Multiprocessor System, Ser. No. 09/972,477, filed Oct. 5, 2001, which claims priority on U.S. provisional patent application No. 60/238,330, filed Oct. 5, 2000, which is also hereby incorporated by reference in its entirety.
The present invention relates generally to multiprocessor computer systems, and particularly to a multiprocessor system designed to be highly scalable, using efficient cache coherence logic and methodologies that support cache-state transitions from invalid to dirty.
Large-scale multiprocessor systems typically use a directory-based cache coherence protocol. A directory is a cache coherence protocol data structure that tracks which nodes in a given large-scale multiprocessor system are caching a memory line of information maintained in the system. This information is used by the cache coherence protocol to invalidate cached copies of a memory line of information when the contents of the memory line of information are modified.
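The directory entry described above can be pictured with a small data structure. This is a minimal sketch, not the layout used by any particular system: `DirectoryEntry`, `record_sharer`, and `copies_to_invalidate` are hypothetical names, and a Python set stands in for whatever encoding (bit vector, pointer list, or similar) a real directory would use.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory line (illustrative layout only)."""
    owner: Optional[int] = None                     # node holding an exclusive copy, if any
    sharers: Set[int] = field(default_factory=set)  # nodes caching shared copies

    def record_sharer(self, node: int) -> None:
        self.sharers.add(node)

    def copies_to_invalidate(self, writer: int) -> Set[int]:
        """Nodes whose cached copies must be invalidated before `writer` modifies the line."""
        targets = set(self.sharers)
        if self.owner is not None:
            targets.add(self.owner)
        targets.discard(writer)  # the writer's own copy is not invalidated
        return targets

# Usage: nodes 1, 2 and 3 cache the line; node 2 wants to modify it.
entry = DirectoryEntry()
for node in (1, 2, 3):
    entry.record_sharer(node)
to_invalidate = entry.copies_to_invalidate(writer=2)  # {1, 3}
```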
Many cache coherence protocol operations require an update of a corresponding directory entry, because they affect the cached copies of a memory line of information that is the subject of the operation. Furthermore, since a particular cache coherence protocol operation may affect many nodes, the operation must be executed in distributed fashion while still updating the directory entry correctly.
One way of achieving this effect is to "lock" a directory entry while an operation is in progress. The directory entry is unlocked after an acknowledgment message has been received from each node affected by the operation. For example, when a number of cached copies of a memory line of information are to be invalidated, the home node locks the directory entry, sends an invalidation message to each node caching a copy of the memory line of information, and updates and unlocks the directory entry only when acknowledgment messages have been received from each of these nodes. While the directory entry is locked, the home node blocks any other operations on the corresponding memory line of information.
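The locking scheme just described can be sketched as per-operation bookkeeping at the home node. This is an illustrative model under simplifying assumptions (a single operation, acknowledgments identified by node number); the class and method names are hypothetical and do not come from the application.

```python
from typing import Set

class LockedDirectoryOperation:
    """Hypothetical home-node state for a locked invalidation (sketch only)."""

    def __init__(self, sharers: Set[int]):
        self.pending_acks = set(sharers)       # nodes that still owe an acknowledgment
        self.locked = bool(self.pending_acks)  # locked only while invalidations are in flight

    def receive_ack(self, node: int) -> None:
        self.pending_acks.discard(node)
        if not self.pending_acks:
            self.locked = False                # update and unlock after the last acknowledgment

    def accepts_new_request(self) -> bool:
        # While the entry is locked, other operations on this memory line are blocked.
        return not self.locked

# Usage: invalidate copies cached at nodes 1 and 2.
op = LockedDirectoryOperation({1, 2})
op.receive_ack(1)
blocked_after_first_ack = not op.accepts_new_request()  # node 2 has not acknowledged yet
op.receive_ack(2)
open_after_all_acks = op.accepts_new_request()
```

The cost this model makes visible is that every request for the line waits until the slowest acknowledgment arrives, which is what the more aggressive protocols discussed next try to avoid.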
More aggressive cache coherence protocols attempt to minimize the locking of directory entries by "optimistically" updating a directory entry with the expected results of an operation and leaving the directory entry unlocked. The directory entry may, therefore, have to be adjusted when the home node determines that an operation did not occur exactly as planned. In addition, nodes must deal with the problem that request messages for two consecutive operations on a memory line of information may be received out of order. In particular, a node O must handle the case of an "early-request" race, in which node O requests a copy of a memory line of information but receives a request to perform an operation on the memory line of information before it has received the copy of the memory line of information. This problem can arise because the home node updates the corresponding directory entry to indicate that node O has an exclusive copy of the memory line of information at about the same time that the home node sends the copy of the memory line of information to node O and, therefore, before node O has actually received the copy of the memory line of information. The home node may then process another request concerning the same memory line of information and send a related message that reaches node O before the copy of the memory line of information does.
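The early-request race can be reproduced with a toy message trace. In this sketch, which assumes hypothetical message kinds (`data_reply`, `forwarded_request`) and models reordering simply by delivering node O's messages in reverse send order, the home node records node O as the exclusive owner the moment it sends the reply, so a later request from node P is forwarded to node O before node O has the data.

```python
# Hypothetical sketch of the early-request race; all names are illustrative only.
# The home node updates the directory optimistically, leaves it unlocked, and
# sends messages that the interconnect may deliver out of order.
directory = {"line_A": None}  # memory line -> node believed to hold an exclusive copy
sent = []                     # messages in send order: (destination, kind, line, requester)

def home_grant_exclusive(requester, line):
    directory[line] = requester  # optimistic update, no lock taken
    sent.append((requester, "data_reply", line, requester))

def home_forward_request(requester, line):
    owner = directory[line]      # home already treats this node as holding the line
    sent.append((owner, "forwarded_request", line, requester))

home_grant_exclusive("O", "line_A")  # node O asks for an exclusive copy
home_forward_request("P", "line_A")  # node P's later request is forwarded to node O

# Model reordering by delivering node O's messages in reverse send order.
arrival_at_O = [kind for dest, kind, _, _ in reversed(sent) if dest == "O"]
early_request_race = arrival_at_O[0] == "forwarded_request"
```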
There is needed in the art, therefore, a system for dealing with the above-described early-request race when a cache coherence protocol supports an "invalid-to-dirty" memory transaction. The "invalid-to-dirty" memory transaction includes a request to a home node for exclusive ownership of a memory line of information but not the contents of the memory line of information. This request is useful when the requester intends to write the entire contents of the line and therefore does not need the contents of the memory line of information.
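The difference between an ordinary exclusive request and an invalid-to-dirty request can be shown by what the home node returns. This is a minimal sketch: the request kinds `read_exclusive` and `invalid_to_dirty`, the `Reply` type, and the 64-byte line size are assumptions for illustration, and the invalidations the home node would also send to other caching nodes are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LINE_SIZE = 64  # assumed line size in bytes, for illustration only

@dataclass
class Reply:
    exclusive: bool        # requester now owns the line
    data: Optional[bytes]  # line contents, or None when contents were not requested

memory: Dict[int, bytes] = {0x40: bytes(LINE_SIZE)}  # one hypothetical memory line

def home_handle(kind: str, addr: int) -> Reply:
    if kind == "read_exclusive":
        return Reply(exclusive=True, data=memory[addr])  # ownership plus contents
    if kind == "invalid_to_dirty":
        # The requester will overwrite the whole line, so only ownership is granted.
        return Reply(exclusive=True, data=None)
    raise ValueError(f"unknown request kind: {kind}")

i2d = home_handle("invalid_to_dirty", 0x40)
rdx = home_handle("read_exclusive", 0x40)
```

Skipping the data transfer saves bandwidth, but it also means the requester holds ownership of a line whose contents it never received, which is why the race above needs distinct handling for this transaction.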
The present invention is a system for supporting invalid-to-dirty memory transactions in an aggressive cache coherence protocol that minimizes directory entry locking. More specifically, the nodes of a multiprocessor system each include a protocol engine that is configured to implement a distinct invalidation request that corresponds to an invalid-to-dirty memory transaction. If node O receives this distinct invalidation request while waiting for a response to an outstanding request for exclusive ownership, the protocol engine of node O is configured to treat the distinct invalidation request as applying to the memory line of information that is the subject of the outstanding request for exclusive ownership. Furthermore, if node O receives a normal invalidation request while it has an outstanding exclusive request, the invalidation request applies to a previous copy of the memory line of information held by node O, and therefore the protocol engine of node O is configured to ignore the normal invalidation request in this circumstance.
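The decision made by node O's protocol engine can be summarized as a small dispatch function. This sketch is hedged: `Inval`, `node_o_handle`, and the returned action strings are hypothetical labels, the check that the invalidation targets the same line as the outstanding request is an assumption, and "applying to the pending line" is modeled as invalidating the copy once it arrives. Whether an acknowledgment is still sent for an ignored request is not modeled.

```python
from enum import Enum, auto
from typing import Optional

class Inval(Enum):
    NORMAL = auto()            # ordinary invalidation request
    INVALID_TO_DIRTY = auto()  # distinct invalidation sent for an invalid-to-dirty transaction

def node_o_handle(kind: Inval, line: int, outstanding_exclusive_line: Optional[int]) -> str:
    """Return node O's action for an incoming invalidation (action names are hypothetical)."""
    waiting = outstanding_exclusive_line == line
    if waiting and kind is Inval.INVALID_TO_DIRTY:
        # Applies to the copy node O is still waiting for: invalidate it when it arrives.
        return "invalidate_pending_copy"
    if waiting and kind is Inval.NORMAL:
        # Targets node O's previous copy of the line, so it is ignored.
        return "ignore"
    # No outstanding exclusive request for this line: ordinary invalidation.
    return "invalidate_cached_copy"
```

For example, `node_o_handle(Inval.NORMAL, 0x40, 0x40)` returns `"ignore"`, while the same call with `Inval.INVALID_TO_DIRTY` returns `"invalidate_pending_copy"`.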