The present invention relates to a data processing system suited to managing the copy status of a data block stored in a cache memory and maintaining coherency of that data block, and to an apparatus and a control method therefor.
In a computer system, a processor often includes a cache memory in order to respond quickly to the processor's requests to access a memory device and to reduce communication traffic on an interconnecting network. The processor sequentially loads copies of fixed-size codes and data (hereinafter generically referred to as a "data block") from the memory device into the cache memory associated with the processor, and processing proceeds using those copies.
In a parallel computer system that includes a plurality of cache memories, it is necessary to maintain coherency of the data block. Generally, the status of each copied data block in the cache memory is monitored, and upon receiving an access request, an operation is performed in accordance with the copy status of the data block, thereby maintaining coherency. One of the protocols for managing the status of each data block is the MESI protocol.
According to the MESI protocol, the status of a copied data block is classified into the following four states:
M: MODIFIED (EXCLUSIVE_DIRTY STATE: an exclusive state where the subject data block is not shared with another cache memory, and contents thereof are different from that of the original data block);
E: EXCLUSIVE (EXCLUSIVE_CLEAN STATE: an exclusive state where the subject data block is not shared with another cache memory, and contents thereof coincide with that of the original data block);
S: SHARED (SHARED_CLEAN STATE: a shared state where the subject data block is shared with a plurality of cache memories, and contents thereof coincide with that of the original data block); and
I: INVALID (INVALID STATE).
For instance, when a write access hits a data block in the S (shared) state (hereinafter, the case where a write access hits a data block cached in a cache memory will be referred to as a "write-hit"), the copies of the subject data block existing in other cache memories are invalidated and the write is executed. As a result, the subject data block transitions from the S state to the M state. When a read access to a data block whose copy is in the M state is issued on the interconnecting network, an operation such as servicing the access is performed using that copy, thereby maintaining coherency of the data block.
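The write-hit transition described above can be sketched as follows. This is only an illustrative model, not the mechanism of any particular cache controller: the names `State`, `write_hit`, and the dictionary layout of `caches` are assumptions introduced for the example.

```python
from enum import Enum


class State(Enum):
    M = "MODIFIED"   # exclusive, differs from the original data block
    E = "EXCLUSIVE"  # exclusive, identical to the original data block
    S = "SHARED"     # shared, identical to the original data block
    I = "INVALID"


def write_hit(caches, writer, block):
    """Apply a write-hit by cache `writer` to `block`.

    `caches` is a hypothetical layout: cache name -> {block address: State}.
    """
    state = caches[writer].get(block, State.I)
    if state is State.I:
        raise ValueError("not a write-hit: the writer holds no valid copy")
    if state is State.S:
        # Copies held by other cache memories are invalidated first.
        for name, lines in caches.items():
            if name != writer and lines.get(block, State.I) is not State.I:
                lines[block] = State.I
    # After the write, the writer's copy is exclusive and dirty.
    caches[writer][block] = State.M
    return caches[writer][block]
```

For example, with two caches that both hold block `0x40` in the S state, a write-hit in the first leaves it in the M state and the second in the I state.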
A cluster-type parallel computer system is one way of constructing a parallel computer system. In the cluster-type parallel computer system, a plurality of nodes are interconnected via a second network. In each node, more than one processor, a cache memory associated with each of the processors, a memory and the like are interconnected via a first network. The cluster-type parallel computer system often adopts a memory form called a distributed shared memory, in which a part or all of the memory device in each node is regarded logically as one main memory of the entire system. More specifically, a processor in an arbitrary node is able to directly access a data block in a memory device of an arbitrary node, and each cache memory associated with a processor of an arbitrary node is able to store a copy of a data block in a memory device of an arbitrary node.
In a system such as the cluster-type parallel computer system, where the second network prevents the cache memories from being directly connected, in other words, a system where a cache memory controller is unable to monitor all transactions taking place in the interconnecting network, cache coherency is generally maintained by the directory-based method. According to this method, a memory medium (or content) called a directory is provided to store and manage caching information for each memory block having a fixed size, and when a transaction that requires a coherency-maintenance operation is issued, the cache memory that is caching the subject memory block is notified of the transaction. Coherency in the cache is thereby maintained.
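As a rough illustration of the directory-based method, the sketch below keeps, for each memory block, the set of nodes that cache it and returns the nodes that must be notified when an invalidation is requested. The class name `Directory`, its methods, and the use of integer node identifiers are assumptions made for this example only.

```python
from collections import defaultdict


class Directory:
    """Hypothetical directory: block address -> set of nodes caching the block."""

    def __init__(self):
        self._sharers = defaultdict(set)

    def record_copy(self, block, node):
        # Called when `node` loads a copy of `block` into one of its caches.
        self._sharers[block].add(node)

    def invalidate(self, block, requester):
        """Return the nodes to notify; afterwards only `requester` holds the block."""
        targets = sorted(self._sharers[block] - {requester})
        self._sharers[block] = {requester}
        return targets
```

Because the directory knows exactly which nodes hold a copy, the transaction is forwarded only to those nodes rather than broadcast to every node on the second network.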
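In a case of a cache memory in the cluster-type parallel computer system, a copy of a stored data block may be classified also into an exclusive state or a shared state. However, the copy status of the data block in the cluster-type parallel computer system is different from the conventional status. More specifically, the SHARED state is further classified into the following:
a state where the subject data block is shared with another cache memory in the same node;
a state where the subject data block is shared with a cache memory in another node; and
a state where the subject data block is shared with another cache memory in the same node as well as with a cache memory in another node.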
As described above, the shared state is more finely classified in the cluster-type parallel computer system, and the cost required for maintaining coherency differs for each state. Therefore, adopting the general cache memory management mechanism may result in a decline in system performance.
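The finer classification of the shared state can be expressed as a small function over the nodes that hold the other copies. The source does not name the three sub-states, so `SharedKind` and its member names, as well as `classify_shared`, are labels assumed here for illustration.

```python
from enum import Enum


class SharedKind(Enum):
    SAME_NODE = "shared only within the same node"
    OTHER_NODE = "shared only with another node"
    BOTH = "shared within the same node and with another node"


def classify_shared(holder_node, other_holder_nodes):
    """Classify a shared copy held in `holder_node`.

    `other_holder_nodes` lists the node id of every other cache memory that
    holds a copy. Returns None when no other copy exists (not shared).
    """
    others = list(other_holder_nodes)
    if not others:
        return None
    local = any(node == holder_node for node in others)
    remote = any(node != holder_node for node in others)
    if local and remote:
        return SharedKind.BOTH
    return SharedKind.SAME_NODE if local else SharedKind.OTHER_NODE
```

A general MESI cache collapses all three results into a single SHARED state, which is why it cannot take the cheaper path when the result would be `SAME_NODE`.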
For instance, in a case where a cache memory adopting a general method is used, the above-described three types of shared states are managed as one SHARED state.
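Herein, the following operation is performed when a write-access is issued and the write-access hits a copy of a data block in the SHARED state:
(1) issue a data-invalidation transaction in the first network;
(2) transfer the data-invalidation transaction to a node which has a directory entry corresponding to the data block subjected to the transaction;
(3) check the directory entry and transfer the data-invalidation transaction to a node which is caching the copy of the subject data block;
(4) with respect to each node, issue the data-invalidation transaction to the first network;
(5) return acknowledgment, indicative of completion of the data-invalidation transaction, to a node having the subject directory entry;
(6) notify completion of the data-invalidation transaction to the node which has issued the data-invalidation transaction, upon receiving all acknowledgment; and
(7) notify completion of the data-invalidation transaction to the cache memory which has issued the data-invalidation transaction, resulting in termination of the data-invalidation transaction.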
The foregoing steps are the required processing in a case where a data block is in the shared state with a cache memory in another node. However, in a case where the data block is shared with a cache memory in the same node, only the following two steps are required.
(1) issue a data-invalidation transaction to the first network; and
(2) terminate the data-invalidation transaction.
As set forth above, if the general cache memory is used in the cluster-type computer system, unnecessary processing is performed, resulting in a decline in system performance.
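The difference between the two invalidation sequences can be made concrete with a short sketch. The step descriptions are abbreviated from the text; the names `REMOTE_STEPS`, `LOCAL_STEPS`, and `invalidation_steps` are assumptions for illustration, and the model counts steps only, not actual latency.

```python
# Sequence needed when a copy may exist in a cache memory of another node.
REMOTE_STEPS = [
    "issue invalidation on the first network",
    "transfer to the node holding the directory entry",
    "check the directory entry and transfer to the caching nodes",
    "issue invalidation on the first network of each caching node",
    "return acknowledgment to the node holding the directory entry",
    "notify completion to the issuing node after all acknowledgments",
    "notify completion to the issuing cache memory",
]

# Sequence that suffices when the copy is shared only within the same node.
LOCAL_STEPS = [
    "issue invalidation on the first network",
    "terminate the invalidation transaction",
]


def invalidation_steps(may_be_shared_in_other_node):
    """Return the steps a write-hit on a shared copy has to perform."""
    return list(REMOTE_STEPS if may_be_shared_in_other_node else LOCAL_STEPS)


# A cache with one undifferentiated SHARED state must always assume True.
extra = len(invalidation_steps(True)) - len(invalidation_steps(False))
```

Under this model, a write-hit on a copy shared only within the same node performs five steps more than necessary when the three shared states are managed as one.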