This invention relates to cache coherence control of a computer system, and more particularly to a cache coherence control system for ensuring coherency between caches of a plurality of CPUs, I/O devices and nodes of a parallel computer system constituted of a plurality of nodes interconnected by an interconnect network.
In a multi CPU system, a plurality of CPUs perform tasks by accessing a common main memory (shared memory). In such a system, when one CPU executes a main memory access, it is necessary to ensure cache coherency across all CPUs by checking, for example, whether modified data exists in the caches of all other CPUs. This process is called cache coherence control.
The reference document "SCALABLE SHARED-MEMORY MULTIPROCESSING", Daniel E. Lenoski et al., published by Morgan Kaufmann Publishers, pp. 16-19, describes a cache coherence control method for a system having a plurality of interconnected CPUs. With this method, a main memory access issued by one CPU is transferred to all other CPUs to ensure cache coherency by checking, for example, the existence of modified data in the caches of all CPUs.
In a multi CPU system according to conventional techniques, in order to ensure cache coherency, it is necessary to broadcast a main memory access of one CPU to all CPUs and perform cache coherence control at all CPUs of the system.
With these conventional techniques, the number of cache coherence control requests received at each CPU increases in proportion to the number of CPUs.
As the number of CPUs of a multi CPU system increases, the number of cache coherence control requests received at each CPU increases, so that each CPU finds it difficult to access its own cache. Likewise, as the number of cache coherence control requests transferred over the network interconnecting all CPUs of the system increases, the network may become saturated.
As a result, once the number of CPUs of the multi CPU system exceeds a certain number, adding further CPUs no longer improves the system performance.
A parallel computer system generally has the characteristic that most memory areas are accessed independently by a single process or thread, and that memory is rarely shared among processes or threads.
Based upon such characteristics, the following approach may be made. A memory area once accessed by a node is assumed to be a memory area which can be accessed exclusively by this node, and when another node accesses this memory area, this memory area is considered as a shared area. In this manner, many memory areas can be accessed exclusively by each node. An access to such memory areas can be executed without cache coherence control so that the number of cache coherence requests can be reduced.
To realize this approach, an access right is associated with each memory area. If a node has an access right to a target memory area, the node accesses this memory area without cache coherence control, whereas if the node has no access right, the node accesses this memory area by performing the usual cache coherence control.
If a node is to access a memory area whose access right is possessed by another node, the other node with the access right is deprived of the access right. Thereafter, for a memory access to the memory area, the other node executes the cache coherence control.
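The approach described above can be illustrated by a minimal sketch. The class `AreaRights`, its states and its request counter are hypothetical names introduced only for illustration, and a single global table is assumed for simplicity, whereas the invention described below keeps an access right memory at each node.

```python
# Illustrative model only: all names here are assumptions, not part of the source.
EXCLUSIVE, SHARED = "exclusive", "shared"

class AreaRights:
    """Tracks, per memory area, whether one node holds the access right."""

    def __init__(self):
        self.state = {}               # area id -> (EXCLUSIVE, node) or (SHARED, None)
        self.coherence_requests = 0   # number of accesses that needed coherence control

    def access(self, node, area):
        """Return True if this access required cache coherence control."""
        entry = self.state.get(area)
        if entry == (EXCLUSIVE, node):
            return False              # node holds the access right: no coherence control
        self.coherence_requests += 1  # usual cache coherence control is performed
        if entry is None:
            # First access to the area: assume it is used exclusively by this node.
            self.state[area] = (EXCLUSIVE, node)
        else:
            # Another node accessed the area: the holder is deprived of the right.
            self.state[area] = (SHARED, None)
        return True
```

In this sketch, repeated accesses by the same node to an area it holds are free of coherence control, and the area reverts to ordinary coherence control as soon as a second node accesses it.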
Generally, a memory is accessed in the unit of a cache block. It is therefore natural to associate an access right with each memory block.
However, with this method, even if a node accesses a block and is given the access right to this block, the number of cache coherence controls cannot be reduced unless the node accesses this block again.
Most business applications rarely access the same memory block repeatedly. Therefore, the above method has the problem that it does little to reduce cache coherence control requests.
It is an object of the present invention to provide a cache coherence control system capable of reducing the number of cache coherence control requests even for a program that rarely accesses the same memory block repeatedly.
In order to achieve the above object, according to a representative aspect of the invention, there is provided a cache coherence control system for caches which store, in the unit of a predetermined block, data of a shared memory accessed by a CPU or an I/O device provided at each of a plurality of nodes interconnected by a mutual interconnection network, wherein: in the cache coherence control system, each node has an access right memory for registering an access right entry indicating that the node has an access right to an extended block corresponding to a plurality of blocks of the shared memory; if the node of the CPU or I/O device has an access right to the extended block including a target block for a shared memory access, the target block in the shared memory is accessed without cache coherence control for caches of other nodes; and if the node of the CPU or I/O device has no access right to the extended block including the target block for the shared memory access, cache coherence control for the caches of other nodes is performed and, when necessary, the shared memory block is accessed.
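The relation between a block and the extended block that includes it may be sketched as follows. The block size of 64 bytes and the grouping of 8 blocks per extended block are assumed values chosen for illustration; the text above does not fix either value, and the function names are hypothetical.

```python
BLOCK_SIZE = 64           # bytes per cache block (assumed value)
BLOCKS_PER_EXTENDED = 8   # blocks grouped into one extended block (assumed value)

def block_address(byte_addr):
    """Block number containing the given byte address."""
    return byte_addr // BLOCK_SIZE

def extended_block_address(byte_addr):
    """Extended block number containing the given byte address."""
    return block_address(byte_addr) // BLOCKS_PER_EXTENDED

def blocks_in_extended(ext_addr):
    """All block numbers covered by one extended block."""
    first = ext_addr * BLOCKS_PER_EXTENDED
    return list(range(first, first + BLOCKS_PER_EXTENDED))
```

With these assumed sizes, one access right entry covers 512 contiguous bytes, so an access to any of the 8 blocks in that range consults the same entry.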
In the cache coherence control system: the access right memory uses a portion of an address of the extended block as an access right row address, and a portion other than the access right row address of the extended block address of the shared memory as an access right entry tag; and the access right entry tag and one or more of the access right entries for storing the access right status are stored in the access right memory at a same access right row address.
In the cache coherence control system: in accessing the access right memory, the access right memory is read by using the access right row address obtained from the address of the extended block including the target access block of the shared memory; and an access right status of an access right entry among a plurality of access right entries at the access right row address, the access right entry having an access right entry tag coincident with the access right entry tag obtained from the extended block address, is used as the access right status of the node.
The access right status includes three statuses: a status with an access right; a status without an access right; and a status with an indefinite access right.
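The organization of the access right memory described above resembles a set-associative table, and may be sketched as follows. The row width (`ROW_BITS`), the number of entries per row (`WAYS`), the first-in first-out replacement and all identifiers are assumptions made for illustration; the text above does not specify them.

```python
from enum import Enum

class Right(Enum):
    OWNED = "status with an access right"
    NOT_OWNED = "status without an access right"
    INDEFINITE = "status with an indefinite access right"

ROW_BITS = 4   # assumed: low 4 bits of the extended block address select the row
WAYS = 2       # assumed: two access right entries per row

class AccessRightMemory:
    def __init__(self):
        # Each row holds up to WAYS entries of the form [tag, status].
        self.rows = [[] for _ in range(1 << ROW_BITS)]

    @staticmethod
    def split(ext_addr):
        """Split an extended block address into (row address, entry tag)."""
        return ext_addr & ((1 << ROW_BITS) - 1), ext_addr >> ROW_BITS

    def lookup(self, ext_addr):
        """Return the status of the entry whose tag matches, or None if absent."""
        row, tag = self.split(ext_addr)
        for entry_tag, status in self.rows[row]:
            if entry_tag == tag:
                return status
        return None

    def register(self, ext_addr, status):
        """Create or update the entry for an extended block."""
        row, tag = self.split(ext_addr)
        entries = self.rows[row]
        for entry in entries:
            if entry[0] == tag:
                entry[1] = status
                return
        if len(entries) >= WAYS:
            entries.pop(0)   # assumed FIFO replacement; the policy is not given in the text
        entries.append([tag, status])
```

A lookup that returns `None` corresponds to the case in which no access right entry exists for the extended block.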
In this case, when the CPU or I/O device issues the shared memory access with cache coherence control, the node of the CPU or I/O device searches the access right memory of the node: if the access right entry corresponding to the extended block including the target access block for the shared memory access does not exist in the access right memory of the node, or if the access right entry corresponding to the extended block exists in the access right memory of the node and the access right status in the access right entry is the status with an indefinite access right, cache coherence control is performed for the caches of all other nodes relative to the target access block, and an extended block storing status check is performed to check whether one or more blocks in the extended block including the target access block are stored in the cache of each of the other nodes; if the access right entry corresponding to the extended block including the target access block of the shared memory exists and the access right status in the access right entry is the status without an access right, cache coherence control is performed for the caches of all other nodes relative to the target access block; and if the access right entry corresponding to the extended block including the target access block of the shared memory exists and the access right status in the access right entry is the status with an access right, cache coherence control is not performed for the caches of the other nodes.
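The decision made at the requesting node can be summarized by a small sketch. The function name, the string constants and the returned pair are hypothetical; `None` stands for the case in which no access right entry exists.

```python
OWNED, NOT_OWNED, INDEFINITE = "with", "without", "indefinite"

def plan_request(status):
    """Map the local access right status to the actions sent to other nodes.

    Returns (coherence_control, extended_block_check):
      coherence_control    -- perform cache coherence control at all other nodes
      extended_block_check -- also ask other nodes whether they store any block
                              of the extended block
    """
    if status is None or status == INDEFINITE:
        return (True, True)     # no entry or indefinite: full check
    if status == NOT_OWNED:
        return (True, False)    # known not to hold the right: coherence control only
    if status == OWNED:
        return (False, False)   # right held: access without coherence control
    raise ValueError(f"unknown access right status: {status!r}")
```

Only the first case issues the extended block storing status check, since only there is the status of the extended block in the other nodes unknown.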
Further: when a shared memory access with a cache coherence control and extended block storing status check request is received from another node; cache coherence control is performed for the cache of the node relative to the target access block; it is checked whether each block in the extended block including the access target block is stored in the cache of the node, if even one block in the extended block is stored in the cache, it is judged that the extended block is stored in the node, whereas if none of the blocks are stored, it is judged that the extended block is not stored in the node; and the access right memory of the node is searched, and if the access right entry corresponding to the extended block including the target access block exists and the access right status in the access right entry is the status with an access right, the access right status is changed to the status without an access right; or the access right memory of the node is searched, and if the access right entry corresponding to the extended block including the target access block exists and the access right status in the access right entry is the status with an access right, the access right status is changed to the status without an access right if the extended block is stored in the node, whereas the access right status is changed to the status with an indefinite access right if the extended block is not stored in the node.
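The handling at a node that receives such a request may be sketched as follows. The cache is modeled as a set of block numbers and the access right memory as a dictionary keyed by extended block number; both are simplifying assumptions, as are the function name, the `refine` flag that selects between the two alternatives described above, and the default of 8 blocks per extended block. The cache coherence control for the target block itself is omitted from the sketch.

```python
OWNED, NOT_OWNED, INDEFINITE = "with", "without", "indefinite"

def handle_remote_request(cached_blocks, rights, target_block,
                          blocks_per_ext=8, refine=True):
    """Process a request carrying an extended block storing status check.

    cached_blocks -- set of block numbers held in this node's cache (assumed model)
    rights        -- dict: extended block number -> access right status
    refine        -- False: always downgrade to NOT_OWNED (first alternative)
                     True:  NOT_OWNED if stored, INDEFINITE otherwise (second alternative)
    Returns True if the extended block is judged to be stored in this node.
    """
    ext = target_block // blocks_per_ext
    first = ext * blocks_per_ext
    # Stored if even one block of the extended block is in the cache.
    stored = any(b in cached_blocks for b in range(first, first + blocks_per_ext))
    if rights.get(ext) == OWNED:
        rights[ext] = NOT_OWNED if (stored or not refine) else INDEFINITE
    return stored
```

The returned value corresponds to the judgment reported to the requesting node, which can then decide whether it may register the access right for itself.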
Further, when a shared memory access with a cache coherence control and extended block storing status check request is received from another node, cache coherence control is performed for the cache of the node relative to the access target block.
The access right status may include two statuses: a status with an access right; and a status with an indefinite access right.
In the cache coherence control system: each node has a block information memory for registering an address of a block stored in the cache of the CPU or I/O device upon issuance of the shared memory access with cache coherence control by the CPU or I/O device; and upon issuance of the shared memory access with cache coherence control by the CPU or I/O device, the address of the target block for the shared memory access is registered in the block information memory of the node of the CPU or I/O device.
In this case: the block information memory uses a portion of an address of the block of the shared memory as a row address, and a portion other than the row address of the block address of the shared memory as an entry tag, the entry tag and one or more of the entries for storing the entry status are stored in the block information memory at a same row address, and the entry status includes two statuses, a status valid and a status invalid; and when the CPU or I/O device issues the shared memory access with cache coherence control, a tag and the status valid obtained from the address of the target block for the shared memory access are registered in the block information memory of the node of the CPU or I/O device at the entry address obtained from the address.
Further: when the shared memory access with cache coherence control is received from another node; cache coherence control is performed for the cache of the node relative to the target block for the shared memory access; if the shared memory access is a write or a read with invalidation, the block information memory of the node is searched, and if an entry corresponding to the target block for the shared memory access is registered in the block information memory and the status of the entry is valid, the status of the entry is changed to invalid; when a shared memory access with a cache coherence control and extended block storing status check request is received from another node; cache coherence control is performed for the cache of the node relative to the target block; and if the shared memory access is a write or a read with invalidation, the block information memory of the node is searched, and if an entry corresponding to the target block is registered in the block information memory and the status of the entry is valid, the status of the entry is changed to invalid and an extended block status check is performed.
Further: when an entry of the block information memory is replaced by another new entry; a block corresponding to the target replace entry is removed from the cache of the node of the block information memory.
Further: when the CPU or I/O device writes back a block in the cache of the CPU or I/O device into the shared memory; the status of the entry corresponding to a target write-back block and stored in the block information memory of the node of the CPU or I/O device is set to invalid.
Further: as an extended block storing status check when a shared memory access with a cache coherence control and extended block storing status check request is received from another node; it is checked whether an address of each block of the extended block including the target block for the shared memory access is registered in the block information memory of the node, and if even one address is registered, it is judged that the extended block is stored in the node, whereas if no address is registered, it is judged that the extended block is not stored in the node.
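The block information memory and the extended block storing status check based on it may be sketched as follows. The row width, the number of entries per row, the choice of victim on replacement and all identifiers are assumptions for illustration only. Removal of a replaced block from the cache is left to the caller, which receives the replaced block number.

```python
ROW_BITS = 2   # assumed: low 2 bits of the block address select the row

class BlockInfoMemory:
    def __init__(self, ways=2):
        self.ways = ways                                 # entries per row (assumed)
        self.rows = [[] for _ in range(1 << ROW_BITS)]   # entries: [tag, valid]

    @staticmethod
    def _split(block):
        return block & ((1 << ROW_BITS) - 1), block >> ROW_BITS

    def register(self, block):
        """Register a block on an issued access.

        Returns the number of a valid block whose entry was replaced (the caller
        must remove that block from the cache), or None.
        """
        row, tag = self._split(block)
        entries = self.rows[row]
        for entry in entries:
            if entry[0] == tag:
                entry[1] = True
                return None
        replaced = None
        if len(entries) >= self.ways:
            # Assumed policy: prefer an invalid entry, else the oldest one.
            victim = next((e for e in entries if not e[1]), entries[0])
            entries.remove(victim)
            if victim[1]:
                replaced = (victim[0] << ROW_BITS) | row
        entries.append([tag, True])
        return replaced

    def invalidate(self, block):
        """Set the entry invalid (remote write, read with invalidation, or write-back)."""
        row, tag = self._split(block)
        for entry in self.rows[row]:
            if entry[0] == tag:
                entry[1] = False

    def is_registered(self, block):
        row, tag = self._split(block)
        return any(t == tag and valid for t, valid in self.rows[row])

    def extended_block_stored(self, target_block, blocks_per_ext=4):
        """Judge the extended block stored if even one of its blocks is registered."""
        first = (target_block // blocks_per_ext) * blocks_per_ext
        return any(self.is_registered(b) for b in range(first, first + blocks_per_ext))
```

Because the check consults this table rather than the cache itself, the table must never miss a cached block, which is why a replaced entry forces the corresponding block out of the cache.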
According to the aspects of the invention described above, the access right memory and access right control circuit are provided in the node control circuit for controlling CPUs and I/O devices. When a CPU or I/O device accesses a block included in an extended block whose access right is already possessed by its node, cache coherence control for other nodes can be omitted.
Since the unit of access right management is made larger than the cache block size of the CPU, cache coherence control requests are reduced effectively even when a business application that rarely accesses the same memory block repeatedly is executed.
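The effect of the larger management unit can be illustrated with a minimal count. The sketch assumes a single node whose accesses never conflict with other nodes, so that every access right requested is granted; the function name and the trace are hypothetical.

```python
def count_coherence_requests(block_trace, blocks_per_right_unit):
    """Count accesses that need cache coherence control for a given trace.

    Assumes no other node touches the same memory, so each first access to a
    right-management unit costs one coherence request and grants the right.
    """
    held = set()      # right-management units whose access right this node holds
    requests = 0
    for block in block_trace:
        unit = block // blocks_per_right_unit
        if unit not in held:
            requests += 1
            held.add(unit)
    return requests

# A program that touches 64 consecutive blocks exactly once each.
trace = list(range(64))
per_block = count_coherence_requests(trace, 1)      # access right per block
per_extended = count_coherence_requests(trace, 8)   # access right per 8-block extended block
```

For this trace, a per-block access right saves nothing because no block is accessed twice (64 requests), whereas an extended block of 8 blocks needs only 8 requests.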
Since the number of cache coherence control requests can be reduced, it is possible to reduce the traffic of cache coherence transactions flowing on the network and to lower the frequency of cache coherence control at each CPU node. More memory accesses can therefore be processed than with conventional techniques, and the memory access throughput of the system can be improved. It is therefore possible to obtain high performance even when more CPUs are used than with conventional techniques.
Other aspects of the invention will become apparent from the description of the embodiments to follow.