The present invention relates to a multiprocessor system configured with a plurality of processors to achieve high performance, and in particular to a shared memory multiprocessor that performs cache coherence control for access requests and to a node controller used with such a multiprocessor.
In a well-known method for implementing a shared memory multiprocessor, a plurality of nodes each configured with only processing units having cache memories are connected to each other by a single bus, and further a memory device and an I/O device are connected to the bus. The memory device and the I/O device are shared by the nodes both physically and logically, thereby making up what is called a shared memory multiprocessor. This system comprising a plurality of nodes connected by a single bus is inexpensive and can be configured in a simple fashion. Because there is only one path for transferring data between the nodes, however, the bus becomes a bottleneck when an attempt is made to improve the performance of the system as a whole by increasing the number of nodes.
As a solution to this problem, there has been proposed a method in which a bus is used to transfer an access request (address) for the memory device or the I/O device, while a crossbar switch is used for data transfer.
The 1995 COMPCON95 Proceedings, pp. 102-109, entitled “RISC System/6000SMP System” (first reference), proposes a system having a physically-shared and logically-shared memory in which a bus is used for address transfer while a crossbar switch is used for data transfer requiring a high throughput.
Generally, a shared memory multiprocessor employing a bus for address transfer uses an address snoop system as a method of maintaining the data coherence between a memory device and the cache memories included in the nodes. In the address snoop system, an address is broadcast in order to maintain the data coherence between all the nodes connected to the bus.
In the system disclosed in the first reference described above, the data throughput can be improved by employing a crossbar switch in place of a bus for data transfer. The use of a single bus for address transfer as in the prior art, however, makes it impossible to realize an efficient address snoop system in keeping with the improved throughput.
In order to remove the bottleneck posed by using a single bus for address transfer, on the other hand, “STARFIRE: extending the SMP Envelope”, 1998 MICRO January/February, pp. 39-49 (second reference) introduces a system which uses multiple buses for address transfer.
In the system according to the second reference described above, each node is not configured only with a processor having a cache memory; rather, each node is configured with a processor including a cache memory, a memory and an I/O device. This system is what is called a distributed shared memory multiprocessor (physically-distributed logically-shared memory multiprocessor), in which the memories and the I/O devices are distributed physically among the nodes but shared logically by the nodes. In the system according to the second reference, a plurality of nodes are coupled to each other by buses for addresses and by a crossbar switch for data. By use of four address buses, four address snoop operations can be performed in parallel. The physical address space is divided into four parts so that each address bus can snoop a different part of the address space at the same time.
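The division of the physical address space among four address buses can be illustrated with a minimal sketch. The second reference as summarized here does not state how the space is divided; the sketch assumes interleaving by cache-line index, and the line size `CACHE_LINE_BYTES` and the function name `select_address_bus` are hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumed cache-line size, not taken from the reference
NUM_ADDRESS_BUSES = 4   # four address buses, as described for the second reference


def select_address_bus(physical_address: int) -> int:
    """Return the index (0 to 3) of the address bus assumed to snoop this address.

    Assumption: consecutive cache lines are interleaved across the buses,
    so each bus covers a disjoint quarter of the physical address space.
    """
    line_index = physical_address // CACHE_LINE_BYTES
    return line_index % NUM_ADDRESS_BUSES
```

Under this assumption, four requests to four consecutive cache lines map to four different buses and can be snooped at the same time, whereas two requests to the same line always contend for the same bus.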
The use of multiple buses for address transfer as in the second reference makes it possible to realize a more efficient address snoop than when using a single bus.
In the first and second references, however, the bus is used for address transfer and therefore the right to use the address bus is required to be secured even in the case where data coherence is not required between a cache memory and a memory device. Thus, the address bus cannot be used efficiently.
In order to obviate this problem, U.S. Pat. No. 6,011,791 (third reference) discloses what is called a physically-shared logically-shared memory multiprocessor in which the address bus is eliminated and the address is transferred through the crossbar switch used for data. In this system, the address can be transferred only to the node intended as a transfer destination in the case where data coherence is not needed between the cache memory and the memory device.
The use of multiple buses for address transfer as in the second reference can realize the address snoop more efficiently than when a single bus is used. In the case where a multiplicity of nodes are involved, however, even the use of multiple buses cannot secure a throughput of the address snoop commensurate with the improved throughput of the data transfer by the crossbar switch.
According to the third reference, in which the address bus is eliminated and the address and the data are transferred through a single crossbar switch, a sufficient throughput of the address snoop cannot be secured in the case where the nodes are increased in number.
In all the conventional systems described above, an address is transferred to all the nodes in the case where data coherence is required between the cache memory and the memory device. According to the second reference, for example, an address is broadcast to all the nodes in the case where data coherence is required.
In view of this, the present inventors have conducted the following study. Specifically, in the case where data coherence is required, the address is required to be transferred only to the nodes having a cache (i.e. the nodes requiring cache coherence control for an access request), but the address transfer is not required to the nodes having no cache (i.e. the nodes requiring no cache coherence control for an access request). In the prior art, however, the address is transferred also to the nodes having no cache, thereby deteriorating the utilization efficiency of the path (regardless of whether the path is a crossbar switch or a bus). In the case where the nodes are increased in number, therefore, a sufficient throughput of the address snoop cannot be secured.
In the case where no data coherence is required between the cache memory and the memory device, the address is required to be transferred only to the nodes to which data coherence is required.
Specifically, the address is required to be transferred only to the nodes requiring data coherence, and therefore a means for one-to-many transfer (multicast) is required in addition to one-to-all transfer (broadcast).
The present inventors have proposed a shared memory multiprocessor system, in which each node is not configured only with processing units including cache memories but includes at least one processing unit each having a cache memory combined with at least one of a memory device and an I/O device, so that a plurality of the nodes have different configurations. Also in this distributed shared memory multiprocessor, the address is required to be transferred only to the nodes requiring cache coherence control for an access request but no address transfer is required to the nodes not requiring cache coherence control for an access request.
Accordingly, an object of the present invention is to provide a distributed shared memory multiprocessor configured with a plurality of different nodes and capable of efficient address snoop.
Another object of the invention is to provide a distributed shared memory multiprocessor configured with a plurality of nodes and capable of efficient address snoop, wherein the address is not transferred to the nodes not requiring coherence (i.e. the nodes not requiring cache coherence control for an access request) regardless of whether data coherence control is required or not between the cache memory and the memory device.
In order to achieve these objects, according to one aspect of the invention, there is provided a shared memory multiprocessor, wherein each node includes a unit for adding to an access request the information indicating whether data coherence (cache coherence control) is required or not and the information on the node intended as a transfer destination, and an inter-node connection network includes a unit which, based on the information added to the access request transferred from the node, transfers an address to all the nodes connected to the inter-node connection network which have a cache (all the nodes requiring cache coherence control for an access request) in the case where data coherence is required, and transfers the address, in one-to-one correspondence, only to the nodes intended as a transfer destination indicated by the node information in the case where data coherence is not required.
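The transfer decision described in this aspect can be sketched as follows. This is only an illustrative model under stated assumptions: the names `AccessRequest`, `route_address` and `nodes_with_cache` are hypothetical, and the actual encoding of the added information in the access request is not specified in this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical model of an access request with the information added by the node."""
    source_node: int
    address: int
    coherence_required: bool   # whether cache coherence control is required
    destination_node: int      # node intended as the transfer destination


def route_address(request: AccessRequest, nodes_with_cache: set[int]) -> set[int]:
    """Return the set of nodes to which the network transfers the address.

    Coherence required: multicast to every node that has a cache.
    Coherence not required: one-to-one transfer to the destination node only.
    """
    if request.coherence_required:
        return set(nodes_with_cache)
    return {request.destination_node}
```

For example, with nodes 0, 1 and 3 having caches and node 2 having none, a request requiring coherence is delivered to nodes 0, 1 and 3 only, while a request not requiring coherence and destined for node 2 is delivered to node 2 alone.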
As a result, no address is transferred to the nodes not requiring data coherence and an efficient address snoop system is realized. In other words, unnecessary address transfer is eliminated and the effective throughput of the inter-node connection network is improved.
According to an embodiment of the invention, there is provided a shared memory multiprocessor further comprising a unit for transferring an address directly to a unit (memory device or I/O device) in the same node (local node) as the source of an access request, without sending it to the inter-node connection network, in the case where data coherence is not required between the cache memory and the memory device and the destination of transfer is the particular unit in the local node, based on the information added to the access request. As a result, unnecessary transfer can be eliminated. Also, it is possible to improve the effective throughput of both the inter-node connection network and the intra-node paths.
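The local transfer of this embodiment amounts to one additional check before a request is sent to the network. A minimal sketch, in which the function name `choose_path` and the string labels are hypothetical:

```python
def choose_path(source_node: int, destination_node: int, coherence_required: bool) -> str:
    """Decide whether an address stays inside the local node or goes to the network.

    The address bypasses the inter-node connection network only when no
    coherence control is required AND the destination unit is in the source node.
    """
    if not coherence_required and destination_node == source_node:
        return "local"
    return "network"
```

In this sketch a request that requires coherence is sent to the network even when its destination is local, because the other nodes having caches must still receive the address.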
According to another embodiment of the invention, a crossbar switch rather than a bus is preferably employed also for address transfer, and the address snoop between the nodes is carried out through the crossbar switch, thereby securing a scalable throughput of the address snoop commensurate with the data transfer throughput in the crossbar switch connection.
According to still another embodiment of the invention, a crossbar switch rather than a bus is preferably employed also for address transfer, and the address path and the data path of each node are each configured with an independent crossbar switch.
By connecting the address path and the data path of each node with a crossbar switch, a plurality of address transfers and data transfers can be carried out in parallel as long as the destinations of access are different. Thus, a scalable throughput of the address snoop commensurate with the data transfer throughput in the crossbar switch connection can be secured.
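The condition that transfers proceed in parallel as long as their destinations differ can be illustrated with a simple greedy grouping. This is a sketch under assumptions, not the arbitration scheme of the invention: the function name `schedule_transfers` is hypothetical, and only destination conflicts are modeled.

```python
def schedule_transfers(transfers: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Group (source, destination) transfers into crossbar cycles.

    Within one cycle every destination appears at most once, so all
    transfers in the same cycle can be carried out in parallel.
    Assumption: a transfer is placed in the earliest cycle with a free destination.
    """
    cycles: list[list[tuple[int, int]]] = []
    for src, dst in transfers:
        for cycle in cycles:
            if all(d != dst for _, d in cycle):
                cycle.append((src, dst))
                break
        else:
            # Every existing cycle already uses this destination.
            cycles.append([(src, dst)])
    return cycles
```

With the transfers (0, 1), (2, 3), (1, 3) and (3, 0), three proceed in the first cycle and only (1, 3) waits for the second, because it shares destination 3 with (2, 3). On a single bus the same four transfers would occupy four successive bus cycles.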