The present invention relates to a multiprocessor system configured with a plurality of processors to achieve high performance, and in particular to a shared memory multiprocessor that performs cache coherence control for access requests, and to a node controller used with such a multiprocessor.
In a well-known method for implementing a shared memory multiprocessor, a plurality of nodes each configured with only processing units having cache memories are connected to each other by a single bus, and further a memory device and an I/O device are connected to the bus. The memory device and the I/O device are shared by the nodes both physically and logically, thereby making up what is called a shared memory multiprocessor. This system comprising a plurality of nodes connected by a single bus is inexpensive and simple to configure. Because there is only one path for transferring data between the nodes, however, the bus becomes a bottleneck that limits any attempt to improve the performance of the system as a whole by increasing the number of nodes.
As a solution to this problem, there has been proposed a method in which a bus is used to transfer an access request (address) for the memory device or the I/O device, while a crossbar switch is used for data transfer.
The 1995 COMPCON95 Proceedings, pp. 102-109, entitled “RISC System/6000SMP System” (first reference), proposes a system having a physically-shared and logically-shared memory in which a bus is used for address transfer while a crossbar switch is used for data transfer requiring a high throughput.
Generally, a shared memory multiprocessor employing a bus for address transfer uses an address snoop system as a method of maintaining the data coherence between a memory device and the cache memories included in the nodes. In the address snoop system, an address is broadcast in order to maintain the data coherence between all the nodes connected to the bus.
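The broadcast nature of the address snoop system described above can be illustrated with a minimal sketch. It is not taken from any of the cited references: the `Node` class, the `broadcast_snoop` function and the representation of a cache as a set of line addresses are all hypothetical simplifications chosen for illustration.

```python
class Node:
    """Hypothetical node whose cache is modelled as a set of line addresses."""

    def __init__(self, name):
        self.name = name
        self.cached_lines = set()

    def snoop(self, address):
        # Each node checks its own cache for the broadcast address.
        return address in self.cached_lines


def broadcast_snoop(nodes, requester, address):
    """Present the address to every node on the bus except the requester
    and return the names of the nodes whose caches hold the line."""
    return [n.name for n in nodes if n is not requester and n.snoop(address)]


nodes = [Node("n0"), Node("n1"), Node("n2")]
nodes[1].cached_lines.add(0x1000)
holders = broadcast_snoop(nodes, nodes[0], 0x1000)
```

Because every request is presented to every node, the number of snoop lookups grows with the number of nodes, which is why the throughput of the address path matters as much as that of the data path.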
In the system disclosed in the first reference described above, the data throughput can be improved by employing a crossbar switch in place of a bus for data transfer. The use of a single bus for address transfer as in the prior art, however, makes it impossible to realize an efficient address snoop system in keeping with the improved throughput.
In order to eliminate the bottleneck that arises when a single bus is used for address transfer, on the other hand, “STARFIRE: Extending the SMP Envelope”, 1998 MICRO January/February, pp. 39-49 (second reference) introduces a system which uses multiple buses for address transfer.
In the system according to the second reference described above, each node is not configured only with a processor having a cache memory; rather, each node is configured with a processor including a cache memory, a memory and an I/O device. This system is what is called a distributed shared memory multiprocessor (physically-distributed logically-shared memory multiprocessor), in which the memories and the I/O devices are distributed physically among the nodes but shared logically by the nodes. In the system according to the second reference, a plurality of nodes are coupled to each other by buses for address transfer and by a crossbar switch for data transfer. By use of four address buses, four address snoop operations can be performed in parallel. The physical address space is divided into four parts so that each address bus snoops a different part of the address space at the same time.
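One way to picture the division of the physical address space among four address buses is the following sketch. The second reference, as summarized here, does not state how the division is made; interleaving on the cache line index and the 64-byte line size are assumptions, and `address_bus_for` is a hypothetical name.

```python
LINE_BYTES = 64          # assumed cache line size, not given in the text
NUM_ADDRESS_BUSES = 4    # four address buses, as in the second reference


def address_bus_for(physical_address):
    """Select an address bus from the cache line index (assumed interleaving)."""
    line_index = physical_address // LINE_BYTES
    return line_index % NUM_ADDRESS_BUSES


# Four consecutive cache lines fall on four different buses,
# so their snoop operations can proceed in parallel.
buses = [address_bus_for(line * LINE_BYTES) for line in range(4)]
```

Under this assumed mapping, two requests contend for the same bus only when their line indices are congruent modulo four.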
The use of multiple buses for address transfer as in the second reference makes it possible to realize a more efficient address snoop than when using a single bus.
In the first and second references, however, a bus is used for address transfer, and therefore the right to use the address bus must be acquired even in the case where data coherence is not required between a cache memory and a memory device. Thus, the address bus cannot be used efficiently.
In order to obviate this problem, U.S. Pat. No. 6,011,791 (third reference) discloses what is called a physically-shared logically-shared memory multiprocessor in which the address bus is eliminated and the address is transferred over the crossbar switch used for data. In this system, the address can be transferred only to the node intended as the transfer destination in the case where data coherence is not needed between the cache memory and the memory device.
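The selective address transfer described above can be sketched as a choice of target nodes. This is an illustration only: the function name `address_targets` and its parameters are hypothetical, and delivering the address to every other node when coherence is needed is an assumption, since the text states only the behavior for the case where coherence is not needed.

```python
def address_targets(all_nodes, requester, destination, coherence_needed):
    """Return the nodes that receive the address over the crossbar switch."""
    if coherence_needed:
        # Assumed behavior: every other node must see the address to snoop it.
        return [n for n in all_nodes if n != requester]
    # Stated behavior: only the intended destination receives the address.
    return [destination]


all_nodes = ["n0", "n1", "n2", "n3"]
directed = address_targets(all_nodes, "n0", "n2", coherence_needed=False)
snooped = address_targets(all_nodes, "n0", "n2", coherence_needed=True)
```

In the directed case the remaining nodes receive no address traffic at all, which is the efficiency that a shared address bus cannot offer.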