The present invention relates to a multi-processor system having a plurality of processors in order to realize high performance.
A tightly coupled multi-processor system is a high performance computer system in which a plurality of processors share one main storage. In such a system, a private cache is provided for each processor in order to reduce contention for access to the shared main storage. Use of these caches poses a problem of cache consistency control. A conventional multi-processor system is described in JP-A-4-328653 (Reference document 1). Reference document 1 discloses an invention which uses both an interconnection network and a modified snooping bus that includes address buses and control buses but no data buses. Specifically, cache consistency is controlled by hardware using an address and a command on the modified snooping bus, as in conventional techniques, whereas a cache block required for consistency control is transferred between a cache and the main storage, or between caches, via the interconnection network. With this method, the consistency control operation other than data transfer can be executed for each memory access in about one cycle by using an address and a command. Therefore, for a plurality of memory accesses, the consistency control operation other than cache block transfer can be executed sequentially for each cache.
Cache block transfer for the consistency control operation of each memory access requires a plurality of cycles. However, different cache blocks can be transferred in parallel between a cache and the main storage, or between caches, by using the interconnection network. The size of a cache block is generally large compared to the size of an address. Therefore, although address transfer is completed in one cycle, cache block transfer requires plural cycles. For example, if cache block transfer requires eight cycles, each one-cycle address transfer is accompanied by an eight-cycle cache block transfer. As described earlier, while addresses are transferred via a bus, cache blocks are transferred in parallel via a crossbar switch serving as the interconnection network. This can improve the system performance considerably as compared to a conventional snoop system which transfers both the address and the cache block via buses. Reference document 1 states that various networks, such as crossbar switches, can be used as the interconnection network.
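The cycle arithmetic above can be illustrated with a minimal sketch. It is not taken from reference document 1; the function names are hypothetical, and it assumes an idealized case in which every access targets a different cache block, the interconnection network never blocks, and arbitration overhead is ignored.

```python
def shared_bus_cycles(n_accesses, addr_cycles=1, block_cycles=8):
    """Conventional snoop bus (assumed model): the address and the cache
    block of each access both occupy the single bus, so accesses serialize."""
    return n_accesses * (addr_cycles + block_cycles)


def split_transfer_cycles(n_accesses, addr_cycles=1, block_cycles=8):
    """Address bus plus interconnection network (assumed model): addresses
    are issued back to back on the bus, while each cache block transfer
    starts right after its address and overlaps with the other transfers."""
    if n_accesses == 0:
        return 0
    # The last address is issued after n_accesses * addr_cycles cycles;
    # its cache block then needs block_cycles more cycles to arrive.
    return n_accesses * addr_cycles + block_cycles


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, shared_bus_cycles(n), split_transfer_cycles(n))
```

Under these assumptions, four accesses take 36 cycles on a shared bus and 12 cycles with split address and data paths. Real systems fall between the two figures when transfers contend for the same port.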
Another known technique is a so-called memory mapped I/O system. Various control registers, data registers, and the like of this system are mapped to the same address space as that of the main storage, and are accessed by a command of the same format as the memory access command used for access to the main storage. This memory mapped I/O system is widely used in conventional shared memory type multi-processor systems. Reference document 1 describes neither memory mapped registers of the multi-processor system using the interconnection network disclosed therein nor a method of accessing them.
According to the invention described in reference document 1, a plurality of memory accesses can be performed nearly in parallel by utilizing parallel data transfer via the interconnection network. With this method, however, there is a problem that the total number of processors connectable to the modified snooping buses is restricted considerably because the buses become a bottleneck.
An access request to a memory mapped register is preferably transferred to the unit containing the register via the interconnection network in order to simplify the system configuration. However, in order to identify the unit containing the memory mapped register assigned the address designated by an access request, two things are required. First, address allocation information, which indicates the range of addresses assigned to all memory mapped registers contained in each unit, must be stored in the system in advance. Second, a circuit is required which identifies that unit in accordance with the address designated by the access request and the stored address allocation information.
Memory mapped registers of the system include those in input/output devices connected to input/output units. The number and locations of input/output devices of the system are subject to change. Each time such a change occurs, the address allocation information must be changed. Therefore, the circuit for identifying the unit containing a memory mapped register designated by an access request must deal with such address changes, and the structure of the circuit becomes complicated.
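The lookup that such a circuit performs can be sketched in software as follows. This is only an illustration under stated assumptions: the class name, the method names, the unit names, and the address ranges are all hypothetical, and a hardware implementation would use comparators rather than a sorted list.

```python
import bisect


class AddressAllocationTable:
    """Hypothetical model of the address allocation information: a set of
    non-overlapping half-open ranges [start, end), each owned by one unit."""

    def __init__(self):
        self._starts = []   # range start addresses, kept sorted
        self._ranges = []   # (start, end, unit) tuples in the same order

    def assign(self, start, end, unit):
        """Record that the registers in [start, end) belong to `unit`."""
        if start >= end:
            raise ValueError("empty or inverted range")
        i = bisect.bisect_left(self._starts, start)
        if i > 0 and self._ranges[i - 1][1] > start:
            raise ValueError("overlaps the preceding range")
        if i < len(self._ranges) and self._ranges[i][0] < end:
            raise ValueError("overlaps the following range")
        self._starts.insert(i, start)
        self._ranges.insert(i, (start, end, unit))

    def find_unit(self, addr):
        """Return the unit owning `addr`, or None if no register is mapped."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            _start, end, unit = self._ranges[i]
            if addr < end:
                return unit
        return None


# Illustrative addresses only; they are not taken from the source.
table = AddressAllocationTable()
table.assign(0xF0000000, 0xF0001000, "io_unit_0")
table.assign(0xF0001000, 0xF0002000, "io_unit_1")
```

Every time an input/output device is added, removed, or moved, `assign` must be called again with new ranges, which mirrors the burden on the identifying circuit described above.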
Also in this system, if a plurality of processor units, memory units, and input/output units are connected by a single bus, there is only one path for data transfer between connected units. Therefore, this bus becomes a bottleneck when the system performance is to be improved by increasing the number of processors.
In order to solve this problem, a method of using a bus for address transfer and a crossbar switch for data transfer has been proposed by James O. Nicholson, "The RISC System/6000 SMP System", COMPCON95 Proceedings, March 1995, pp. 102 to 109.
With this conventional method, although the bus bottleneck in terms of throughput can be solved, the number of processors cannot be increased greatly because of electrical constraints on signal transfer at high frequencies.
In order to solve this, an address is also transferred via the crossbar switch and each unit is connected to the crossbar switch in one-to-one correspondence.
In this case, in order to correctly run software written for a bus-connected system, data coherency between a cache memory and a main memory must be maintained even under crossbar switch connection. In order to connect a processor designed for bus connection to the crossbar switch, the address snoop method, which is generally used to maintain data coherency in bus-connected systems, must be realized on the crossbar switch.
The address snoop method maintains data coherency between a cache memory and a main memory. With this method, it is not necessary to transfer an address to memory units that hold no data to be transferred or to units that have no cache memory. Therefore, address transfer necessary for maintaining data coherency is performed only for the units required to participate in address snoop. An efficient address snoop method can therefore be realized by providing the crossbar switch with means for executing multi-cast, that is, one-to-many data transfer.
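The selection of multi-cast destinations described above can be sketched as follows. This is a minimal illustration, not the circuit of the invention: the `Unit` type, the port numbering, and the function name are hypothetical, and the sketch assumes that the requesting unit does not receive its own address and that exactly one memory unit holds the addressed data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    port: int               # crossbar port number (assumed numbering)
    kind: str               # "processor", "memory", or "io"
    has_cache: bool = False


def snoop_multicast_mask(units, source_port, home_memory_port):
    """Return a bit mask of crossbar ports that must receive the address.

    A port is selected if its unit has a cache (and so must snoop) or if it
    is the memory unit holding the addressed data. All other units, such as
    unrelated memory units and cacheless input/output units, are skipped.
    """
    mask = 0
    for unit in units:
        if unit.port == source_port:
            continue  # assumption: the requester is not a destination
        if unit.has_cache or unit.port == home_memory_port:
            mask |= 1 << unit.port
    return mask


# Illustrative configuration only.
units = [
    Unit(0, "processor", has_cache=True),
    Unit(1, "processor", has_cache=True),
    Unit(2, "memory"),
    Unit(3, "memory"),
    Unit(4, "io"),
]
mask = snoop_multicast_mask(units, source_port=0, home_memory_port=2)
```

With this configuration the mask selects ports 1 and 2 only, so the second memory unit and the input/output unit see no address traffic for this access.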
The invention has been made in order to solve the above problems. A first object of the invention is to provide a multi-processor system allowing a memory access derived from a cache to be monitored by another cache by using an interconnection network.
A second object of the invention is to provide a multi-processor system capable of such monitoring without adversely affecting input/output units or the like not containing caches.
A third object of the invention is to provide a multi-processor system capable of simplifying a circuit for determining a transmission destination to a memory mapped register designated by an access request.
A fourth object of the invention is to provide a multi-processor system capable of simplifying a circuit for transferring to an input/output device an access request to a memory mapped register contained in the input/output device.
A fifth object of the invention is to provide a multi-processor system capable of simplifying a circuit for transferring to an input/output device an access request to a memory mapped register contained in the input/output device even if the number and combination of input/output devices are changed.
A sixth object of the invention is to solve the bus bottleneck by connecting bus-connected processors to a crossbar switch, and to improve the performance of a multi-processor system by increasing the number of connectable processors.
A seventh object of the invention is to make a system connected to a crossbar switch operable without modifying the software of a bus-connected system.
An eighth object of the invention is to provide an efficient address snooping scheme for a multi-processor system connected by a crossbar switch.
A ninth object of the invention is to provide a multi-cast scheme allowing a flexible system configuration and capable of connecting desired units including processor units, memory units, and input/output units, to a crossbar switch.