The present invention relates to a multi-processor system having a plurality of processors in order to realize high performance.
A tightly coupled multi-processor system is a high performance computer system in which a plurality of processors share one main storage. In such a system, a private cache is provided for each processor in order to reduce contention for access to the shared main storage. Use of these caches poses a problem of cache consistency control. A conventional multi-processor system is introduced in JP-A-4-328653 (Reference document 1). Reference document 1 discloses an invention which uses both an interconnection network and a modified snooping bus that includes address buses and control buses but no data buses. Specifically, cache consistency is controlled by hardware using an address and a command on the modified snooping bus, similar to conventional techniques, whereas a cache block required for consistency control is transferred between a cache and the main storage, or between caches, via the interconnection network. With this method, the consistency control operation other than data transfer for each memory access can be executed in about one cycle by using an address and a command. Therefore, for a plurality of memory accesses, the consistency control operation other than cache block transfer can be executed sequentially for each cache.
Cache block transfer for the consistency control operation of each memory access requires a plurality of cycles. However, different cache blocks can be transferred in parallel between a cache and the main storage or between caches, by using the interconnection network. The size of a cache block is generally large compared to the size of an address. Therefore, although address transfer is completed in one cycle, cache block transfer requires plural cycles. For example, if a cache block transfer requires eight cycles, each one-cycle address transfer is accompanied by an eight-cycle cache block transfer. As described earlier, an address is transferred via a bus, while cache blocks are transferred in parallel via a crossbar switch. This can improve the system performance considerably as compared to a conventional snoop system which transfers both the address and the cache block via buses. Reference document 1 states that various networks such as crossbar switches can be used as the interconnection network.
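The cycle counts in the example above can be worked through with a simplified model. The following Python sketch is illustrative only: the function names are hypothetical, and the model assumes that addresses are issued back to back and that no two block transfers contend for the same port of the interconnection network, neither of which is stated in reference document 1.

```python
ADDRESS_CYCLES = 1  # from the text: address transfer completes in one cycle
BLOCK_CYCLES = 8    # from the text's example: block transfer takes eight cycles


def shared_bus_cycles(n_accesses: int) -> int:
    """Address and cache block share one bus, so each access is serialized."""
    return n_accesses * (ADDRESS_CYCLES + BLOCK_CYCLES)


def split_path_cycles(n_accesses: int) -> int:
    """Addresses are serialized on the snooping bus, while block transfers
    overlap on the interconnection network (assumption: no port conflicts).
    The last block finishes BLOCK_CYCLES after its address was issued."""
    if n_accesses == 0:
        return 0
    return n_accesses * ADDRESS_CYCLES + BLOCK_CYCLES
```

Under these assumptions, four memory accesses take 36 cycles on a shared bus and 12 cycles with the split address and data paths.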
Another known technique is a so-called memory mapped I/O system. Various control registers, data registers, and the like of this system are mapped to the same address space as the main storage, and are accessed by a command of the same format as the memory access command used for access to the main storage. This memory mapped I/O system is widely used in conventional shared memory type multi-processor systems. Reference document 1 describes neither the memory mapped registers of the multi-processor system using the interconnection network disclosed therein nor their access method.
According to the invention described in reference document 1, a plurality of memory accesses can be performed nearly in parallel by utilizing parallel data transfer via the interconnection network. With this method, however, there is a problem that the total number of processors connectable to the modified snooping buses is restricted considerably because the buses become a bottleneck.
An access request to a memory mapped register is preferably transferred to the unit containing the register via the interconnection network in order to simplify the system configuration. To identify the unit containing the memory mapped register assigned the address designated by an access request, however, two things are required. First, address allocation information indicating the range of addresses assigned to the memory mapped registers contained in each unit must be stored in the system in advance. Second, a circuit is required which identifies the target unit from the address designated by the access request and the stored address allocation information.
Memory mapped registers of the system include those in input/output devices connected to input/output units. The number and locations of input/output devices of the system are subject to change. Each time such change occurs, the address allocation information is required to be changed. Therefore, the circuit for identifying the unit containing a memory mapped register designated by an access request is required to deal with such address change, and the structure of the circuit becomes complicated.
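The lookup described above can be sketched as a table search. This Python sketch is a minimal illustration, not the circuit of any particular system: the unit names, the address ranges, and the `find_unit` function are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class AddressRange(NamedTuple):
    start: int  # first address assigned to the unit's registers
    end: int    # last address assigned to the unit's registers (inclusive)
    unit: str   # hypothetical unit name


# Hypothetical address allocation information, stored in advance.
ALLOCATION: List[AddressRange] = [
    AddressRange(0xF0000000, 0xF0000FFF, "processor_unit_0"),
    AddressRange(0xF0001000, 0xF0001FFF, "io_unit_0"),
    AddressRange(0xF0002000, 0xF0002FFF, "io_unit_1"),
]


def find_unit(address: int, table: List[AddressRange]) -> Optional[str]:
    """Return the unit whose register range contains the address, if any."""
    for entry in table:
        if entry.start <= address <= entry.end:
            return entry.unit
    return None
```

In this sketch, every change in the number or location of input/output devices means editing `ALLOCATION`, which mirrors the maintenance burden described above.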
Also in this system, if a plurality of processor units, memory units, and input/output units are connected by a single bus, there is only one path for data transfer between connected units. Therefore, this bus becomes a bottleneck when the system performance is to be improved by increasing the number of processors.
In order to solve this problem, a method of using a bus for address transfer and a crossbar switch for data transfer has been proposed by James O. Nicholson, “The RISC System/6000 SMP System”, COMPCON95 Proceedings, March 1995, pp. 102 to 109.
With this conventional method, although the bus bottleneck in terms of throughput can be solved, the number of processors cannot be increased greatly because of electrical constraints on signal transfer at high frequencies.
In order to solve this, an address is also transferred via the crossbar switch and each unit is connected to the crossbar switch in one-to-one correspondence.
In this case, in order to correctly run software written for a bus-connected system, data coherency between a cache memory and a main memory must be maintained even under crossbar switch connection. To connect a processor designed for bus connection to the crossbar switch, the address snoop method, which is generally used to maintain data coherency in bus-connected systems, must be realized on the crossbar switch.
The address snoop method maintains data coherency between a cache memory and a main memory. With this method, it is not necessary to transfer an address to memory units having no transfer data or to units having no cache memory. Therefore, address transfer necessary for maintaining data coherency needs to be performed only for the units required to participate in address snoop. An efficient address snoop method can therefore be realized by providing the crossbar switch with means for executing multi-cast, that is, one-to-many data transfer.
The invention has been made in order to solve the above problems. A first object of the invention is to provide a multi-processor system allowing a memory access derived from a cache to be monitored by another cache by using an interconnection network.
A second object of the invention is to provide a multi-processor system capable of such monitoring without adversely affecting input/output units or other units not containing caches.
A third object of the invention is to provide a multi-processor system capable of simplifying a circuit for determining a transmission destination to a memory mapped register designated by an access request.
A fourth object of the invention is to provide a multi-processor system capable of simplifying a circuit for transferring to an input/output device an access request to a memory mapped register contained in the input/output device.
A fifth object of the invention is to provide a multi-processor system capable of simplifying a circuit for transferring to an input/output device an access request to a memory mapped register contained in the input/output device even if the number and combination of input/output devices are changed.
A sixth object of the invention is to eliminate the bus bottleneck by connecting bus-connected processors to a crossbar switch, and to improve the performance of a multi-processor system by increasing the number of connectable processors.
A seventh object of the invention is to make a system connected to a crossbar switch operable without modifying the software of a bus-connected system.
An eighth object of the invention is to provide an efficient address snooping scheme for a multi-processor system connected by a crossbar switch.
A ninth object of the invention is to provide a multi-cast scheme allowing a flexible system configuration and capable of connecting desired units including processor units, memory units, and input/output units, to a crossbar switch.
In order to achieve the above objects of the invention, a transmission destination determining circuit is provided. If a processor unit issues an access to data in the main memory and the cache of the processor unit does not hit, the transmission destination determining circuit determines, as the transmission destination of the access, a plurality of destinations including one memory unit assigned with the address designated by the access request and all processor units.
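The destination determination described above can be sketched as follows. This Python sketch is an assumption-laden illustration: the unit names, the range-based memory map, and the `miss_destinations` function are hypothetical, and the text does not state how addresses are assigned to memory units.

```python
from typing import Dict, FrozenSet, List, Tuple


def miss_destinations(
    address: int,
    memory_ranges: Dict[str, Tuple[int, int]],
    processor_units: List[str],
) -> FrozenSet[str]:
    """On a cache miss, return the one memory unit assigned the address
    together with all processor units, as described in the text.

    memory_ranges maps a hypothetical memory unit name to an inclusive
    (start, end) address range; this layout is an assumption.
    """
    owner = None
    for unit, (start, end) in memory_ranges.items():
        if start <= address <= end:
            owner = unit
            break
    if owner is None:
        raise ValueError(f"no memory unit is assigned address {address:#x}")
    return frozenset([owner]) | frozenset(processor_units)
```

For example, with two memory units covering `0x0000` to `0x7FFF` and `0x8000` to `0xFFFF`, a miss on address `0x9000` is sent to the second memory unit and to every processor unit so that their caches can monitor the access.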
A simple circuit is provided for transferring to an input/output device an access request to a memory mapped register. This circuit locally broadcasts to all input/output units the access request to a memory mapped register of an input/output device.
In this invention, the conventional address bus proposed by Nicholson is not used; instead, the address is passed through the crossbar switch, each port of which is connected in one-to-one correspondence to a unit. In order to use an address snoop scheme together with the crossbar switch, the crossbar switch is provided with means for broadcasting an address to all units connected to the crossbar switch. In transferring an address necessary for maintaining data coherency, the crossbar switch is controlled so that the address is transferred to all units.
The crossbar switch is also provided with means for multi-casting an address necessary for maintaining data coherency, so that the address is transferred only to those units required to participate in address snoop. For this purpose, the crossbar switch is provided with means for storing information indicating whether the unit connected to each port is associated with multi-cast, and with means for determining a destination port in accordance with the stored information. Multi-cast is performed after the multi-cast destinations have been set in the means for determining a destination port, in accordance with the information indicating whether each unit is associated with multi-cast.
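The stored per-port information and the destination determination can be sketched with a bit mask. This Python sketch is illustrative only: the `MulticastCrossbar` class, its method names, and the bit-mask representation are assumptions, not the circuit described in the text.

```python
from typing import List


class MulticastCrossbar:
    """Hypothetical model of the per-port multi-cast information."""

    def __init__(self, n_ports: int) -> None:
        self.n_ports = n_ports
        self.snoop_mask = 0  # bit i set: unit at port i participates in snoop

    def set_participation(self, port: int, participates: bool) -> None:
        """Record whether the unit at a port is associated with multi-cast."""
        if not 0 <= port < self.n_ports:
            raise ValueError(f"port {port} is out of range")
        if participates:
            self.snoop_mask |= 1 << port
        else:
            self.snoop_mask &= ~(1 << port)

    def destination_ports(self) -> List[int]:
        """Return the ports that receive a multi-cast address."""
        return [p for p in range(self.n_ports) if (self.snoop_mask >> p) & 1]
```

Because participation is recorded per port rather than fixed by port position, a unit with a cache can be attached to any port and enabled with `set_participation`, while ports holding units without caches are simply left cleared.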
As above, each unit is connected to the crossbar switch so that a plurality of data transfers can be executed at the same time if the destination ports are different. Therefore, degradation of system performance caused by bus contention as the number of processors increases can be suppressed.
Since each unit is connected in one-to-one correspondence to each port of the crossbar switch, better electrical performance can be obtained than in a bus-connected system. Therefore, the number of connectable processors can be increased.
The address necessary for maintaining data coherency is broadcast so that the address snoop function of conventional processors can be utilized, realizing a low-cost and efficient method of maintaining data coherency.
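The condition that transfers proceed at the same time only when their destination ports differ can be illustrated with a small grouping routine. This Python sketch is a simplified model with hypothetical names; the greedy grouping is an assumption chosen for illustration and is not the arbitration method of the invention.

```python
from typing import List, Set, Tuple

Transfer = Tuple[int, int]  # (source port, destination port)


def schedule(requests: List[Transfer]) -> List[List[Transfer]]:
    """Group transfers into rounds so that no two transfers in the same
    round use the same destination port. Greedy, in request order."""
    rounds: List[List[Transfer]] = []
    busy_ports: List[Set[int]] = []  # destination ports used in each round
    for src, dst in requests:
        for i, busy in enumerate(busy_ports):
            if dst not in busy:
                rounds[i].append((src, dst))
                busy.add(dst)
                break
        else:
            # Every existing round already uses this destination port.
            rounds.append([(src, dst)])
            busy_ports.append({dst})
    return rounds
```

With requests `(0, 2)`, `(1, 3)`, and `(4, 2)`, the first two share a round because their destination ports differ, and the third waits for a second round because port 2 is already in use.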
The address of a coherent transaction is transferred only to those units required to participate in address snoop, among the units connected to the ports. Therefore, unnecessary data transfers can be eliminated and an effective data transfer throughput can be improved.
Since the information indicating whether each unit at each port is associated with multi-cast is stored, it becomes possible to connect each unit to a desired port, allowing a flexible system configuration.
Although the multi-cast of the invention is described by using address snoop by way of example, the invention is generally applicable to any case in which data is transferred to a plurality of ports.
For example, in transferring a reset command to all input/output devices, means is provided for storing information indicating whether each port is connected to an input/output unit and whether each input/output unit is associated with multi-cast.