Conventionally, a multiprocessor system includes a plurality of CPUs (Central Processing Units) as arithmetic processing units. Moreover, crossbars are used in the multiprocessor system as data transfer controlling apparatuses, each of which is a switching mechanism for quickly and efficiently making a data transfer between devices, for example, between CPUs or between a CPU and a memory.
FIGS. 1 and 2 are explanatory views of a broadcast transfer made in a multiprocessor system 100.
The multiprocessor system 100 illustrated in FIGS. 1 and 2 includes system boards 00 to 15, and crossbars XB 00 and XB 01.
FIGS. 1 and 2 illustrate a case where each of the system boards 00 to 15 configures one node. A node is an independent unit of processing implemented by a processor group composed of one or more CPUs, and an SC (System Controller) or the like as a memory controller.
The nodes 00 to 07 are connected to a crossbar XB 00, and make a data transfer among the nodes via the crossbar XB 00. Similarly, the nodes 08 to 15 are connected to a crossbar XB 01, and make a data transfer among the nodes via the crossbar XB 01.
Additionally, the crossbar XB 00 and the crossbar XB 01 are communicatively connected. An arbitrary node subordinate to the crossbar XB 00 can make a data transfer to an arbitrary node subordinate to the crossbar XB 01.
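The connection arrangement described above can be sketched as follows. This is only an illustrative model, not the implementation of the system 100: the function names `crossbar_of` and `route` are hypothetical, and the sketch assumes that a transfer between nodes on different crossbars passes through the source-side crossbar first and then the destination-side crossbar.

```python
from __future__ import annotations

# Nodes 00-07 attach to XB 00 and nodes 08-15 attach to XB 01 (FIGS. 1 and 2).
NODES_PER_CROSSBAR = 8
NODE_COUNT = 16


def crossbar_of(node: int) -> int:
    """Return the index (0 or 1) of the crossbar a node is attached to."""
    if not 0 <= node < NODE_COUNT:
        raise ValueError(f"node must be in 0..{NODE_COUNT - 1}, got {node}")
    return node // NODES_PER_CROSSBAR


def route(src: int, dst: int) -> list[str]:
    """Return the hops of a transfer from the SC of src to the SC of dst."""
    src_xb, dst_xb = crossbar_of(src), crossbar_of(dst)
    if src == dst:
        # A transfer inside one node never leaves its SC.
        return [f"SC {src:02d}"]
    hops = [f"SC {src:02d}", f"XB {src_xb:02d}"]
    if dst_xb != src_xb:
        # The two crossbars are communicatively connected to each other.
        hops.append(f"XB {dst_xb:02d}")
    hops.append(f"SC {dst:02d}")
    return hops
```

For example, under these assumptions `route(0, 3)` stays on the crossbar XB 00, whereas `route(0, 12)` crosses from the crossbar XB 00 to the crossbar XB 01.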
Arrows illustrated in FIG. 1 indicate a flow of a process executed when a CPU within the node 00 issues a request of a broadcast transfer to all the CPUs within the system.
(1) The CPU within the node 00 issues only one request of the broadcast transfer to the SC 00 that controls the node 00. The CPU that has issued the request of the broadcast transfer is hereinafter referred to as a requester.
(2) Then, the SC 00 transmits the request to the CPUs other than the requester within the node 00, and to the SCs of the nodes 01 to 15.
(3) The SCs of the nodes 01 to 15, which have received the request from the SC 00, respectively transmit the request to the CPUs of their own local nodes.
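The three steps of the request flow above can be sketched as a list of sender/receiver pairs. This is a minimal illustration under stated assumptions: the function name `broadcast_request` is hypothetical, and two CPUs per node is assumed only for the example (the description requires only one or more CPUs per node). Crossbar hops are omitted for brevity.

```python
NODE_COUNT = 16      # nodes 00 to 15
CPUS_PER_NODE = 2    # assumption for illustration only


def broadcast_request(req_node, req_cpu):
    """Return the (sender, receiver) messages of a broadcast request, in step order."""
    messages = []
    local_sc = ("SC", req_node)

    # Step (1): the requester issues exactly one request to its local SC.
    messages.append((("CPU", req_node, req_cpu), local_sc))

    # Step (2): the local SC forwards the request to the other local CPUs
    # and to the SC of every other node.
    for cpu in range(CPUS_PER_NODE):
        if cpu != req_cpu:
            messages.append((local_sc, ("CPU", req_node, cpu)))
    for node in range(NODE_COUNT):
        if node != req_node:
            messages.append((local_sc, ("SC", node)))

    # Step (3): each remote SC forwards the request to its own local CPUs.
    for node in range(NODE_COUNT):
        if node != req_node:
            for cpu in range(CPUS_PER_NODE):
                messages.append((("SC", node), ("CPU", node, cpu)))

    return messages
```

In this sketch the requester sends a single message, and the fan-out to every other CPU in the system is performed entirely by the SCs.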
In contrast, arrows illustrated in FIG. 2 indicate a flow of a process of responding to the request of the broadcast transfer issued by the node 00.
(1) The CPUs other than the requester within the node 00 transmit a response to the received request of the broadcast transfer to the SC 00.
(2) Additionally, the CPUs within the nodes 01 to 15 respectively transmit a response to the received request of the broadcast transfer to the SCs of their local nodes.
(3) Then, the SCs 01 to 07 respectively transmit the responses to the SC 00 of the node 00 via the crossbar XB 00.
(4) Additionally, the SCs 08 to 15 respectively transmit the responses to the SC 00 of the node 00 via the crossbars XB 01 and XB 00.
(5) The SC 00 notifies the requester of reception results upon receipt of the responses from all the CPUs.
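The completion condition in step (5) can be sketched as follows. This is a hedged illustration, not the actual logic of the SC 00: the class name `ResponseCollector` is hypothetical, two CPUs per node is assumed for the example, and the sketch assumes that the requester's SC tracks one response per CPU rather than one aggregated response per node, which the description does not specify.

```python
class ResponseCollector:
    """Models the requester's SC waiting for responses from all other CPUs."""

    def __init__(self, req_node, req_cpu, node_count=16, cpus_per_node=2):
        every_cpu = {
            (node, cpu)
            for node in range(node_count)
            for cpu in range(cpus_per_node)
        }
        # The requester itself does not respond to its own request.
        self.pending = every_cpu - {(req_node, req_cpu)}
        self.results = {}

    def receive(self, node, cpu, result):
        """Record one response; return True once every response has arrived."""
        if (node, cpu) not in self.pending:
            raise ValueError(f"unexpected or duplicate response from {(node, cpu)}")
        self.pending.remove((node, cpu))
        self.results[(node, cpu)] = result
        # Only when nothing is pending may the SC notify the requester.
        return not self.pending
```

Under these assumptions, a request issued from the node 00 leaves 31 responses pending, and `receive` returns `True` only for the last of them, at which point the reception results would be reported to the requester.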
Related to the above-described technique, a method for making a data transfer among processors, which enables a broadcast transfer among threads even if a plurality of threads are assigned to the same node, is known.
Additionally, a broadcast communication method for avoiding a deadlock of broadcast messages within a switch-type network, in a parallel computer where a plurality of processors are connected by the network, is also known.
Furthermore, a monitor data collection method for collecting monitor data in a switch-type network at low cost in real time in a parallel computer where a plurality of nodes are connected by the network is known.
Still further, a packet transfer controlling apparatus that can reduce the length of time needed to detect a timeout while preventing the performance of a communication system from being degraded is known.
Patent Document 1: Japanese Laid-open Patent Publication No. 10-097512
Patent Document 2: Japanese Laid-open Patent Publication No. 10-254843
Patent Document 3: Japanese Laid-open Patent Publication No. 11-232236
Patent Document 4: Japanese Laid-open Patent Publication No. 2005-020394