A parallel computing device including a plurality of nodes for executing an arithmetic process is conventionally known. The nodes included in the parallel computing device are interconnected via a network including a plurality of communication appliances, such as switches or the like, and mutually perform a data communication.
FIG. 1 illustrates a configuration example of a parallel computing device 100. The parallel computing device 100 illustrated in FIG. 1 includes nodes N0-N7 for respectively performing a calculation, and switches 110-117 for transferring received data to a node at a specified destination. The switches 110-117 configure a one-dimensional mesh type network where the switches are linearly arranged.
Upon receipt of a packet via an input port to which a node or another switch is connected, a switch decides an output port to output the packet according to a destination of the received packet. Then, the switch outputs the packet to the decided output port.
When a switch receives a plurality of packets to be output to the same output port, the switch performs arbitration such that the numbers of packets to be respectively output from input ports to the output port become equal. Then, the switch transmits the packets via the output port according to a result of the arbitration.
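The equal-share arbitration described above can be sketched as follows. The source states only that the numbers of packets taken from each input port become equal; the round-robin policy, the function name `arbitrate`, and the list-of-queues layout are assumptions made for illustration.

```python
from collections import deque

def arbitrate(input_queues, slots):
    """Grant up to `slots` output slots, visiting input ports in round-robin order.

    input_queues: one list of waiting packets per input port (hypothetical layout).
    Returns the granted (port_index, packet) pairs in transmission order.
    """
    queues = [deque(q) for q in input_queues]
    granted = []
    port = 0
    while len(granted) < slots and any(queues):
        if queues[port]:                      # skip ports with nothing waiting
            granted.append((port, queues[port].popleft()))
        port = (port + 1) % len(queues)       # move on to the next input port
    return granted

# Two backlogged input ports competing for one output port.
grants = arbitrate([["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]], slots=6)
per_port = [sum(1 for p, _ in grants if p == i) for i in range(2)]
```

With both input ports backlogged, each port receives the same number of output slots, which is the 1/2 split used in the examples that follow.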
Related to the above-described technique, a wireless multi-hop network including a transmitting node and a relay node is known. The transmitting node decides a packet size according to the number of hops, which is the number of transfers up to a destination, and fragments data to be transmitted into packets of a smaller size. The relay node transmits the packets with priority control according to the number of hops (for example, Japanese Laid-open Patent Publication No. 2003-273788).
Additionally, a wireless data communication method is known that performs efficient communication in all cases by deciding the length of the next packet based on the length of a packet that has already been communicated (for example, Japanese Laid-open Patent Publication No. 2001-326648).
When communication concentrates on some of the switches in the above-described parallel computing device 100, the communication bandwidth of a node becomes narrower as the number of hops from that node up to the switch on which the communication concentrates becomes larger. In this case, data transmitted from a node having a large number of hops arrives at the destination node later than data transmitted from a node having a small number of hops.
FIG. 2 illustrates an example of a collective communication in which the nodes N0-N6 transmit data to the node N7. Each of the switches arbitrates among input packets so that the numbers of packets passed from its input ports to an output port become equal, namely, 1/2 for each of the two input ports. Accordingly, the packets that are transmitted from the node N6 to the switch 116 and further transmitted from the switch 116 to the switch 117 account for 1/2 of the total number of packets transmitted from the switch 116 to the switch 117.
In FIG. 2, a ratio of the number of packets transmitted from an arbitrary node to the number of packets transmitted to the node N7, namely, the number of packets transmitted to the switch 117 is called “packet number ratio”. In this case, a packet number ratio of the node N6 is 1/2.
Additionally, the number of packets that are transmitted from the switch 115 to the switch 116 and further transmitted from the switch 116 to the switch 117 results in 1/2 of the total number of packets transmitted from the switch 116 to the switch 117. Moreover, the number of packets that are transmitted from the node N5 to the switch 115 and further transmitted from the switch 115 to the switch 116 results in 1/2 of the total number of packets transmitted from the switch 115 to the switch 116. Accordingly, a packet number ratio of the node N5 is 1/4.
Similarly, the packet number ratios of the nodes N4, N3, N2, and N1 are 1/8, 1/16, 1/32, and 1/64, respectively. Moreover, the switch 110 transmits, to the switch 111, only packets transmitted from the node N0, so the switch 111 divides its output equally between the packets of the node N1 and those of the node N0. Therefore, the packet number ratio of the node N0 is also 1/64, the same as that of the node N1.
Here, the ratio among the sizes of the packets transmitted from the nodes that are the sources of a collective communication is referred to as a “packet size ratio”. In the collective communication illustrated in FIG. 2, all the nodes output packets of the same size. Therefore, the packet size ratio of the nodes N6, N5, N4, N3, N2, N1, and N0 is 1:1:1:1:1:1:1.
Additionally, the ratio of the communication bandwidth that each of the source nodes uses to transmit packets to the entire communication bandwidth is referred to as a “communication bandwidth ratio”. When all the nodes output packets of the same size, the communication bandwidth ratio of each node is equal to its packet number ratio. Therefore, the communication bandwidth ratios of the nodes N6, N5, N4, N3, N2, N1, and N0 are respectively 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/64 when the entire communication bandwidth is assumed to be 1.
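The halving at each switch in FIG. 2 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming equal packet sizes and the 1/2 split at every switch described above; the function name `chain_bandwidth_ratios` and the use of integer node indices are illustrative choices, not part of the source.

```python
from fractions import Fraction

def chain_bandwidth_ratios(num_sources):
    """Shares of the final link for sources N0..N(num_sources-1) in a linear chain.

    The source with the highest index is closest to the destination. Each
    switch gives half of its output to its local node and half to the
    upstream switch; the farthest switch forwards only N0.
    """
    ratios = {}
    share = Fraction(1)                 # share of the final link still to be divided
    for node in range(num_sources - 1, 0, -1):
        share /= 2                      # equal split: local node vs. upstream traffic
        ratios[node] = share
    ratios[0] = share                   # N0 receives the whole remaining upstream half
    return ratios

# FIG. 2: nodes N0-N6 transmit to the node N7.
ratios = chain_bandwidth_ratios(7)
```

For seven sources the result matches the values listed above: 1/2 for N6 down to 1/64 for both N1 and N0, and the shares sum to 1.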
In the example of the collective communication illustrated in FIG. 2, the communication bandwidths of the nodes N0 and N1, which are far from the destination node N7, become significantly narrow. The collective communication is not complete until the communication of every node is complete. Data transmitted from the node N0 or N1 arrives at the destination node N7 later than data transmitted from the node N6, which is close to the destination node N7, so the communication bandwidths of the nodes N0 and N1 become a bottleneck. Namely, the arrival of data transmitted from a node having a large number of hops to the destination node is delayed.
Additionally, even in a communication other than a collective communication, when the communication concentrates on some of the switches, the communication bandwidth of a node having a large number of hops, which is the number of transfers up to the switch on which the communication concentrates, becomes significantly narrow.
FIG. 3 illustrates an example of a case where the communication bandwidths of some of the nodes become significantly narrow in a communication other than a collective communication. FIG. 3 illustrates the case where the nodes N0, N1, N2, and N3 each communicate, at the same time, with a corresponding node four hops away. For ease of understanding, the communication paths among the nodes are represented with solid lines marked with arrows. The configuration of the parallel computing device 100 is substantially the same as that of FIG. 1.
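To see where the traffic of FIG. 3 concentrates, the number of flows crossing each switch-to-switch link can be counted. The following is a minimal sketch; it assumes that index i stands for the switch 11i (0 for the switch 110 through 7 for the switch 117) and that the node Ni communicates with the node N(i+4), which is one reading of the four-hop pattern described above. The function name `link_loads` is hypothetical.

```python
from collections import Counter

def link_loads(flows):
    """Count the flows crossing each link (i, i + 1) of a linear chain.

    flows: (source_index, destination_index) pairs with source < destination.
    """
    loads = Counter()
    for src, dst in flows:
        for hop in range(src, dst):      # a flow uses every link between src and dst
            loads[(hop, hop + 1)] += 1
    return loads

# Assumed FIG. 3 pattern: N0->N4, N1->N5, N2->N6, N3->N7.
flows = [(0, 4), (1, 5), (2, 6), (3, 7)]
loads = link_loads(flows)
busiest = max(loads, key=loads.get)      # link carrying the most flows
```

Under these assumptions the busiest link is the one from the switch 113 to the switch 114, which carries all four flows, while the link from the switch 110 to the switch 111 carries only one.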
The number of packets that are transmitted from the node N3 to the switch 113 and further transmitted from the switch 113 to the switch 114 results in 1/2 of the total number of packets output from the switch 113 to the switch 114.
In FIG. 3, the ratio of the number of packets transmitted from an arbitrary node to the number of packets transmitted from the switch 113 to the switch 114 is called the “packet number ratio”. In this case, the packet number ratio of the node N3 is 1/2.
Additionally, the number of packets that are transmitted from the switch 112 to the switch 113 and further transmitted from the switch 113 to the switch 114 results in 1/2 of the number of packets transmitted from the switch 113 to the switch 114. Moreover, the number of packets that are transmitted from the node N2 to the switch 112 and further transmitted from the switch 112 to the switch 113 results in 1/2 of the number of packets transmitted from the switch 112 to the switch 113. Accordingly, a packet number ratio of the node N2 is 1/4.
Similarly, the packet number ratio of the node N1 is 1/8. Moreover, the switch 110 transmits, to the switch 111, only packets transmitted from the node N0. Therefore, the packet number ratio of the node N0 is also 1/8, the same as that of the node N1.
When the nodes N0 to N3 output packets of the same size, the packet size ratio of the nodes N0, N1, N2, and N3 is 1:1:1:1. In this case, the communication bandwidth ratio of each node is equal to its packet number ratio. Therefore, the communication bandwidth ratios of the nodes N0, N1, N2, and N3 are respectively 1/8, 1/8, 1/4, and 1/2 when the entire communication bandwidth is assumed to be 1.
In the example of the communication illustrated in FIG. 3, the communication bandwidths of the nodes N0 and N1, which have a large number of hops up to the switch 114 on which the communication concentrates, become much narrower than those of the other nodes. In this case, the communication of any other node that executes a process depending on, for example, whether the communication of the node N0 or N1 is complete cannot complete until the communication of the node N0 or N1 completes. Accordingly, the communication bandwidths of the nodes N0 and N1 become a bottleneck.
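The relation among the packet number ratio, the packet size ratio, and the communication bandwidth ratio can be written out explicitly. The following is a minimal sketch, assuming that the bandwidth a node uses is proportional to its packet count multiplied by its packet size and that arbitration is by packet count, as described above. The function name `bandwidth_ratios` is hypothetical, and the packet number ratios are those derived for FIG. 3 (1/2 for N3, 1/4 for N2, and 1/8 for N1 and N0).

```python
from fractions import Fraction

def bandwidth_ratios(packet_number_ratios, packet_sizes):
    """Normalize (packet number ratio x packet size) so that the shares sum to 1."""
    weighted = {n: packet_number_ratios[n] * packet_sizes[n]
                for n in packet_number_ratios}
    total = sum(weighted.values())
    return {n: w / total for n, w in weighted.items()}

# Packet number ratios derived for FIG. 3.
number_ratios = {
    "N3": Fraction(1, 2),
    "N2": Fraction(1, 4),
    "N1": Fraction(1, 8),
    "N0": Fraction(1, 8),
}
equal_sizes = {n: 1 for n in number_ratios}     # packet size ratio 1:1:1:1
ratios = bandwidth_ratios(number_ratios, equal_sizes)
```

With equal packet sizes the normalization leaves the values unchanged, so the communication bandwidth ratios coincide with the packet number ratios.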
FIGS. 1-3 have referred to the case of the one-dimensional mesh type network where the switches are linearly arranged. A similar problem also occurs in a multi-dimensional mesh type network. Moreover, the above-described problem also occurs in a case where the parallel computing device has a network form other than a mesh type network.
FIG. 4 illustrates a configuration example of a parallel computing device 400 having a network form of a torus type. The parallel computing device 400 illustrated in FIG. 4 includes the nodes N0-N7 for performing a calculation, and switches 410-417 for transferring received data to a specified destination. The switches 410-417 configure a torus type network in the shape of a ring.
In FIG. 4, the node N4 performs a communication with the node N7 via a path including the switches 414, 415, 416, and 417. The node N5 performs a communication with the node N7 via a path including the switches 415, 416, and 417. The node N6 performs a communication with the node N7 via a path including the switches 416 and 417. Moreover, the node N3 performs a communication with the node N7 via a path including the switches 413, 412, 411, 410, and 417. The node N2 performs a communication with the node N7 via a path including the switches 412, 411, 410, and 417. The node N1 performs a communication with the node N7 via a path including the switches 411, 410, and 417. The node N0 performs a communication with the node N7 via a path including the switches 410 and 417.
The number of packets that are transmitted from the switch 416 to the switch 417 and further transmitted from the switch 417 to the node N7 results in 1/2 of the number of packets transmitted from the switch 417 to the node N7. Moreover, the number of packets that are transmitted from the node N6 to the switch 416 and further transmitted from the switch 416 to the switch 417 results in 1/2 of the number of packets transmitted from the switch 416 to the switch 417.
In FIG. 4, a ratio of the number of packets transmitted from an arbitrary node to the number of packets transmitted to the node N7 is called “packet number ratio”. In this case, a packet number ratio of the node N6 is 1/4.
Additionally, the number of packets that are transmitted from the switch 415 to the switch 416 and further transmitted from the switch 416 to the switch 417 results in 1/2 of the number of packets transmitted from the switch 416 to the switch 417. Moreover, the number of packets that are transmitted from the node N5 to the switch 415 and further transmitted from the switch 415 to the switch 416 results in 1/2 of the number of packets transmitted from the switch 415 to the switch 416. In this case, a packet number ratio of the node N5 is 1/8.
The switch 414 outputs, to the switch 415, only packets input from the node N4. Therefore, the packet number ratio of the node N4 is also 1/8, the same as that of the node N5. Similarly, the packet number ratios of the nodes N0, N1, N2, and N3 are 1/4, 1/8, 1/16, and 1/16, respectively.
When the nodes N0-N6 output packets of the same size, the packet size ratio of the nodes N0-N6 is 1:1:1:1:1:1:1. In this case, the communication bandwidth ratio of each node is equal to its packet number ratio. Therefore, the communication bandwidth ratios of the nodes N0, N1, N2, N3, N4, N5, and N6 are 1/4, 1/8, 1/16, 1/16, 1/8, 1/8, and 1/4, respectively.
In the example of the collective communication illustrated in FIG. 4, the communication bandwidths of the nodes N2 and N3, which are nodes having a large number of hops up to the node N7, become much narrower than those of the other nodes. The collective communication is not complete until the communication of every node is complete. Therefore, the communication bandwidths of the nodes N2 and N3 become a bottleneck. Namely, the arrival of data transmitted from a node having a large number of hops to a destination node is delayed.
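The shares in FIG. 4 follow from splitting the bandwidth equally at every merge point on the way to the node N7. The following is a minimal sketch that encodes those merge points as nested lists; the encoding and the function name `split_shares` are assumptions made for illustration. A switch that forwards traffic from only one source (the switches 414 and 413) is represented by that node alone.

```python
from fractions import Fraction

def split_shares(tree, share=Fraction(1), out=None):
    """Divide `share` equally among the branches of each merge point.

    tree: a node name (str) or a list of sub-trees merged by one switch.
    Returns a dict mapping each node name to its share of the final link.
    """
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = share                       # leaf: a source node
    else:
        for branch in tree:                     # equal arbitration among the inputs
            split_shares(branch, share / len(tree), out)
    return out

# Merge structure toward N7: the switch 417 merges the 416 side and the 410 side.
fig4 = [
    ["N6", ["N5", "N4"]],                       # 416 merges N6 with 415 (N5 and N4)
    ["N0", ["N1", ["N2", "N3"]]],               # 410 merges N0 with 411, and so on
]
ratios = split_shares(fig4)
```

The computed shares are 1/4 for N6 and N0, 1/8 for N5, N4, and N1, and 1/16 for N2 and N3, in agreement with the values above.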
FIG. 4 has referred to the case of the torus type network where the nodes are connected in the shape of a ring. However, a similar problem occurs also in a case of a multi-dimensional torus type network.
FIG. 5 illustrates a configuration example of a parallel computing device 500 in a case where a network form is of a fat-tree type. The parallel computing device 500 illustrated in FIG. 5 includes the nodes N0-N7 for performing a calculation, and switches 510 to 514 for transferring received data to a specified destination. The switches 510 to 514 are connected in the shape of a fat tree. Here, the fat-tree type is a connection form of a tree type where connections of switches are symmetrically branched from a higher-level switch to lower-level switches to which nodes are respectively connected.
FIG. 5 illustrates an example of a collective communication where the nodes N0-N6 transmit data to the node N7. The number of packets that are transmitted from the node N6 to the switch 514 and further transmitted from the switch 514 to the node N7 results in 1/2 of the number of packets transmitted from the switch 514 to the node N7.
In FIG. 5, a ratio of the number of packets transmitted from an arbitrary node to the number of packets transmitted to the node N7 is called “packet number ratio”. In this case, a packet number ratio of the node N6 is 1/2.
The switch 510 transmits, to the switch 514, packets transmitted from the switches 511, 512, and 513. Accordingly, for example, the number of packets that are transmitted from the switch 513 to the switch 510 and further transmitted from the switch 510 to the switch 514 results in 1/3 of the total number of packets transmitted from the switch 510 to the switch 514.
Additionally, the switch 513 transmits packets transmitted from the nodes N4 and N5 to the switch 510. Accordingly, the packets from the node N4 account for 1/2 of the total number of packets transmitted from the switch 513 to the switch 510, and the packets from the node N5 account for the remaining 1/2. The packets transmitted from the switch 510 to the switch 514 in turn account for 1/2 of the packets transmitted from the switch 514 to the node N7, the other 1/2 being the packets from the node N6. Multiplying the shares along the path gives 1/2 × 1/3 × 1/2, so the packet number ratios of the nodes N4 and N5 are each 1/12.
When the nodes N0-N6 output packets of the same size, the packet size ratio of the nodes N0-N6 is 1:1:1:1:1:1:1. In this case, the communication bandwidth ratio of each node is equal to its packet number ratio. Therefore, the communication bandwidth ratios of the nodes N0, N1, N2, N3, N4, N5, and N6 are 1/12, 1/12, 1/12, 1/12, 1/12, 1/12, and 1/2, respectively.
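The fat-tree values can be checked by multiplying the equal shares at each arbitration point along a node's path to the node N7. The following is a minimal sketch; the `fan_ins` table is an assumption built from the description of FIG. 5, with the paths through the switches 511 and 512 taken to mirror the path through the switch 513 that the text spells out.

```python
from fractions import Fraction
from math import prod

# Number of competing inputs at each arbitration point on the path to N7
# (assumed from FIG. 5): leaf switch (2 nodes), switch 510 (3 leaf switches),
# switch 514 (the node N6 and the switch 510).
fan_ins = {
    "N0": [2, 3, 2],
    "N1": [2, 3, 2],
    "N2": [2, 3, 2],
    "N3": [2, 3, 2],
    "N4": [2, 3, 2],
    "N5": [2, 3, 2],
    "N6": [2],                 # N6 competes only at the switch 514
}

# A node's share is the product of 1/fan-in over every arbitration point.
ratios = {node: Fraction(1, prod(f)) for node, f in fan_ins.items()}
```

Each of the nodes N0-N5 ends up with 1/12 and the node N6 with 1/2, and the seven shares sum to 1.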
In the example of the collective communication illustrated in FIG. 5, the communication bandwidths of the nodes N0-N5 become much narrower than that of the node N6. Accordingly, the communication bandwidths of the nodes N0-N5 cause a bottleneck. Therefore, an arrival of data transmitted from a node having a large number of hops to a destination node is delayed.
As described above, when a communication such as a collective communication concentrates on some of the switches, the communication bandwidth of a node having a large number of hops up to the switch on which the communication concentrates becomes significantly narrow. Therefore, the arrival of data transmitted from a node having a large number of hops to a destination is delayed. Accordingly, the arrival times of data at a destination node are not equal and vary depending on whether the number of hops is large or small.