In recent years, cluster systems in which a large number of nodes, for example, servers, are connected by a high-speed network have been widely used in the field of HPC (high performance computing). In many cases, such cluster systems perform parallel computation processing. In communication for the parallel computation processing, it is desirable to achieve a low delay and secure a wide band in signal transmission. To this end, InfiniBand-based Fat Tree connections have been widely used, particularly in large-scale cluster systems.
For example, the Fat Tree is a topology having a multiplexed tree-type network configuration such as a configuration illustrated in FIG. 13. FIG. 13 is a diagram illustrating a configuration of a network apparatus having a three-stage Fat Tree. In the Fat Tree, since the number of connection links in each switch is equal in the upper side and the lower side except for the top, a sufficiently wide band can be secured even when transmission is made in either the upward direction or the downward direction.
Herein, switches B1 to B9 illustrated in FIG. 13 will be referred to as first-stage switches. Further, switches M1 to M9 will be referred to as second-stage switches. Further, switches T1 to T9 will be referred to as third-stage switches. In the Fat Tree illustrated in FIG. 13, each of the first-stage switches is connected to three second-stage switches. Further, the switches B1 to B3 are connected to the same second-stage switches. Further, the switches B4 to B6 are connected to the same second-stage switches. Further, the switches B7 to B9 are connected to the same second-stage switches. For example, each of the switches B1 to B3 is connected to the switches M1, M4, and M7.
Further, each of the switches B1 to B9 is connected to three nodes. In FIG. 13, circles connected to the switches B1 to B9 represent nodes. Also, a number marked under each node represents each node number. This node number serves as an address for signal transmission between the nodes. For example, the nodes denoted by node numbers 1 to 3 are connected to the switch B1. Further, the nodes denoted by node numbers 4 to 6 are connected to the switch B2. Hereinafter, the node denoted by a node number P will be referred to as a node P.
Each of the nodes 1 to 27 transmits a signal destined for another node, to the first-stage switch connected to the node itself. Herein, each of the nodes 1 to 27 designates a node number as a destination address of a signal transmitted to another node.
Each of the switches B1 to B9 receives a signal transmitted from the node connected to the switch itself to another node. Then, the switches B1 to B9 transmit signals destined for node numbers of 1 mod 3, to the switches M1 to M3. Further, the switches B1 to B9 transmit signals destined for node numbers of 2 mod 3, to the switches M4 to M6. Further, the switches B1 to B9 transmit signals destined for node numbers of 3(0) mod 3, to the switches M7 to M9. For example, the switch B1 transmits a signal of 1 mod 3 among the signals received from the nodes 1 to 3, to the switch M1 as indicated by an arrow 901. Further, the switch B1 transmits a signal of 2 mod 3 among the signals received from the nodes 1 to 3, to the switch M4 as indicated by an arrow 902. Further, the switch B1 transmits a signal of 3(0) mod 3 among the signals received from the nodes 1 to 3, to the switch M7 as indicated by an arrow 903.
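The first-stage forwarding rule above can be sketched as a small function. This is only an illustration under stated assumptions: the name first_stage_uplink is hypothetical, switches are identified by their index (1 for B1 or M1, and so on), and the wiring of the switches B4 to B9 is assumed to follow the same scheme as that described for the switches B1 to B3.

```python
def first_stage_uplink(b, dst):
    """Index of the second-stage switch to which first-stage switch B<b>
    forwards a signal destined for node dst (hypothetical helper)."""
    # 0 for "1 mod 3", 1 for "2 mod 3", 2 for "3(0) mod 3"
    column = (dst - 1) % 3
    # B1-B3 -> 0, B4-B6 -> 1, B7-B9 -> 2 (assumed regular wiring)
    group = (b - 1) // 3
    # column 0 selects M1-M3, column 1 selects M4-M6, column 2 selects M7-M9
    return 3 * column + group + 1


# The switch B1 sends destinations 1, 2, and 3 to M1, M4, and M7 (arrows 901 to 903).
print([first_stage_uplink(1, d) for d in (1, 2, 3)])
```

With this indexing, the three signals received from the three nodes under one first-stage switch go to three different second-stage switches whenever their destination node numbers are consecutive.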
Each of the switches M1 to M9 receives signals that are equal in the remainder of a signal destination node number divided by 3, from the first-stage switches connected thereto. For example, the switch M1 receives a signal having a destination node number of 1 mod 3, from each of the switches B1 to B3. Further, the switch M2 receives a signal having a destination node number of 1 mod 3, from each of the switches B4 to B6. Further, the switch M3 receives a signal having a destination node number of 1 mod 3, from each of the switches B7 to B9. Further, the switch M4 receives a signal having a destination node number of 2 mod 3, from each of the switches B1 to B3. Further, the switch M7 receives a signal having a destination node number of 3(0) mod 3, from each of the switches B1 to B3.
Further, the switches M1 to M3 and the switches T1 to T3 are connected to each other, the switches M4 to M6 and the switches T4 to T6 are connected to each other, and the switches M7 to M9 and the switches T7 to T9 are connected to each other. Then, the switches M1 to M3 transmit signals having a destination node number of 1 mod 9, to the switch T1 as indicated by an arrow 904, for example. Further, the switches M1 to M3 transmit signals having a destination node number of 4 mod 9, to the switch T2 as indicated by an arrow 905, for example. Further, the switches M1 to M3 transmit signals having a destination node number of 7 mod 9, to the switch T3 as indicated by an arrow 906, for example. Further, the switches M4 to M6 transmit signals having a destination node number of 2 mod 9 to the switch T4, transmit signals having a destination node number of 5 mod 9 to the switch T5, and transmit signals having a destination node number of 8 mod 9 to the switch T6. Further, the switches M7 to M9 transmit signals having a destination node number of 3 mod 9 to the switch T7, transmit signals having a destination node number of 6 mod 9 to the switch T8, and transmit signals having a destination node number of 0 mod 9 (=9 mod 9) to the switch T9.
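The mapping from the remainder of the destination node number divided by 9 to a third-stage switch can be written compactly. The following is a minimal sketch, assuming the hypothetical name second_stage_uplink and index-based switch identifiers (1 for T1, and so on); it only restates the assignments listed above.

```python
def second_stage_uplink(dst):
    """Index of the third-stage switch that receives a signal destined for
    node dst from a second-stage switch (hypothetical helper)."""
    r9 = (dst - 1) % 9   # 0..8; the value 8 corresponds to "0 mod 9 (=9 mod 9)"
    column = r9 % 3      # 0 selects T1-T3, 1 selects T4-T6, 2 selects T7-T9
    return 3 * column + r9 // 3 + 1


# Destinations 1, 4, and 7 go to T1, T2, and T3 (arrows 904 to 906).
print([second_stage_uplink(d) for d in (1, 4, 7)])
```

Because the result depends only on the destination node number, every second-stage switch in the same group of three uses the same table.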
Then, the switches T1 to T9 transmit signals to the second-stage switches connected to the first-stage switches connected to the nodes having the destination node numbers of the signals received from the switches M1 to M9. Further, the switches M1 to M9 transmit signals to the first-stage switches connected to the nodes having the destination node numbers of the signals received from the switches T1 to T9. Thereafter, the switches B1 to B9 transmit signals to the nodes having the destination node numbers of the signals received from the switches M1 to M9.
In this way, a signal is transmitted from a node to another node in the Fat Tree.
Further, a communication scheme called a shift communication pattern is widely used in all-to-all communication, which transmits messages from all nodes to all nodes, in the Fat Tree connection. If the number of nodes is N, the shift communication pattern consists of N communication phases. In the i-th communication phase, each node transmits a message to the node whose node number is ahead of its own node number by “i”. Thus, if each node number is p (p=1, 2, . . . , N), ((i+p) mod N) is the message destination of each node in the i-th communication phase, where a remainder of 0 denotes the node N. That is, in the shift communication pattern, the destinations of signals transmitted from the respective nodes do not overlap with each other in each phase.
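The destination rule of the shift communication pattern can be illustrated as follows. This is a minimal sketch, assuming that a remainder of 0 is read as the node N, in line with the "3(0) mod 3" notation used above; the function names are hypothetical.

```python
def shift_destination(p, i, n):
    """Destination of node p (1..n) in the i-th communication phase."""
    d = (p + i) % n
    return d if d != 0 else n  # remainder 0 is read as node n (assumption)


def phase_destinations(i, n):
    """Destinations of nodes 1..n, in node-number order, for phase i."""
    return [shift_destination(p, i, n) for p in range(1, n + 1)]


dests = phase_destinations(9, 27)
print(dests[:3])                             # destinations of the nodes 1 to 3
print(sorted(dests) == list(range(1, 28)))   # every node is a destination exactly once
```

Since the destinations in each phase form a permutation of the node numbers, no node receives more than one message per phase.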
FIG. 14 is a diagram illustrating a shift communication pattern. FIG. 14 illustrates the 9th communication phase in a case where a shift communication pattern is performed in the same configuration as illustrated in FIG. 13. Numerals enclosed by a box 910 represent the destination node numbers of respective nodes. In the 9th communication phase, the destination node number of each node is equal to 9 plus the node number of each node. In the shift communication pattern, the destination node numbers of the signals received by each of the first-stage switches B1 to B9 from subordinate nodes are serial numbers. Thus, each of the first-stage switches receives three signals having destination node numbers of 1 mod 3, 2 mod 3, and 3(0) mod 3. Accordingly, each of the first-stage switches transmits the received signals to different second-stage switches. Each of the second-stage switches receives signals that are equal in the remainder of division by 3 among 9 serial destination node numbers. Then, the numbers that are equal in the remainder of division by 3 among 9 serial destination node numbers are different in the remainder of division by 9. For example, the switch M1 receives three signals having destination node numbers of 1 mod 9, 4 mod 9, and 7 mod 9. Accordingly, the switch M1 transmits the received signals to different third-stage switches. Likewise, each of the switches M2 to M9 transmits its received signals to different third-stage switches.
Accordingly, two or more signals do not flow through the same path. Hereinafter, a state in which two or more signals flow through the same path will be referred to as a path contention. While a description has been given of the 9th communication phase as an example, the relation between the destinations of signals received by each switch is also the same in other communication phases. That is, when the shift communication pattern is used, a path contention does not occur in any communication phase. Accordingly, the use of the shift communication pattern in the Fat Tree can achieve a high throughput in all-to-all communication and can secure a wide band in signal transmission.
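The absence of path contention can be checked by counting the signals on every link in every communication phase. The following is a minimal sketch under stated assumptions: every signal climbs to the third stage as in the description above, the wiring of the switches not explicitly described is assumed to be regular, the downward path is reconstructed from the connections described for FIG. 13, and all function names are hypothetical.

```python
from collections import Counter

N = 27  # number of nodes in the configuration of FIG. 13


def route(src, dst):
    """Directed links used by one signal that always climbs to the third stage."""
    b_src = (src - 1) // 3 + 1
    b_dst = (dst - 1) // 3 + 1
    col = (dst - 1) % 3                      # 0: M1-M3/T1-T3, 1: M4-M6/T4-T6, 2: M7-M9/T7-T9
    m_up = 3 * col + (b_src - 1) // 3 + 1    # second-stage switch on the way up
    t = 3 * col + ((dst - 1) % 9) // 3 + 1   # third-stage switch
    m_down = 3 * col + (b_dst - 1) // 3 + 1  # second-stage switch on the way down
    return [
        (f"node{src}", f"B{b_src}"),
        (f"B{b_src}", f"M{m_up}"),
        (f"M{m_up}", f"T{t}"),
        (f"T{t}", f"M{m_down}"),
        (f"M{m_down}", f"B{b_dst}"),
        (f"B{b_dst}", f"node{dst}"),
    ]


def max_link_load(i):
    """Largest number of signals sharing one link in the i-th phase."""
    load = Counter()
    for p in range(1, N + 1):
        dst = (p + i - 1) % N + 1            # same as (i+p) mod N with 0 read as N
        load.update(route(p, dst))
    return max(load.values())


print(route(1, 10))                                        # path of node 1 in the 9th phase
print(all(max_link_load(i) == 1 for i in range(1, N + 1)))
```

A maximum load of 1 in a phase means that no link carries two signals in that phase, which corresponds to the contention-free property described above.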
In the description of FIGS. 13 and 14, for convenience of description, it has been described that all signals are transmitted to the destination nodes through the first-stage to third-stage switches. However, when the network connection state is stored in each switch and a signal can be transmitted to a destination node even without transmitting the signal to the upper stage, each of the first-stage switches and the second-stage switches directly transmits the signal to the lower-stage switch or a subordinate node. For example, when a node with a node number of 1 transmits a signal to a node with a node number of 2, the first-stage switch directly transmits the signal received from the node with a node number of 1, to the node with a node number of 2.
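The stage at which a signal can turn downward may be illustrated with a short function. This is a sketch under assumptions: the name turnaround_stage is hypothetical, and it assumes that each run of nine consecutive node numbers (for example, the nodes 1 to 9 under the switches B1 to B3) shares the same second-stage switches, as described for FIG. 13.

```python
def turnaround_stage(src, dst):
    """Lowest switch stage (1, 2, or 3) at which a signal from node src
    can be turned toward node dst (hypothetical helper)."""
    if (src - 1) // 3 == (dst - 1) // 3:
        return 1  # both nodes are connected to the same first-stage switch
    if (src - 1) // 9 == (dst - 1) // 9:
        return 2  # their first-stage switches share second-stage switches
    return 3      # the signal has to reach a third-stage switch


print(turnaround_stage(1, 2), turnaround_stage(1, 4), turnaround_stage(1, 10))
```

For the example above, the node 1 and the node 2 are both connected to the switch B1, so the function returns 1.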
Further, as a conventional technology for the Fat Tree, for example, a two-stage Fat Tree, there has been proposed a Fat Tree connected so that signals can be transmitted to all other nodes in one hop.
Non-patent Literature 1: Using Fat-Trees to Maximize the Number of Processors in a Massively Parallel Computer, M. Valerio, L. E. Moser and P. M. Melliar-Smith, Department of Electrical and Computer Engineering University of California, Santa Barbara
However, in the Fat Tree topology illustrated in FIG. 13, when the scale of a network increases, the average number of hops between nodes increases. Therefore, it is difficult to reduce a delay in signal transmission.
Further, in the Fat Tree connected to transmit one-hop signals to all other nodes, it is difficult to perform all-to-all communication by shift communication, and it is difficult to secure a wide band for signal transmission.