1. Field of the Invention
The present invention relates to a data transfer system, a switch, and a data transfer method, and more particularly, to a technique for transferring data from one processor to another processor among a plurality of processors.
2. Description of Related Art
A Fat Tree is disclosed as an inter-processor network of a parallel computer in “Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing, C. E. Leiserson, IEEE Transactions on Computers, Vol. C-34, No. 10, October 1985” (hereafter referred to as the “Non Patent Literature”). As an example of the Fat Tree, FIG. 19 shows a sixteen-input sixteen-output Fat Tree network composed of four-input four-output switches 1501. Note that FIG. 19 was drawn by the inventor of the present invention and is not a figure of the related art. The same applies to FIG. 20 to FIG. 24. A signal line 1502 between the switches 1501 denotes a two-way link. Sixteen processors 1503, from a processor 0 to a processor 15, are connected to the Fat Tree. Note that, in FIG. 19, reference numerals are shown for only one each of the switches, the processors, and the signal lines.
FIG. 20 shows an example of routing in the Fat Tree. When a packet climbs the Fat Tree, the switch 1501 routes the packet to the output port opposed to the input port at which the packet arrived. In the Fat Tree, the packet climbs to a switch that is common to the source processor and the destination processor, and then turns back and climbs down. For example, in communication from a processor 4 (1601) to a processor 15 (1602), a switch A (1603) is the common switch. Therefore, the packet climbs to the switch A (1603) and turns back. In communication from a processor 0 (1604) to a processor 3 (1605), a switch B (1606) is the common switch. Therefore, the packet climbs to the switch B (1606) and turns back. Which switch serves as the common switch varies depending on the routing executed by each of the switches while the packet climbs the Fat Tree. However, how far (how many stages) the packet climbs is determined by the source processor and the destination processor, and does not vary depending on the routing.
Accordingly, the number of switches through which a packet goes on its way to a given destination processor varies depending on the source processor. For example, when the packet is sent to the processor 15 (1602), a packet sent from a processor 14 (1607) arrives at the processor 15 (1602) by way of only one switch C (1608). A packet sent from a processor 12 (1609) or a processor 13 (1610) arrives at the processor 15 (1602) via three switches. A packet sent from any of the processors 0 to 11 arrives at the processor 15 (1602) via five switches. Note that the routing of a packet climbing the Fat Tree always uses the output port opposed to the input port. Therefore, while packets climb the Fat Tree, no conflict between packets occurs in the switch. While packets climb down the Fat Tree, a conflict between packets can occur.
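As a purely illustrative sketch, the switch counts in the above example can be reproduced as follows. The group sizes (2, 4, and 16 processors under a first-stage, second-stage, and third-stage switch, respectively) are an assumption inferred from the example for the processor 15, and the function names are hypothetical.

```python
# Assumed topology (inferred from the example in the text, not from FIG. 19):
# processors that share a first-stage switch form groups of 2, a second-stage
# subtree covers 4 processors, and the third stage covers all 16.
GROUP_SIZES = (2, 4, 16)
NUM_PROCESSORS = 16


def stages_climbed(src: int, dst: int) -> int:
    """Return how many stages a packet climbs before it turns back."""
    for p in (src, dst):
        if not 0 <= p < NUM_PROCESSORS:
            raise ValueError(f"processor number out of range: {p}")
    if src == dst:
        raise ValueError("source and destination must differ")
    for stage, size in enumerate(GROUP_SIZES, start=1):
        # The lowest stage whose subtree contains both processors
        # holds the common switch.
        if src // size == dst // size:
            return stage
    raise AssertionError("unreachable: the top stage covers all processors")


def switches_traversed(src: int, dst: int) -> int:
    """Switches on the path: up to the common switch and back down."""
    stage = stages_climbed(src, dst)
    return 2 * stage - 1


def switches_on_way_down(src: int, dst: int) -> int:
    """Switches at which the packet is forwarded downward (common switch included)."""
    return stages_climbed(src, dst)
```

For instance, `switches_traversed(14, 15)` evaluates to 1, `switches_traversed(12, 15)` to 3, and `switches_traversed(0, 15)` to 5, matching the counts stated above.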
FIG. 21 shows an example of a configuration of the switch. The switch includes FIFO (First In First Out) memories 1710 to 1717, one for each of input ports 1702 to 1705 and output ports 1706 to 1709. The FIFO memories 1710 to 1717 are connected to each other through a crossbar switch 1718. When packets from a plurality of the input ports are sent to the same output port, the packets conflict. When the packets conflict, an arbitration circuit 1719 selects the packet input from one of the input ports for the crossbar switch 1718. The selected packet goes through the crossbar switch 1718, and is written in one of the FIFO memories 1714 to 1717 of the output ports 1706 to 1709. A packet which is not selected waits in one of the FIFO memories 1710 to 1713 of the input ports until it is selected. Generally, the arbitration algorithm of the arbitration circuit 1719 is designed to select each of the conflicting packets equally. That is, it is designed to prevent a packet from continuously losing the conflict arbitration and thereby falling into a starvation state. Note that, when N packets conflict (N is an integer of two or more), the probability that each of the packets is selected in the conflict arbitration can be regarded as one in N.
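One common way to select conflicting packets equally is round-robin arbitration. The following is a minimal sketch under that assumption; the text above does not state which algorithm the arbitration circuit 1719 uses, and the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one of the input ports that request the same output port.

    Assumption: fairness is achieved by round-robin priority rotation.
    """

    def __init__(self, num_inputs: int) -> None:
        if num_inputs < 1:
            raise ValueError("num_inputs must be at least 1")
        self.num_inputs = num_inputs
        # Start so that input port 0 has the highest priority at first.
        self.last_granted = num_inputs - 1

    def grant(self, requests):
        """Return the granted input-port index, or None if nobody requests."""
        requesting = set(requests)
        # Search starting from the port just after the previous winner,
        # so the previous winner has the lowest priority this time.
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last_granted + offset) % self.num_inputs
            if candidate in requesting:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(num_inputs=4)
# Input ports 0, 1, and 2 keep requesting the same output port.
grants = [arbiter.grant({0, 1, 2}) for _ in range(6)]
print(grants)  # [0, 1, 2, 0, 1, 2]
```

With three persistent requesters, each input port wins once in every three arbitrations, which corresponds to the one-in-N selection share described above and keeps any packet from starving.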
Each time a packet goes through one of the switches while climbing down the Fat Tree, the packet may be kept waiting by the conflict arbitration. Since the switch is a four-input four-output switch, packets input from up to three input ports may conflict when they are sent to the same output port. Therefore, the possibility that the packet is kept waiting by the conflict arbitration increases as the number of switches through which the packet goes while climbing down the Fat Tree increases.
For example, in a communication from a processor 0 (1801) to a processor 15 (1802) shown in FIG. 22, a conflict occurs in each of a switch D (1803), a switch E (1804), and a switch F (1805). Three packets conflict in each of the switches. Therefore, the probability that each of the packets is selected in the conflict arbitration in each of the switches is one third. Accordingly, when a conflict among three packets including the packet occurs in all of the three switches through which the packet goes, the packet sent from the processor 0 (1801) to the processor 15 (1802) arrives without waiting with a probability of one twenty-seventh (1/3 × 1/3 × 1/3).
In a communication from a processor 12 (1901) to a processor 15 (1902) shown in FIG. 23, a conflict occurs in each of a switch E (1903) and a switch F (1904). Three packets conflict in each of the switches. Therefore, the probability that each of the packets is selected in the conflict arbitration in each of the switches is one third. Accordingly, when a conflict among three packets including the packet occurs in both of the two switches through which the packet goes, the packet sent from the processor 12 (1901) to the processor 15 (1902) arrives without waiting with a probability of one ninth (1/3 × 1/3).
In a communication from a processor 14 (2001) to a processor 15 (2002) shown in FIG. 24, a conflict occurs in a switch F (2003). Three packets conflict in the switch. Therefore, the probability that each of the packets is selected in the conflict arbitration in the switch is one third. Accordingly, when a conflict among three packets including the packet occurs in the one switch through which the packet goes, the packet sent from the processor 14 (2001) to the processor 15 (2002) arrives without waiting with a probability of one third.
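The arithmetic in the three examples of FIG. 22 to FIG. 24 can be checked with the short sketch below. It assumes that the arbitrations in successive switches are independent and that each of N conflicting packets is selected with a probability of one in N; the function name is hypothetical.

```python
from fractions import Fraction


def probability_without_waiting(conflicting_packets_per_switch):
    """Probability that a packet wins every arbitration on its way down.

    Each list element is the number of packets (including the packet of
    interest) that conflict in one switch. Assumes independent arbitrations
    and a selection probability of 1/N among N conflicting packets.
    """
    probability = Fraction(1)
    for n in conflicting_packets_per_switch:
        if n < 1:
            raise ValueError("each switch must see at least one packet")
        probability *= Fraction(1, n)
    return probability


print(probability_without_waiting([3, 3, 3]))  # processor 0 to 15: 1/27
print(probability_without_waiting([3, 3]))     # processor 12 to 15: 1/9
print(probability_without_waiting([3]))        # processor 14 to 15: 1/3
```

Under these assumptions, the packet from the processor 0 is nine times less likely to arrive without waiting than the packet from the processor 14, although both are addressed to the same processor 15.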
In this manner, in the routing in the Fat Tree, the number of switches through which a packet goes before arriving at a given destination processor varies depending on the location of the source processor. In other words, the number of conflict arbitrations executed for the packet before it arrives varies, and thus the probability that the packet arrives without waiting varies. That is, there is a problem that the transfer time of packets transferred between processors could vary depending on the location of the processor which sends the packet.
If the time until a packet arrives at its destination varies in this manner, the processing that uses the packet which takes a long time to arrive at the destination becomes a bottleneck. Therefore, there is a problem that a processing delay occurs in the computer system as a whole.
Note that Japanese Unexamined Patent Application Publication No. 2009-194510 discloses a priority arbitration system which prevents deterioration of the latency of a packet that has been kept waiting by conflicts on its route or of a packet that travels via a long route. This priority arbitration system is equipped with a plurality of CPUs, a plurality of shared resources, a routing table, and a plurality of crossbars. When sending a request packet to a shared resource, the CPU reads a latency value corresponding to the destination shared resource from the routing table corresponding to the CPU itself, and sets the latency value in the packet header of the request packet. When receiving a plurality of packets, the crossbar compares the latency values of the received packets, and then preferentially allows the packet having the larger latency value to go through the switch.
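The behavior summarized above can be pictured with the following minimal sketch. It is only an illustration of the description in this paragraph and is not taken from the cited publication; all class, field, and function names are hypothetical, and the tie-breaking rule (the first packet in the list wins) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RequestPacket:
    source_cpu: int
    destination: int          # identifier of the destination shared resource
    latency_value: int = 0    # carried in the packet header


def make_request(source_cpu, destination, routing_table):
    """Build a request packet, copying the latency value for the destination
    from the routing table that corresponds to the sending CPU."""
    return RequestPacket(source_cpu, destination, routing_table[destination])


def select_packet(packets):
    """Crossbar arbitration: prefer the packet with the larger latency value.

    Assumption: on a tie, the first packet in the list is selected.
    """
    if not packets:
        raise ValueError("no packets to arbitrate")
    return max(packets, key=lambda packet: packet.latency_value)


# Hypothetical per-CPU routing tables: destination -> latency value.
table_cpu0 = {7: 5}   # CPU 0 reaches shared resource 7 via a long route
table_cpu1 = {7: 2}   # CPU 1 reaches shared resource 7 via a short route

far = make_request(0, 7, table_cpu0)
near = make_request(1, 7, table_cpu1)
winner = select_packet([near, far])
print(winner.source_cpu)  # 0
```

In this sketch the priority comes from a latency value stored in the packet header, so the arbitration result depends on the values held in the routing tables.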
However, Japanese Unexamined Patent Application Publication No. 2009-194510 does not disclose a technique which, when the arbitration is executed, decides a selection ratio for receiving conflicting data from each of the input ports based on strength information corresponding to each of the input ports.
As described above with regard to the related art, the technique disclosed in the Non Patent Literature has a problem that the transfer time of packets transferred between processors could vary.