In a parallel computer or a similar system, a plurality of computation nodes, that is, electronic computers each having a central processing unit (CPU), a memory, and the like, are connected to improve the performance of the entire system. A network used in an information processing system having such computation nodes may be configured by connecting the computation nodes via switches. A crossbar switch serving as a data transfer device may be used as such a switch.
Crossbar switches include those having a buffer in an input port and those having no buffer in the input port because of restrictions on the amount of materials. When a crossbar switch having no buffer is used, a handshake, such as synchronizing data, is performed between the input port of the crossbar switch and a source, such as a computation node, that supplies data to the input port. When a plurality of computation nodes request to send data, a target of the handshake is determined. This process of determining the target of the handshake may be called arbitration. One example of a handshake system is a system in which a source transmits an arbitration request to the crossbar switch, the crossbar switch having received the request sends transmission permission to the source, and the source then transmits a data packet to the crossbar switch. In another system, a plurality of computation nodes are sequentially permitted to transmit data to an input port, each for a predetermined period.
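The second handshake system mentioned above, in which permission rotates among the computation nodes for a predetermined period each, can be sketched as follows. This is only an illustrative model: the function name `slot_owner`, the node labels, and the use of clock cycles as the unit of the period are assumptions and do not appear in the description.

```python
def slot_owner(cycle, nodes, period):
    """Return the node that holds transmission permission at a given cycle.

    Hypothetical model: permission rotates through `nodes` in order, and each
    node keeps it for `period` consecutive cycles (the unit is an assumption).
    """
    if period <= 0 or not nodes:
        raise ValueError("period must be positive and nodes must be non-empty")
    return nodes[(cycle // period) % len(nodes)]


# Three nodes share one input port, two cycles each.
schedule = [slot_owner(c, ["n0", "n1", "n2"], 2) for c in range(8)]
print(schedule)
```

In this model no arbitration request is exchanged at all; a node simply waits for its own period, which is why the approach needs no arbitration but can leave the port idle when the permitted node has nothing to send.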
When arbitration requests are used, the crossbar switch receives arbitration requests from a plurality of computation nodes and performs arbitration processing to determine the computation node whose arbitration request is accepted. The selected computation node obtains transmission permission from the port that selected it, and transmits a data packet to the port from which it received the transmission permission. As a method for processing arbitration requests in such arbitration processing, a method may be considered in which an arbitration request that has not been selected in a port, and an arbitration request that the selected computation node has output to a port other than the one it uses, are once deleted. Hereinafter, an arbitration request that has not been selected in a port and an arbitration request that the selected computation node has output to another port are each referred to as an “unused arbitration request”. Deleting an unused arbitration request may be referred to as “negating”. As another processing method, the selected computation node may continue to send the arbitration requests that it has output to the other ports.
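A minimal sketch of one arbitration round with negation of unused arbitration requests is given below. It is not taken from any actual switch implementation: the function `arbitration_round`, the rule that each port selects the earliest request, and the rule that a node granted by several ports uses the first one are all assumptions made for illustration.

```python
def arbitration_round(requests):
    """Run one arbitration round.

    `requests` maps a port id to the list of node ids requesting that port,
    in arrival order. Returns (selected, negated), where `selected` maps each
    selected node to the port it will use and `negated` lists the
    (node, port) pairs of unused arbitration requests that are deleted.
    """
    # Each port arbitrates independently (assumed policy: earliest request wins).
    permissions = {}
    for port, nodes in requests.items():
        if nodes:
            permissions.setdefault(nodes[0], []).append(port)

    # A node granted by several ports uses only one (assumed: the first grant).
    selected = {node: ports[0] for node, ports in permissions.items()}

    # Unused requests: requests that lost in a port, plus requests that a
    # selected node sent to ports other than the one it uses.
    negated = [
        (node, port)
        for port, nodes in requests.items()
        for node in nodes
        if selected.get(node) != port
    ]
    return selected, negated


selected, negated = arbitration_round(
    {"p1": ["n1", "n2"], "p2": ["n1"], "p3": ["n3"]}
)
print(selected)
print(negated)
```

Here node `n1` is granted by both `p1` and `p2` and uses `p1`, so its request to `p2` is negated together with the losing request from `n2`. Under the alternative method, the request from `n1` to `p2` would be kept rather than placed in `negated`.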
In one related art, a waiting time before arbitration is set for each port, arbitration is performed by an arbitration device after the waiting time has elapsed, and dead cycles of the crossbar switch are thereby suppressed (for example, refer to Japanese Laid-open Patent Publication No. 11-73403). In another related art, the length of the data is set in a counter after a data transfer permission signal is sent, the counter is periodically decremented, and the next arbitration processing is performed when the counter reaches zero (for example, refer to Japanese Laid-open Patent Publication No. 2001-22711).
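The counter-based scheme in the second related art can be illustrated with a small model. The class below is a hypothetical sketch based only on the one-sentence summary above, not on the cited publication itself; the class name, the method names, and the assumption that the counter is decremented by one per cycle are all illustrative.

```python
class LengthCounterArbiter:
    """Delays the next arbitration until a data-length counter reaches zero."""

    def __init__(self):
        self.counter = 0

    def send_permission(self, data_length):
        """Send a data transfer permission and load the counter with the data length."""
        if self.counter != 0:
            raise RuntimeError("previous transfer is still in progress")
        self.counter = data_length

    def tick(self):
        """Advance one cycle; return True when the next arbitration may start."""
        if self.counter > 0:
            self.counter -= 1  # assumed: decrement by one per cycle
        return self.counter == 0


arbiter = LengthCounterArbiter()
arbiter.send_permission(3)
ready = [arbiter.tick() for _ in range(3)]
print(ready)
```

With a data length of 3, the arbiter in this model becomes ready for the next arbitration on the third cycle, so arbitration resumes as the transfer ends rather than after a separate completion notice.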
In such a system, the communication probability may differ depending on the combination of a computation node and a port. Even in this case, the conventional system performs data transmission by connecting every computation node and port with the same priority. It has therefore been difficult to improve the efficiency of data transfer processing, because an input port having high communication probability and an input port having low communication probability are treated equally.
In order to perform processing at high speed in accordance with the difference between an output port having high communication probability and an output port having low communication probability, a combination of a computation node and an input port having high communication probability may be preferentially connected as a group. In this case, a method in which unused arbitration requests are once deleted and each computation node sends new arbitration requests may be used. However, the following problem may then arise.
For example, data transfer in a parallel computer in which combinations having high communication probability are grouped will be described with reference to FIG. 9. FIG. 9 is a diagram for explaining data transfer in a parallel computer in a case where combinations having high communication probability are grouped.
A computation node 901 and an output port 912 are grouped, a computation node 902 and an output port 913 are grouped, a computation node 903 and an output port 914 are grouped, and a computation node 904 and an output port 911 are grouped. The computation node 901 transmits arbitration requests 921 to 923 to the respective output ports 912 to 914, receives transmission permission 924 from the output port 912, and selects data transfer to the computation node 902 via the output port 912. Thereafter, the computation node 901 deletes the arbitration requests to the output ports 913 and 914. In addition, the computation node 901 transmits a new arbitration request to the output port 912. In this case, the computation node 901 transmits arbitration requests to the output ports 912 to 914 at the same time after the data transmission to the output port 912 is completed. The output ports 913 and 914 then confirm that the data transmission corresponding to the transmission permission transmitted to the computation nodes 902 and 903, which are grouped with the output ports 913 and 914 respectively, is not performed, and only then perform arbitration with respect to the arbitration request of the computation node 901. In contrast, the output port 912 may perform arbitration with respect to the arbitration request from the computation node 901 immediately after receiving the request. The latency of the output port 912 is therefore smaller than that of the output ports 913 and 914, and the transmission permission may be transmitted to the computation node 901 immediately. Consequently, the probability that the computation node 901 selects the output port 912 as a data transmission destination is higher than the probability that it selects the output port 913 or 914. As a result, the possibility that data transfer is repeatedly performed using the same combination increases.
Here, consider a case where the computation node 901 transmits arbitration requests to the output ports 912 to 914 at the same time after the data transmission to the output port 912 is completed, and the output ports 913 and 914 do not transmit transmission permission to the computation nodes grouped therewith. In this case as well, the output port 912 is grouped with the computation node 901, so that the latency of the output port 912 is smaller than that of the output ports 913 and 914, and the computation node 901 may immediately start data transmission to the output port 912. The output port 912 is therefore more likely than the output ports 913 and 914 to be selected as a data transmission destination by the computation node 901. As a result, the possibility that data transfer is repeatedly performed using the same combination also increases.
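The bias toward the grouped port can be illustrated with a deliberately simplified model. In the sketch below, the latency values (1 for the grouped port, 3 for the others), the function names, and the rule that the node always uses the permission that arrives first are assumptions chosen for illustration; only the port numbers are taken from FIG. 9.

```python
def first_permission(latency_by_port):
    """Return the port whose transmission permission arrives first.

    Ties are broken by the lowest port id (an arbitrary assumption).
    """
    return min(sorted(latency_by_port), key=lambda port: latency_by_port[port])


def simulate(rounds, grouped_port, other_ports, grouped_latency=1, other_latency=3):
    """Count how often each port is selected when the node re-requests every round.

    All unused arbitration requests are negated after each round, so every
    round starts from the same state and the grouped port answers first again.
    """
    counts = {port: 0 for port in [grouped_port, *other_ports]}
    for _ in range(rounds):
        latency = {grouped_port: grouped_latency}
        latency.update({port: other_latency for port in other_ports})
        counts[first_permission(latency)] += 1
    return counts


counts = simulate(5, grouped_port=912, other_ports=[913, 914])
print(counts)
```

Because negation resets the state each round and the grouped port always responds with the smaller latency, this model selects the output port 912 in every round, which corresponds to the repeated use of the same combination described above.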
As described above, the data transfer becomes unbalanced in the conventional parallel computer, so that it becomes difficult to perform efficient communication.
In the related art in which a waiting time before arbitration is set for each port, the case where combinations having high communication probability are grouped is not considered. There is therefore a high possibility that a grouped computation node and port exclusively use a bus, which makes efficient communication difficult. The related art in which arbitration is performed based on a counter in which the length of data is set likewise does not consider the case where combinations having high communication probability are grouped, so that efficient communication is also difficult.