In general, a parallel computer system in which a plurality of calculation nodes are connected via an interconnection network is used to quickly process large-scale problems. In the parallel computer system, a target problem is divided among the calculation nodes for processing. In that case, the calculation nodes need to transfer to one another the data necessary for computation and the data of the computation results. To this end, data transfer from one calculation node to another calculation node is performed via the interconnection network.
When packets containing the same data are to be transmitted from one calculation node to a plurality of calculation nodes, the objective can be achieved if the calculation node of the transfer source transmits a separate packet to each calculation node at a transfer destination. However, if the interconnection network has a capability of duplicating the packets subject to data transfer and transmitting the duplicates to the calculation nodes, the load that the calculation node of the transfer source places on the interconnection network is reduced, so that the time required for the data transfer can be reduced. The capability of transmitting the same data from one calculation node to a plurality of calculation nodes is called a multicast communication capability.
In MPI (Message Passing Interface), known as a common method for describing a program to be executed on parallel computers, a set of processes that communicate with one another is managed as a communicator, and functions for performing multicast communication to the processes included in a communicator are provided.
For example, “MPI_Bcast” defined by MPI is a function for sending data retained by one process to all processes included in the same communicator. A communicator specifies the set of all processes in the system in some cases and a subset of the processes in other cases. “MPI_Allgather” is a function by which each process in a communicator sends specified data to all processes included in the communicator, so that every process receives the data of every other process.
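The data movement of these two functions can be illustrated with a minimal sketch. It is a pure-Python model, not an MPI program: the function names `bcast` and `allgather` and the list-of-ranks representation are assumptions made only for illustration, with list index i standing for the process of rank i in one communicator.

```python
def bcast(values, root):
    """Model of MPI_Bcast: every rank ends up holding the root's value.

    values[i] is the data held by rank i before the call (hypothetical model).
    """
    return [values[root] for _ in values]


def allgather(values):
    """Model of MPI_Allgather: every rank ends up holding the contributions
    of all ranks, ordered by rank."""
    return [list(values) for _ in values]


if __name__ == "__main__":
    print(bcast(["a", "b", "c"], root=1))
    print(allgather([10, 20, 30]))
```

In this model, both functions deliver identical data to every process of the communicator, which is the property that makes them candidates for multicast transfer.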
The capabilities of the functions for performing the multicast communication, such as “MPI_Bcast” and “MPI_Allgather”, can also be realized by executing unicast communication, which is one-to-one communication, a plurality of times. However, when the interconnection network has a capability of duplicating the packets and transmitting the packets to the calculation nodes, it is desirable to utilize that capability. This is because the data transfer time is reduced and thus the execution performance of the entire program can be improved.
Various techniques for reducing the data transfer time and improving the execution performance of the entire program have been proposed.
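The difference between repeated unicast and network-side duplication can be sketched by counting packets per link. The sketch assumes a hypothetical topology in which every calculation node is attached to a single switch, labeled "switch"; real interconnection networks have more links, but the load on the link leaving the source behaves the same way.

```python
from collections import Counter


def unicast_link_load(source, destinations):
    """Packets per link when the source sends one unicast packet per destination."""
    load = Counter()
    for dest in destinations:
        load[(source, "switch")] += 1   # every packet crosses the source link
        load[("switch", dest)] += 1
    return load


def multicast_link_load(source, destinations):
    """Packets per link when the switch duplicates a single multicast packet."""
    load = Counter()
    if destinations:
        load[(source, "switch")] += 1   # the source injects the packet once
    for dest in destinations:
        load[("switch", dest)] += 1     # the switch emits one copy per destination
    return load


if __name__ == "__main__":
    dests = [1, 2, 3, 4]
    print(unicast_link_load(0, dests)[(0, "switch")])
    print(multicast_link_load(0, dests)[(0, "switch")])
```

Under this assumption, the number of packets on the source link grows with the number of destinations for unicast and stays at one for multicast, while each destination link carries one packet in both cases.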
For example, as disclosed in JP1996-305649A, a multicast method is contemplated in which a buffer for multicast is provided in a general interchange switch, and if no output port is available when a multicast packet is to be transmitted, the multicast packet is stored in the buffer for multicast until one of the output ports can be used.
Furthermore, techniques have been proposed in which, when multicast communication is performed, data transfer is executed by specifying the computers in the system that are the transfer destinations of the data.
For example, as disclosed in Japanese Patent No. 2581286, a network control method is contemplated in which packets are transferred to the calculation nodes whose node numbers fall within a range specified by a minimum value and a maximum value of destination node numbers.
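Destination selection by a minimum and a maximum node number can be sketched as follows. This is only an illustration of the range idea; the function names are hypothetical and the code does not reproduce the cited patent. The second function shows which nodes would receive the packet without being wanted if a single range had to cover a non-contiguous set of destinations.

```python
def range_destinations(node_numbers, min_node, max_node):
    """Return the nodes whose number lies in [min_node, max_node]."""
    if min_node > max_node:
        raise ValueError("min_node must not exceed max_node")
    return [n for n in node_numbers if min_node <= n <= max_node]


def unwanted_receivers(node_numbers, wanted):
    """Nodes inside the smallest range covering `wanted` that are not wanted."""
    wanted = set(wanted)
    if not wanted:
        return []
    covered = range_destinations(node_numbers, min(wanted), max(wanted))
    return [n for n in covered if n not in wanted]


if __name__ == "__main__":
    nodes = list(range(8))
    print(range_destinations(nodes, 2, 5))
    print(unwanted_receivers(nodes, {1, 3, 5}))
```

A contiguous destination set is described exactly by one range, whereas a set such as nodes 1, 3 and 5 cannot be described by a single range without also covering nodes 2 and 4.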
Furthermore, as disclosed in Japanese published translations of PCT international application No. 2004-533035, a class/network path designation method is contemplated in which a message is subjected to multicast transfer to calculation nodes arranged in the same row or to calculation nodes arranged in the same column in a network in which the calculation nodes are arranged in a lattice pattern.
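Row-wise and column-wise destination selection in a lattice can be sketched as follows. The coordinate convention (column, row), the function names, and the exclusion of the source node from its own destination list are assumptions made for illustration and are not taken from the cited publication.

```python
def row_destinations(width, source):
    """Nodes in the same row as `source` in a lattice `width` columns wide."""
    col, row = source
    return [(c, row) for c in range(width) if c != col]


def column_destinations(height, source):
    """Nodes in the same column as `source` in a lattice `height` rows high."""
    col, row = source
    return [(col, r) for r in range(height) if r != row]


if __name__ == "__main__":
    print(row_destinations(4, (1, 2)))
    print(column_destinations(3, (1, 2)))
```

In this sketch the destination set is determined entirely by the physical coordinates of the source node, so a destination pattern that does not coincide with a row or a column cannot be expressed as one such multicast.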
Furthermore, as disclosed in JP2000-216787A, a parallel computer is contemplated in which destinations are encoded into respective packets and described in a fixed-length field in the header. In the parallel computer, the destinations are specified with a small number of destination bits, and multiple multicast packets are generated when the data length for specifying the destinations exceeds the fixed-length field.
Furthermore, as disclosed in JP1993-028122A, a broadcast method is contemplated in which a network switch retains path information, and multicast transfer is performed according to the path information. In this broadcast method, an external program calculates the path information in advance, and the calculated path information is set in the network switch.
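The effect of a fixed-length destination field can be sketched under a deliberately simple, hypothetical encoding in which the field holds at most `field_capacity` explicit node numbers. The actual encoding of the cited publication is not given here, so the function name, the parameter, and the encoding itself are assumptions for illustration only.

```python
def split_into_packets(destinations, field_capacity):
    """Split destinations into the per-packet lists a fixed-length field allows.

    Assumes the header field can hold at most `field_capacity` node numbers
    (a hypothetical encoding chosen for this sketch).
    """
    if field_capacity < 1:
        raise ValueError("field_capacity must be at least 1")
    ordered = sorted(destinations)
    return [ordered[i:i + field_capacity]
            for i in range(0, len(ordered), field_capacity)]


if __name__ == "__main__":
    packets = split_into_packets([5, 1, 9, 3, 7], field_capacity=2)
    print(len(packets), packets)
```

When the destination set does not fit in one field, the source must issue several multicast packets, and the number of packets grows with the size of the destination description.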
Yet furthermore, as disclosed in JP1997-297746A, an interprocessor communication method is contemplated in which address registers that can be set from a program are provided in receiving apparatuses, the headers of packets are used to select the address registers for the reception destinations (destination processors) of the multicast packets, and the addresses of the reception destinations are written to the register values of the selected address registers.
In multicast communication, the transfer destinations and the transfer paths corresponding to the transfer destinations need to be specified. The conventional techniques have a problem in that complicated destination patterns cannot always be described, or in that the destination patterns cannot be described until the physical locations of the calculation nodes that execute the processes are determined.
Specifically, the conventional techniques described above suffer from the following problems.
The techniques disclosed in Japanese Patent No. 2581286 and Japanese published translations of PCT international application No. 2004-533035 are problematic in that, since the destination nodes of the multicast communication are closely tied to the physical arrangement of the calculation nodes on the interconnection network, the patterns available for specifying the destination nodes are limited.
The technique disclosed in JP2000-216787A is problematic in that when the pattern of the destination nodes of the multicast communication is complicated and difficult to encode into a fixed-length bit sequence, a plurality of multicast communications must be executed, which reduces the benefit of the multicast capability.
The technique disclosed in JP1993-028122A is problematic in that, since the path information for path designation is set in advance by an external program, the path cannot be specified when the multicast communication is executed from a user program. Consequently, the calculation nodes on which the processes are to run must be determined before the multicast communication is executed, which makes flexible and efficient system operation difficult.
Furthermore, in the techniques disclosed in the above patent documents, when the numbers of the calculation nodes are virtualized, or when the calculation nodes on which the processes run cannot be identified before the program is run, the destinations of the multicast communication can be specified only after the physical locations of the calculation nodes have been determined at the time the multicast communication is executed. Therefore, the techniques disclosed in the patent documents share a common problem in that the time required to specify the path becomes overhead in the program execution.