Barrier synchronization is known as a method of synchronizing a plurality of processes which are executed in parallel with each other. In the barrier synchronization, a barrier point, that is, a point at which synchronization is made, is set. A process that performs the barrier synchronization temporarily stops its execution when the execution arrives at the barrier point, and restarts the stopped execution when all the processes that are subject to the barrier synchronization have arrived at the barrier point. In this manner, the plurality of processes which are executed in parallel can be synchronized with each other.
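The behavior described above can be illustrated by the following minimal sketch. It assumes, for illustration only, that threads of a single program stand in for the processes and that the barrier is provided by the Python standard library class threading.Barrier; the names run_with_barrier, worker, and log are hypothetical and do not appear in the related art.

```python
import threading


def run_with_barrier(num_workers=4):
    """Run num_workers threads that all stop at one barrier point.

    Returns the ordered log of ("arrive", rank) and ("resume", rank) events.
    """
    barrier = threading.Barrier(num_workers)  # the barrier point
    log = []
    log_lock = threading.Lock()

    def worker(rank):
        # Record arrival at the barrier point.
        with log_lock:
            log.append(("arrive", rank))
        # Execution stops here until every worker has arrived.
        barrier.wait()
        # Execution restarts only after the last arrival.
        with log_lock:
            log.append(("resume", rank))

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

In this sketch, every "arrive" entry is logged before any "resume" entry, because no worker can pass the barrier point until the last worker has reached it.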
On the other hand, a reduction operation is known as an arithmetic operation performed on data held by a plurality of processes. Several reduction operations are known, for example, an arithmetic operation to calculate a sum of the data and an arithmetic operation to calculate a maximum value or a minimum value of the data. The reduction operation includes an operation in which only a specific process obtains the arithmetic operation result, and an operation in which every process obtains the arithmetic operation result. In either case, since data communication is performed between the processes, the reduction operation can be executed by using the same algorithm as that of the barrier synchronization.
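As one illustration of how a reduction operation can share a communication pattern with barrier synchronization, the following sketch simulates a butterfly exchange in which every process obtains the result. It is a sketch under stated assumptions, not the method of any cited document: the processes are simulated as entries of a list, the number of processes is assumed to be a power of two, the operation is assumed to be commutative and associative, and the name butterfly_allreduce is hypothetical.

```python
from operator import add


def butterfly_allreduce(values, op):
    """Simulate a butterfly reduction in which all processes get the result.

    values[r] is the data held by the simulated process of rank r.
    op is a commutative, associative binary operation such as add, max, min.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    current = list(values)
    stage = 0
    while (1 << stage) < n:
        # At each stage, rank r exchanges its intermediate result with
        # rank r XOR 2**stage, and both sides combine the two values.
        current = [op(current[r], current[r ^ (1 << stage)])
                   for r in range(n)]
        stage += 1
    return current


data = [3, 1, 4, 1, 5, 9, 2, 6]
sums = butterfly_allreduce(data, add)      # every rank holds the sum
maxima = butterfly_allreduce(data, max)    # every rank holds the maximum
minima = butterfly_allreduce(data, min)    # every rank holds the minimum
```

If the exchanged values are replaced by arrival signals carrying no data, the same stage-by-stage exchange serves as a barrier. A reduction in which only a specific process needs the result can read that process's entry alone.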
A data communication method is also known in which data is broadcast from one processor to all the other processors, a butterfly barrier is set up in which the processors that were the last to receive the broadcast data communicate with each other, and the processors that participated in the butterfly barrier report the termination of the data communication to all the processors that did not participate in the butterfly barrier.
[Patent Document 1] Japanese Patent Application Laid-Open No. 03-098152
[Patent Document 2] Japanese Patent Application No. 07-152712
During execution of barrier synchronization, depending on the algorithm, each process must change, at each stage, the destination of the signal indicating that the process has arrived at the barrier point. Similarly, in the reduction operation, each process must change the destination of the intermediate result of the arithmetic operation at each stage. The inventor of the present invention studied ways to change the destinations at high speed in order to increase the speed of the barrier synchronization and the speed of the reduction operation.
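The stage-by-stage change of destination can be made concrete with the following sketch, which assumes a butterfly algorithm with a power-of-two number of processes. The function name butterfly_destinations and the rule "rank XOR 2**stage" are illustrative assumptions chosen for this example; other algorithms would yield other destination schedules.

```python
def butterfly_destinations(rank, num_processes):
    """Return the destination rank used at each stage by the given process.

    Assumes a butterfly pattern: at stage k, the process of the given rank
    sends its signal (or intermediate result) to rank XOR 2**k.
    """
    if num_processes <= 0 or num_processes & (num_processes - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    if not 0 <= rank < num_processes:
        raise ValueError("rank out of range")
    num_stages = num_processes.bit_length() - 1  # log2 of the process count
    return [rank ^ (1 << stage) for stage in range(num_stages)]


# With 8 processes, the process of rank 5 changes its destination
# at every one of the 3 stages.
schedule = butterfly_destinations(5, 8)
```

Under these assumptions, the process of rank 5 among 8 processes sends to ranks 4, 7, and 1 in successive stages, so the destination must be updated before each stage can begin.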
According to the studies, when both the barrier synchronization and the reduction operation are realized by software, the destination changing operation at each stage is performed in part by a CPU. For this reason, overhead easily occurs, and as a result the speed of the barrier synchronization and the reduction operation cannot be increased.
On the other hand, when the destination changing operation is realized by hardware, the destination changing operation can be performed without the CPU. For this reason, the speed of the barrier synchronization and the reduction operation is expected to be increased. However, depending on the configuration of the hardware employed for the high-speed operation, when a plurality of nodes are connected to each other by a network, the configuration of the network between the nodes may be limited.