1. Field of the Invention
The present invention relates to a reduction processing method for a parallel computer in which a plurality of computer nodes are connected via a network and the calculation results of the plurality of computer nodes are summarized and reduced, and to such a parallel computer. More particularly, the invention relates to a reduction processing method for a parallel computer that performs the reduction processing efficiently using a plurality of network adapters installed in each computer node, and to a parallel computer using that method.
2. Description of the Related Art
As higher processing speeds are demanded of computer systems, parallel computer systems are being provided in which a plurality of nodes, each including a computer, are installed and connected via a network. For example, in the field of parallel computers, data is computed in parallel by a plurality of nodes and the processed data is exchanged via a network. A large-scale parallel computer of this kind comprises several hundred to several thousand nodes.
In a parallel computer, the data of a plurality of nodes is collected and a specified operation is executed on the collected data. This is called “reduction processing”. Examples of reduction processing are an operation to determine the sum of the data of all the nodes, and an operation to determine a maximum value or minimum value of the data of all the nodes.
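The three example reductions named above can be illustrated with a minimal Python sketch. The per-node values and the helper name `reduce_all` are assumptions made for illustration only; the node numbers are borrowed from the figures discussed in this section, and the sketch ignores the network entirely.

```python
from functools import reduce
import operator

# Hypothetical per-node values (made up for illustration); the keys reuse
# the node numbers 100-103 that appear in the description.
node_data = {100: 7, 101: 3, 102: 9, 103: 1}


def reduce_all(values, op):
    """Fold a binary operation over the values collected from all nodes."""
    return reduce(op, values)


total = reduce_all(node_data.values(), operator.add)  # sum over all nodes
largest = reduce_all(node_data.values(), max)         # maximum over all nodes
smallest = reduce_all(node_data.values(), min)        # minimum over all nodes

print(total, largest, smallest)
```

Any associative binary operation could be substituted for `operator.add`, `max` or `min` in this sketch.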
FIG. 11 and FIG. 12 are diagrams depicting the reduction processing of a conventional parallel computer. As FIG. 11 shows, a plurality of (4 in this case) nodes 100, 101, 102 and 103 are connected via a network, which is not illustrated. Each of the nodes 100, 101, 102 and 103 has a plurality of (3 in this case) network adapters 110A, 110B and 110C to enable parallel transfer. In FIG. 11, reference numbers are assigned only to the network adapters of the node 100, but the same applies to the other nodes 101, 102 and 103.
In order to collect the data of the plurality of nodes 100-103 and perform the specified operation (e.g. summation) in this configuration, the data of the node 101 is transferred from the node 101 to the node 100, and the data of the node 103 is transferred from the node 103 to the node 102. The node 100 then executes the operation on the data of the nodes 100 and 101, and the node 102 executes the operation on the data of the nodes 102 and 103. Next, the operation result of the node 102 is transferred to the node 100, and the node 100 executes the operation on its own operation result and the operation result of the node 102.
This will be described using the example of the 12 blocks shown in FIG. 12. In FIG. 12, D0, D1, D2 and D3 are the data held by the nodes 100, 101, 102 and 103 respectively, D01 and D23 are the operation result of the data of the nodes 100 and 101 and the operation result of the data of the nodes 102 and 103 respectively, and D0123 is the operation result of the data of the nodes 100-103.
As FIG. 12 shows, the 12 blocks of data held by each of the nodes 100-103 are divided into 3, and the data operation results D1-O1 and D3-O3 are transferred from the nodes 101 and 103 to the nodes 100 and 102 using the three network adapters 110A, 110B and 110C. The nodes 100 and 102 then compute the data operation results D01-O01 of the nodes 100 and 101, and the data operation results D23-O23 of the nodes 102 and 103, respectively.
Then the operation results D23-O23 are transferred from the node 102 to the node 100. The node 100 computes the operation results D0123-O0123 of the nodes 100-103 from the operation results D01-O01 and the operation results D23-O23.
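The conventional flow described above can be sketched as a small single-process Python simulation. This is only an illustration under stated assumptions: the helper names (`split_for_adapters`, `transfer`, `combine`, `conventional_reduce`) are hypothetical, the block values are made up, summation is used as the example operation, and the 12 blocks are assumed to split evenly into 4 blocks per adapter. No real network transfer takes place.

```python
import operator

ADAPTERS = ("110A", "110B", "110C")  # adapter labels taken from FIG. 11


def split_for_adapters(blocks, n_adapters=len(ADAPTERS)):
    """Divide the blocks into equal contiguous chunks, one per adapter.

    Assumes len(blocks) is a multiple of n_adapters (12 blocks, 3 adapters).
    """
    size = len(blocks) // n_adapters
    return [blocks[i * size:(i + 1) * size] for i in range(n_adapters)]


def transfer(blocks):
    """Model a parallel transfer: one chunk per adapter, reassembled in order."""
    received = []
    for _adapter, chunk in zip(ADAPTERS, split_for_adapters(blocks)):
        received.extend(chunk)  # each chunk would travel over its own adapter
    return received


def combine(a, b, op=operator.add):
    """Apply the reduction operation block by block."""
    return [op(x, y) for x, y in zip(a, b)]


def conventional_reduce(data, op=operator.add):
    """Two-stage reduction of the prior art: 101->100 and 103->102, then 102->100."""
    d01 = combine(data[100], transfer(data[101]), op)  # computed on node 100
    d23 = combine(data[102], transfer(data[103]), op)  # computed on node 102
    return combine(d01, transfer(d23), op)             # computed on node 100


# Made-up data: 12 blocks per node.
data = {100 + i: [(i + 1) * 10 + b for b in range(12)] for i in range(4)}
result = conventional_reduce(data)
print(result)
```

In this sketch only the nodes 100 and 102 ever call `combine`; the nodes 101 and 103 appear only as sources of `transfer`.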
In this way, according to the prior art, each of the nodes 100, 101, 102 and 103 has a plurality of network adapters, so that the time required for reduction processing is decreased by parallel transfer (e.g. Japanese Patent Application Laid-Open No. 2001-325239).
However, with the prior art, two nodes execute the operation after the first transfer, while the other two nodes only transfer data and do not execute the operation. Therefore, the nodes which execute the operation are limited, and it is difficult to increase the speed of reduction processing.
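The imbalance noted above can be made concrete by counting, per node, how many block operations the conventional scheme performs. The following Python sketch is a simple tally under the assumption of 12 blocks per node and one operation per block per stage; the function name `conventional_op_counts` is hypothetical.

```python
def conventional_op_counts(n_blocks=12):
    """Count block operations per node in the conventional two-stage scheme."""
    counts = {100: 0, 101: 0, 102: 0, 103: 0}
    # Stage 1: 101 -> 100 and 103 -> 102; only the receiving nodes operate.
    for receiver in (100, 102):
        counts[receiver] += n_blocks
    # Stage 2: 102 -> 100; only the node 100 operates.
    counts[100] += n_blocks
    return counts


counts = conventional_op_counts()
idle_nodes = sorted(node for node, ops in counts.items() if ops == 0)
print(counts, idle_nodes)
```

Under these assumptions, all of the computation falls on the nodes 100 and 102, and the nodes 101 and 103 perform none.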