1. Field of the Invention
The present invention relates to a processing method and computer system for summation of floating point data, and more particularly to a processing method and computer system for computing the sum of floating point data of a plurality of computer nodes.
2. Description of the Related Art
A parallel computer system in which a plurality of nodes, each including a computer, are installed and connected by a network has been provided. In such a parallel computer, one job is computed by a plurality of nodes in parallel, and the processing data is exchanged via the network. A large-scale parallel computer of this kind is comprised of several hundred to several thousand nodes.
In such a parallel computer, data of a plurality of nodes are collected, and a specified operation is executed. This is called “reduction processing”. Examples of the reduction processing are an operation to determine the sum of the data of all nodes, and an operation to determine a maximum value or minimum value of the data of all nodes.
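As a minimal illustration of the reduction processing described above, the following Python sketch reduces one value per node to a sum, a maximum value and a minimum value. The list `node_data` and its values are hypothetical and merely stand in for data held by separate nodes; an actual parallel computer would collect them over the network.

```python
from functools import reduce

# Hypothetical data, one value per node (assumption for illustration only).
node_data = [3, 9, 4, 7]

# Each reduction combines the data of all nodes into a single result.
total = reduce(lambda a, b: a + b, node_data)  # sum of all node data
maximum = reduce(max, node_data)               # maximum value of all node data
minimum = reduce(min, node_data)               # minimum value of all node data
```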
The floating point format, one of the data formats handled by a computer, represents a numeric value by an exponent and a mantissa. It can represent numeric values over a wider range than the fixed point format, in which the decimal point is at a predetermined position. FIG. 19 depicts the IEEE standard floating point format.
FIG. 19 shows 32-bit single precision floating point data and 64-bit double precision floating point data. In both cases the data is comprised of a sign bit, an exponent section and a mantissa section. The sign bit designates the sign of the numeric value, where “1” indicates a negative number and “0” indicates a positive number. The exponent section indicates an integer power of 2, and the mantissa section indicates a value of 1.0 or more and less than 2.0 (normalized number). The actual numeric value is the power of 2 given by the exponent section multiplied by the mantissa.
In summation of floating point data, if 3 or more floating point data are added, the numeric value of the computing result may differ depending on the sequence in which the data are added. FIG. 20 and FIG. 21 show such summations. Here the values of double precision floating point data are shown in hexadecimal.
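The layout described above can be inspected with a short Python sketch. It splits a 64-bit double precision value into its sign bit, exponent and mantissa. The function name `decode_double` is hypothetical, and the sketch assumes a normalized number (it does not handle zero, subnormal numbers, infinity or NaN). The exponent bias of 1023 and the 52-bit stored fraction are those of the IEEE 754 double precision format.

```python
import struct

def decode_double(x):
    """Split a normalized 64-bit double into (sign, exponent, mantissa)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                    # 1 bit: 1 = negative, 0 = positive
    biased_exp = (bits >> 52) & 0x7FF    # 11-bit exponent section
    fraction = bits & ((1 << 52) - 1)    # 52-bit mantissa section
    # Normalized numbers carry an implicit leading 1, so 1.0 <= mantissa < 2.0.
    mantissa = 1.0 + fraction / 2**52
    return sign, biased_exp - 1023, mantissa

# -6.5 is -(2**2 * 1.625)
sign, exponent, mantissa = decode_double(-6.5)
```

The original value is recovered as `(-1)**sign * 2**exponent * mantissa`.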
As FIG. 20 shows, if floating point data 1, 2, 3 and 4, each consisting of an exponent section and a mantissa section, are added in the sequence of data 1, 2, 3 and 4, then data 1 and data 2 are added first, the resulting addition result 1 and data 3 are added next, and finally the resulting addition result 2 and data 4 are added.
As FIG. 21 shows, if the data is added in the sequence of data 1, 3, 4 and 2, then data 1 and data 3 are added first, the resulting addition result 1 and data 4 are added next, and finally the resulting addition result 2 and data 2 are added.
As the numeric examples in FIG. 20 and FIG. 21 show, the addition results of the 4 data differ. This is because the computing result is normalized after each addition, so that digits are lost in the mantissa section.
In a parallel computer, where one job is executed by a plurality of computers in parallel, intermediate results of the parallel execution and the final results may be collected, and their sum may be determined. If the data format in such a case is the floating point format, the computing result may differ depending on the computing sequence, which affects the accuracy of parallel computation. Therefore a method for guaranteeing the consistency of the computation result, even if the computing sequence is not adhered to, has been proposed.
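The same effect can be reproduced with a short Python sketch. The four values below are hypothetical and are not the values shown in FIG. 20 and FIG. 21; they are chosen so that the loss of digits is easy to see. The helper `sum_in_order` is likewise an illustrative name.

```python
def sum_in_order(data, order):
    """Add the elements of data one at a time in the given index order."""
    result = 0.0
    for i in order:
        result += data[i]  # each addition is rounded to double precision
    return result

# Hypothetical data 1, 2, 3 and 4 (not the values of the figures).
data = [1e16, 1.0, -1e16, 1.0]

# Sequence 1, 2, 3, 4: 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost.
sum_1234 = sum_in_order(data, [0, 1, 2, 3])
# Sequence 1, 3, 4, 2: the large values cancel first, so both 1.0 survive.
sum_1342 = sum_in_order(data, [0, 2, 3, 1])
```

With these values the first sequence yields 1.0 and the second yields 2.0, although both add the same four data.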
FIG. 22 depicts the conventional summation of floating point data, and shows a method for guaranteeing the consistency of a computing result even if the computing sequence is not adhered to.
As FIG. 22 shows, it is effective, in terms of processing efficiency, to install a reduction mechanism, which performs summation of the floating point data of a plurality of nodes, separately from the nodes. First, each node extracts only the exponent section of its floating point data, and instructs the reduction mechanism to determine the maximum value of the exponent sections.
The reduction mechanism compares the exponent section data sent from each node and holds only the exponent section having the maximum value. When the comparison of the exponent section data from all the nodes is complete, the reduction mechanism returns the exponent section having the maximum value to all the nodes.
Each node executes digit matching of its mantissa section according to the maximum exponent section returned from the reduction mechanism. Each node then instructs the reduction mechanism to determine the sum of the digit-matched mantissa section data.
The reduction mechanism adds the mantissa section data sent from each node, and when the addition of the mantissa section data from all the nodes completes, the reduction mechanism returns the result to all the nodes.
Each node creates normalized floating point data from the exponent section having the maximum value and the sum of the mantissa section data.
In this way, according to the prior art, digit matching of the mantissa section data is executed by each node, according to the highest value of the exponent section, and the digit-matched data is sent to the reduction mechanism, so the sum can be computed without concern for the computing sequence of the summation (e.g. Japanese Patent Application Laid-Open No. 2005-506596).
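The two-exchange procedure described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the circuit of FIG. 22: the reduction mechanism is simulated by the built-in `max` and `sum`, the names `node_exponent`, `node_align`, `consistent_sum` and `MANT_BITS` are hypothetical, bits shifted out during digit matching are simply truncated, and `math.frexp` scales the mantissa to the range 0.5 to 1.0 rather than 1.0 to 2.0.

```python
import math

MANT_BITS = 53  # assumed working width, equal to the significand of a double

def node_exponent(x):
    """On each node: extract only the exponent of the local value."""
    return math.frexp(x)[1]  # x == m * 2**e with 0.5 <= |m| < 1

def node_align(x, max_exp):
    """On each node: digit-match the mantissa to the maximum exponent."""
    m, e = math.frexp(x)
    # Integer mantissa in units of 2**(max_exp - MANT_BITS).
    # Bits below one unit are truncated (an assumption of this sketch).
    return int(math.ldexp(m, MANT_BITS - (max_exp - e)))

def consistent_sum(node_values):
    # First exchange: the reduction mechanism returns the maximum exponent.
    max_exp = max(node_exponent(x) for x in node_values)
    # Second exchange: the reduction mechanism adds the aligned mantissas.
    # Integer addition is exact, so the arrival order does not matter.
    mantissa_sum = sum(node_align(x, max_exp) for x in node_values)
    # Each node rebuilds a normalized floating point value.
    return math.ldexp(mantissa_sum, max_exp - MANT_BITS)

a = consistent_sum([0.1, 0.2, 0.3])
b = consistent_sum([0.3, 0.1, 0.2])
```

Here `a` and `b` are identical regardless of the order of the inputs, whereas ordinary floating point addition of the same three values can give different results for different orders. The sketch also makes the cost visible: the maximum exponent must be known before any mantissa can be aligned, so two separate exchanges are needed.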
In the case of the prior art, however, determining a sum of floating point data requires two operations: a comparison of the sizes of the exponent sections and an addition of the mantissa sections. Data must therefore also be exchanged twice between each node and the reduction mechanism, which makes the reduction processing time longer. Particularly if the number of nodes increases to several hundred or several thousand, this increase in processing time hinders increasing the speed of parallel processing.
Alternatively, in order to adhere to the computing sequence, a storage circuit for storing the data of all the nodes may be installed in the reduction mechanism, so that the data of all the nodes are received first and addition is then executed sequentially. However, an increase in the number of nodes increases the scale of the storage circuit, which increases cost. Moreover, starting computation only after receiving the data of all the nodes increases processing time. If the number of nodes is increased to several hundred or several thousand, the circuit scale becomes large and the increase in processing time becomes conspicuous.