1. Field of the Invention
This invention generally relates to mechanisms for performing the addition of numbers stored in "floating point" form. In particular, this invention relates to methods and circuitry for performing floating point addition with minimum loss of precision.
2. Background Information
To add two floating point numbers, the smaller number must be right-shifted until the exponents of both numbers are equal. The part of the smaller operand which is thereby shifted out of the predetermined word length, and which is therefore not considered during the addition, may be referred to as the remainder. This remainder is generally either discarded or taken into account by rounding the least-significant sum digit. This is described, for example, in IBM TDB October 1969, page 683 and IBM TDB October 1984, pages 3138 to 3140.
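The alignment step described above may be illustrated by the following minimal sketch. It assumes binary, non-negative integer mantissas and omits normalization of the sum. The function name `align_and_add` and its parameters are hypothetical and are not taken from the cited references.

```python
def align_and_add(m_a, e_a, m_b, e_b):
    """Add m_a * 2**e_a and m_b * 2**e_b with exponent alignment.

    Returns (sum_mantissa, exponent, remainder, remainder_exponent).
    The remainder holds the bits of the smaller operand that were
    shifted out and therefore did not take part in the addition.
    """
    # Make operand "a" the one with the larger exponent.
    if e_a < e_b:
        m_a, e_a, m_b, e_b = m_b, e_b, m_a, e_a

    shift = e_a - e_b                       # alignment distance
    shifted = m_b >> shift                  # part that takes part in the sum
    remainder = m_b & ((1 << shift) - 1)    # part that is shifted out

    return m_a + shifted, e_a, remainder, e_b


# 0b1000 * 2**3 = 64 and 0b1011 * 2**0 = 11
s, e, r, e_r = align_and_add(0b1000, 3, 0b1011, 0)
print(s, e, r, e_r)
```

In this example the sum mantissa alone represents 72, while the exact sum is 75; the difference of 3 is carried by the remainder, which a conventional adder would discard or use only for rounding.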
When several summands are successively added, as, for instance, in vector operations, such rounding or discarding of remainders may lead to substantial errors. Relatively small operands, for example, may be lost entirely, although they would have been decisive for the result once two operands of substantially the same magnitude are subtracted from one another. In practice, the result of a sum thus obtained depends on the arbitrary order in which the additions and subtractions are performed.
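The dependence on the order of summation can be reproduced with ordinary double-precision arithmetic. The sketch below assumes Python floats in IEEE 754 binary64 format, which is not the number format of the cited prior art; the values are chosen for illustration only.

```python
big = 1.0e16    # neighbouring binary64 values near 1e16 are 2 apart
small = 1.0     # smaller than that spacing

# Order 1: the small operand is added first and vanishes in the rounding.
lost = (big + small) - big

# Order 2: the two large operands cancel first, so the small operand survives.
kept = (big - big) + small

print(lost, kept)   # prints: 0.0 1.0
```

The same three operands thus yield 0.0 or 1.0 depending solely on the sequence in which they are combined.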
For improving the accuracy, it is known to use multiple word lengths or to temporarily extend the mantissa of the floating point sum at the least-significant end by one digit.
From EP-No. B1 79 471 an arrangement is known which permits obtaining an accurate sum of floating point operands. This arrangement uses a hyperlong accumulator with several hundred positions in the usual exponent range (e.g., ±63). In this arrangement, the individual summands are summed in fixed point notation, as one accumulator register is provided for each exponent subarea. Thus, if the exponent matches, the mantissa sum may be stored in the associated accumulator register. The method used in that arrangement necessitates a vast number of carry operations which are the more difficult to implement the longer the accumulator becomes. Such carry operations are highly time-consuming for several hundred positions. The fixed point notation employed also necessitates additional shift operations for matching the sum to the respective accumulator register. For very large exponents, as are provided in future computing standards, the known arrangement could only be realized at extraordinarily high expenditure.
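The principle of fixed point summation in a long accumulator may be sketched as follows. This is a simplified illustration and not the arrangement of the cited document: the separate accumulator registers are collapsed into a single Python integer of unbounded length, the base is assumed to be 2, and the names `LongAccumulator` and `E_MIN` are hypothetical.

```python
from fractions import Fraction

E_MIN = -63  # assumed smallest exponent; fixes the weight of accumulator bit 0


class LongAccumulator:
    """Fixed point accumulator wide enough to hold every summand exactly."""

    def __init__(self):
        self.acc = 0  # Python integers grow as needed

    def add(self, mantissa, exponent):
        # Shift the mantissa to its absolute position, then add.
        # In hardware, this addition may ripple a carry through
        # many accumulator positions.
        self.acc += mantissa << (exponent - E_MIN)

    def value(self):
        # Exact value of the accumulated sum as a rational number.
        return Fraction(self.acc) * Fraction(2) ** E_MIN


acc = LongAccumulator()
acc.add(1, 60)     # +2**60
acc.add(1, -60)    # +2**-60
acc.add(-1, 60)    # -2**60
print(acc.value() == Fraction(1, 2**60))
```

The small summand survives the cancellation of the two large ones, at the cost of an accumulator whose length grows with the exponent range and of the shift required for every summand.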
An algorithm ensuring that the approximation for a sum of floating point numbers is as accurate as possible is described in the article by G. Bohlender "Genaue Summation von Gleitkommazahlen" in "Computing", Suppl. 1, 1977, pp. 21 to 32. According to this algorithm, the remainders produced by the individual summations are stored. After all operands have been summed, the remainders are added to the operand sum in a large number of successive cycles. The described algorithm is aimed at obtaining an accurately rounded floating point number for the sum. The described method has the disadvantage that it requires much storage space for the remainders and a large number of operations in which the stored remainders are added to the existing sum.
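The general idea of storing remainders and adding them afterwards may be illustrated by the following sketch. It is not the algorithm of the cited article: it assumes IEEE 754 binary64 arithmetic with round-to-nearest, obtains each remainder by the well-known error-free two-sum transformation, and performs only a single correction pass, which does not guarantee an accurately rounded result. The function names are hypothetical.

```python
def two_sum(a, b):
    """Return (s, r) with s = fl(a + b) and r the exact rounding error."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    return s, (a - a_virtual) + (b - b_virtual)


def sum_with_remainders(values):
    """Sum the values, storing every non-zero remainder for a later pass."""
    total = 0.0
    remainders = []                 # storage grows with the number of summands
    for x in values:
        total, r = two_sum(total, x)
        if r != 0.0:
            remainders.append(r)
    for r in remainders:            # one additional addition per stored remainder
        total += r
    return total, len(remainders)


result, stored = sum_with_remainders([1.0e16, 1.0, -1.0e16])
print(result, stored)   # prints: 1.0 1
```

Even in this reduced form, the sketch shows the two costs named above: a store for the remainders and a further series of additions after the operands themselves have been summed.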
Therefore it is the object of the present invention to provide a method and an arrangement by means of which an accurate result of a sum of floating point numbers may be obtained without elaborate technical means.