Accounting systems that process bank accounts, and some types of scientific computation, may require numerical error to be small. To this end, multiple-precision or variable-precision numerical representation may be employed. In such a case, a single integer may express a sign and an exponent, and a separate digit string is often used to represent the mantissa. When such a numerical representation is employed, integer calculation is often used to implement arithmetic operations on numerical values.
In contrast, methods have been studied for implementing multiple-precision or variable-precision floating-point arithmetic by use of fixed-precision floating-point arithmetic. A hardware processing unit is often available for fixed-precision floating-point arithmetic, and the use of such a unit can improve processing speed compared to the case in which all processes are performed by software. For example, there is a library that performs multiple-precision binary floating-point arithmetic by use of double-precision floating-point arithmetic. In such a method, a single number is represented by a set of fixed-precision floating-point numbers, which may be referred to as an “unevaluated sum” because the set is used as it is, without adding up the individual numbers. Arithmetic operations between such sets may be performed to implement high-precision arithmetic operations (i.e., the four basic arithmetic operations).
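As a minimal sketch of the idea of an unevaluated sum, the following Python fragment holds a value as a pair of double-precision numbers and compares it with the single double obtained by adding the pair. The helper name exact_value and the sample pair are assumptions made for illustration only; exact rational arithmetic from the standard fractions module is used solely to inspect the values.

```python
from fractions import Fraction


def exact_value(parts):
    """Return the exact rational value of an unevaluated sum of doubles.

    `exact_value` is a hypothetical helper, not part of any library.
    """
    return sum((Fraction(p) for p in parts), Fraction(0))


# The pair (1.0, 2**-60) represents 1 + 2**-60 without adding the parts.
pair = (1.0, 2.0 ** -60)

# Adding the parts in double precision loses the small term,
# because 2**-60 is far below half a unit in the last place of 1.0.
collapsed = pair[0] + pair[1]

print(exact_value(pair) == 1 + Fraction(1, 2 ** 60))  # True
print(collapsed == 1.0)                               # True
```

Keeping the pair as it is preserves information that a single double-precision number cannot hold.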
In fixed-precision floating-point arithmetic, the result of an arithmetic operation is not always exact. For example, expressing the exact sum of two fixed-precision floating-point numbers may require two fixed-precision floating-point numbers. An algorithm to obtain such an exact sum is expressed as follows.

two_sum(X, Y)
  z = fl(X + Y)
  w = fl(z − X)
  v = fl(z − w)
  z1 = fl(Y − w)
  z2 = fl(v − X)
  zz = fl(z1 − z2)
  return (z, zz)

Here, fl(X + Y) indicates the result obtained by mapping the true value of X + Y onto a floating-point number, i.e., the result obtained by expressing this value within the limited precision of the floating-point number. The two values z and zz obtained by the above-noted two_sum function exactly satisfy the following: z + zz = X + Y. Value z represents the most significant part of X + Y within the precision of the fixed-precision floating-point number format, and zz represents the remainder that cannot be expressed within that precision.
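The two_sum steps above can be sketched in Python, whose float type is a binary double-precision number on common platforms. This is an illustrative transcription under the assumption of round-to-nearest arithmetic with no overflow; the check at the end uses exact rational arithmetic from the standard fractions module and is not part of the algorithm.

```python
from fractions import Fraction


def two_sum(x, y):
    """Return (z, zz) with z = fl(x + y) and z + zz equal to x + y exactly.

    Each line below is one rounded floating-point operation, six in total.
    """
    z = x + y      # rounded sum
    w = z - x      # part of y that was absorbed into z
    v = z - w      # part of x that was absorbed into z
    z1 = y - w     # part of y lost to rounding
    z2 = v - x     # negated part of x lost to rounding
    zz = z1 - z2   # total rounding error of the sum
    return z, zz


# 1e16 + 1 is not representable as a double, so the 1 ends up in zz.
z, zz = two_sum(1e16, 1.0)
print(z, zz)  # 1e+16 1.0

# Exact check: the pair reproduces the true sum of the two inputs.
print(Fraction(z) + Fraction(zz) == Fraction(1e16) + Fraction(1.0))  # True
```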
The rounding that occurs at the time of mapping will be described by taking as an example rounding to nearest with ties away from 0, which is an exemplary rounding method used for decimal numbers. For the sake of simplicity, the precision of a fixed-precision floating-point number is assumed to be two decimal digits. In this case, the sum of 20000 and −1 is calculated by two_sum as follows.

  X = 20000
  Y = −1
  z = fl(X + Y) = 20000
  w = fl(z − X) = 0
  v = fl(z − w) = 20000
  z1 = fl(Y − w) = −1
  z2 = fl(v − X) = 0
  zz = fl(z1 − z2) = −1

Through the above-noted calculations, values z and zz are obtained that exactly satisfy the following: z + zz = X + Y.
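The two-digit decimal example above can be reproduced with Python's standard decimal module, which is used here only as a convenient stand-in for a fixed-precision decimal floating-point format. The function name two_sum_decimal is hypothetical; the context is assumed to have a precision of two digits and the ROUND_HALF_UP mode, which rounds to nearest with ties away from zero.

```python
from decimal import Context, Decimal, ROUND_HALF_UP


def two_sum_decimal(x, y, ctx):
    """Apply the six two_sum steps, rounding each result in context ctx."""
    z = ctx.add(x, y)
    w = ctx.subtract(z, x)
    v = ctx.subtract(z, w)
    z1 = ctx.subtract(y, w)
    z2 = ctx.subtract(v, x)
    zz = ctx.subtract(z1, z2)
    return z, zz


# Two significant decimal digits, ties rounded away from zero.
ctx = Context(prec=2, rounding=ROUND_HALF_UP)

X = Decimal(20000)
Y = Decimal(-1)
z, zz = two_sum_decimal(X, Y, ctx)

# z holds the rounded sum and zz the remainder.
print(z == Decimal(20000), zz == Decimal(-1))  # True True
```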
In the above-noted algorithm, six floating-point arithmetic operations are performed in order to obtain a single sum “z+zz”. This algorithm for obtaining an exact sum is frequently used to implement multiple-precision or variable-precision floating-point arithmetic by use of fixed-precision floating-point arithmetic. Accordingly, it may be desired to speed up this exact-sum algorithm.