1. Field of the Invention
The present invention relates generally to calculators, computers, and arithmetical errors, and, more particularly, to an apparatus and method useful for reducing the floating-point error associated with evaluating periodic functions.
2. Description of Related Art
A mathematical operation performed in a computer or similar hardware platform carries with it certain imprecisions. These imprecisions arise from characteristics of the hardware that is used to represent real numbers in a computer. Due to the finite size of memory storage locations, i.e., a hardware characteristic, computers approximate non-terminating real numbers by either truncating or rounding to a computer number having a preselected number of significant digits. These computer numbers are commonly referred to as floating-point numbers. Once the real numbers are represented as floating-point numbers, further imprecisions arise, because arithmetic operations performed by a computer generally involve further truncation or rounding.
As a simple illustration, consider a computer that only stores numbers having two decimal digits in the registers of an arithmetic logic unit (ALU). Such a computer may store the two real numbers 1.2 and 3.4 exactly. Nevertheless, the computer truncates or rounds the number representing the product of these two numbers, i.e., $1.2 \times 3.4 = 4.08$, because the registers of the ALU may only store numbers having two digits, and the product has three significant digits. Many arithmetic operations force the computer to truncate or round in this way. Reducing truncation and rounding errors arising from arithmetic operations is desirable in a computer or similar hardware computing platform.
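The two-digit illustration can be reproduced with Python's `decimal` module, which allows the working precision to be set explicitly. This is only a sketch: the two-digit contexts stand in for the hypothetical ALU registers, and the names `rounding_ctx` and `truncating_ctx` are assumptions introduced for the example.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# Hypothetical two-significant-digit "registers", one per policy.
rounding_ctx = Context(prec=2, rounding=ROUND_HALF_EVEN)
truncating_ctx = Context(prec=2, rounding=ROUND_DOWN)

a = Decimal("1.2")  # fits in two digits, stored exactly
b = Decimal("3.4")  # fits in two digits, stored exactly

# The default context has enough digits to hold the exact product.
exact_product = a * b
# The two-digit contexts must discard the third significant digit.
rounded_product = rounding_ctx.multiply(a, b)
truncated_product = truncating_ctx.multiply(a, b)

print(exact_product, rounded_product, truncated_product)
```

Under these assumptions the exact product 4.08 becomes 4.1 when rounded and 4.0 when truncated, so either policy loses information that the operands themselves did not lose.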
Now, suppose that one wants to evaluate a periodic function at a particular floating-point value, $x$, in a computer that can represent floating-point numbers having mantissas with $b$ or fewer significant digits. For very small values of $x$, the evaluation of the periodic function is typically straightforward using a power series expansion. For example, the absolutely convergent series:
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$
$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$$
give values for the sine and cosine functions that can be accurately computed with a relatively small number of terms in a truncated power series when $x$ is small. In evaluating the truncated power series, the absolute error is governed by the magnitude of the first truncated term of the convergent power series. When the value of $x$ is small, a value for the periodic function with a small relative error is straightforward to obtain, because the truncation error is understood and rounding errors can be controlled by careful computations that use multiple machine precision, if necessary.
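The role of the first truncated term can be checked numerically. The following is a minimal sketch, assuming double-precision arithmetic and using the library `math.sin` as the reference value; the function name `sin_series` and the choice of three terms at $x = 0.1$ are illustrative only.

```python
import math

def sin_series(x, n_terms):
    """Sum the first n_terms of x - x^3/3! + x^5/5! - ... for a float x.

    Returns the partial sum and the magnitude of the first omitted term.
    """
    term = x      # current term, starting with x^1/1!
    total = 0.0
    for k in range(n_terms):
        total += term
        # Next term: multiply by -x^2 / ((2k+2)(2k+3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total, abs(term)

x = 0.1
approx, first_omitted = sin_series(x, 3)
abs_error = abs(approx - math.sin(x))
print(abs_error, first_omitted)
```

For this small argument the measured absolute error stays below the first omitted term, $x^7/7!$, which is the behavior the paragraph above describes.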
As the value of x increases, the problem of evaluating simple functions with their power series expansions often becomes more pronounced. In many cases, the use of a power series expansion becomes impractical, because rounding error contaminates the results or multiple machine precision computations become unmanageable.
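The contamination by rounding error can be observed directly. The sketch below sums the sine series in double precision at a small and at a larger argument and compares each result with the library `math.sin`; the function name, the 80-term cutoff, and the sample arguments 0.5 and 30.0 are assumptions chosen for illustration.

```python
import math

def sin_series_float(x, n_terms=80):
    """Sum n_terms of the sine power series in ordinary double precision."""
    term, total = x, 0.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

# With 80 terms the truncation error is negligible at both arguments,
# so any remaining discrepancy is due to rounding.
small_err = abs(sin_series_float(0.5) - math.sin(0.5))
large_err = abs(sin_series_float(30.0) - math.sin(30.0))
print(small_err, large_err)
```

At the larger argument the intermediate terms grow to roughly $10^{11}$ before they cancel, so the rounding error of each term is many orders of magnitude larger than the final result's own unit in the last place.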
For periodic functions, large input arguments can be reduced in magnitude to smaller reduced arguments that allow more manageable evaluations of the functions with truncated series. The smaller reduced arguments are obtained from identities for periodic functions. In general, a periodic function $f(x)$ satisfies the relation:
$$f(x) = f(x + Np),$$
where $p$ is the period, and $N$ is any integer. For the case of the sine and cosine functions, two specific relations are:
$$\sin(x) = \sin(x + 2\pi N) \quad \text{and} \quad \cos(x) = \sin(x + \pi/2).$$
In the evaluation of periodic functions on a computer having a fixed machine precision, a fundamental problem is the performance of argument reductions when the argument $x$ is large and the period $p$ is an irrational real number. In the case of the sine and cosine functions, the period $2\pi$ is such an irrational number. If the period $p$ is irrational, it cannot be stored exactly, so the argument reduction is, in itself, approximate when performed in a computer.
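The effect of an inexact period can be made concrete. The following sketch reduces a large argument with the double-precision value of $2\pi$ and then repeats the same reduction with a 50-digit rational value of $\pi$. The constant `PI_50`, the sample argument `1e15`, and all variable names are assumptions made for this illustration.

```python
import math
from fractions import Fraction

# 50 decimal digits of pi as an exact rational, far more than a double holds.
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

x = 1e15                     # exactly representable as a double
two_pi_double = 2 * math.pi  # nearest double to 2*pi, not 2*pi itself

# fmod is exact with respect to the double-precision period.
y_naive = math.fmod(x, two_pi_double)

# Recover the integer N that fmod implicitly used, then redo the
# subtraction with the longer value of pi.
N = (Fraction(x) - Fraction(y_naive)) / Fraction(two_pi_double)
y_reference = Fraction(x) - N * 2 * PI_50

reduction_error = abs(Fraction(y_naive) - y_reference)
print(float(reduction_error))
```

Although every arithmetic step in the naive reduction is exact, the stored period differs from $2\pi$ by about $2.4 \times 10^{-16}$, and that difference is multiplied by an $N$ of about $1.6 \times 10^{14}$, leaving an error of a few hundredths in the reduced argument.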
In computer argument reductions, the goal is to compute a reduced argument $y$ to the machine's precision from an initial argument $x$. The reduced argument $y$ satisfies the relation:
$$y = x - Np.$$
For the trigonometric functions, the reduced argument $y$ satisfies:
$$y = x - 2\pi N.$$
By using the above-mentioned relations between the sine and the cosine, one can also do argument reductions of trigonometric functions by subtracting integer multiples of $\pi/2$. Then, a reduced argument $y$ satisfies $y \in [-\pi/4, +\pi/4]$ and is defined by:
$$y = x - \pi N/2.$$
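For this definition of $y$, $\cos(x)$ is given by:
$$\cos(x) = \begin{cases} \cos(y) & \text{if } N \bmod 4 = 0, \\ -\sin(y) & \text{if } N \bmod 4 = 1, \\ -\cos(y) & \text{if } N \bmod 4 = 2, \text{ or} \\ +\sin(y) & \text{if } N \bmod 4 = 3. \end{cases}$$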
For the same definition of $y$, $\sin(x)$ is given by the functions of the above list with $N$ replaced by $N-1$, because $\sin(x) = \cos(x - \pi/2)$. In a computer, only an approximation to $\pi/2$ may be represented. As the magnitude of $x$ increases, more and more digits of $\pi/2$ will be involved in the computation of $y = x - \pi N/2$ to machine precision. Prior art shortcuts to using higher precision values of $\pi$ for the reduction of larger arguments $x$ often did not result in an accurate reduced argument.
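The selection by $N \bmod 4$ can be sketched as follows. This is an illustration only: the reduction shown is the naive double-precision one and is adequate only for arguments of modest size, the library `math.cos` and `math.sin` stand in for the small-argument series, and the function names are assumptions. The sine is obtained from the identity $\sin(x) = \cos(x - \pi/2)$, which shifts the index back by one quarter period.

```python
import math

def reduce_quarter(x):
    """Return (N, y) with x ~= y + N*pi/2 and y near [-pi/4, +pi/4]."""
    N = round(x / (math.pi / 2))
    y = x - N * (math.pi / 2)
    return N, y

def cos_from_reduced(N, y):
    """Pick the function of y that equals cos(y + N*pi/2)."""
    quadrant = N % 4  # Python's % is non-negative for a positive modulus
    if quadrant == 0:
        return math.cos(y)
    if quadrant == 1:
        return -math.sin(y)
    if quadrant == 2:
        return -math.cos(y)
    return math.sin(y)

def sin_from_reduced(N, y):
    """sin(x) = cos(x - pi/2): same table, index shifted by one."""
    return cos_from_reduced(N - 1, y)

for x in (-10.0, -1.0, 0.3, 2.0, 5.0, 7.5):
    N, y = reduce_quarter(x)
    print(x, cos_from_reduced(N, y), sin_from_reduced(N, y))
```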
One prior art method for argument reductions is due to Cody and Waite. The method is described below, but more details may be found in a book by Jean-Michel Muller, Elementary Functions: Algorithms and Implementation, 148 (1997). This method comprises finding two numbers $C_1$ and $C_2$ that are exactly representable in the computer being used. $C_1$ is very close to the period, $p$, and is frequently the first few digits of $p$. For values of $N$ that are not too large, $N C_1$ is exactly representable in the computer, even though the sum $C_1 + C_2$, which approximates the period $p$, carries more digits than the working precision. Then, instead of evaluating $x - Np$, the method evaluates $(x - N C_1) - N C_2$. If $N C_1$ is exactly representable, the term in parentheses may be evaluated without any error. Then, the result will be obtained to a larger precision than a direct argument reduction would obtain. A second prior art method due to Payne and Hanek is described in the above-mentioned book. Id. at 154.
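A two-constant reduction of this kind can be sketched in double precision. The split below, in which `C1` keeps 32 fractional bits of $\pi/2$ so that `N * C1` is exact for $|N| < 2^{20}$, is an assumption chosen for illustration and is not taken from the cited book; `PI_50`, the function names, and the sample arguments are likewise hypothetical. A 50-digit rational $\pi$ serves as the reference.

```python
import math
from fractions import Fraction

PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")
HALF_PI = PI_50 / 2

# C1: pi/2 truncated to 32 fractional bits (33 significant bits).
C1 = math.floor((math.pi / 2) * 2**32) / 2**32
# C2: the remaining part of pi/2, rounded to a double.
C2 = float(HALF_PI - Fraction(C1))

def reduce_naive(x):
    """Direct reduction with the double-precision value of pi/2."""
    N = round(x / (math.pi / 2))
    return N, x - N * (math.pi / 2)

def reduce_two_constants(x):
    """Evaluate (x - N*C1) - N*C2; the parenthesized part is exact here."""
    N = round(x / (math.pi / 2))
    return N, (x - N * C1) - N * C2

def error_vs_reference(x, N, y):
    """Absolute error of y against x - N*pi/2 with the 50-digit pi."""
    return abs(Fraction(y) - (Fraction(x) - N * HALF_PI))

samples = [37.0 * k for k in range(1, 201)]  # arguments up to 7400
worst_naive = max(error_vs_reference(x, *reduce_naive(x)) for x in samples)
worst_split = max(error_vs_reference(x, *reduce_two_constants(x)) for x in samples)
print(float(worst_naive), float(worst_split))
```

Over these sample arguments the two-constant form keeps the error near the rounding level of $y$ itself, while the direct form loses several digits. The benefit disappears once $N$ grows large enough that `N * C1` is no longer exact.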
Another technique effectively uses floating-point numbers to multiple machine precision. The technique writes $\pi$ as a sum of several pieces as follows:
$$\pi = \mathrm{PI}_1 + \mathrm{PI}_2 + \mathrm{PI}_3 + \mathrm{PI}_4 + \cdots$$
In a computer with 64-binary-digit registers, the pieces $\mathrm{PI}_1$, $\mathrm{PI}_2$, $\mathrm{PI}_3$, etc. each contain 64 binary digits of the binary expression for $\pi$. $\mathrm{PI}_1$, $\mathrm{PI}_2$, $\mathrm{PI}_3$, etc. have exponents $2^{1}$, $2^{-63}$, $2^{-127}$, etc. and mantissas between 1 and 2. The truncation to four terms provides an approximation for $\pi$ to 256 bits. A programmer of ordinary skill could compute the appropriate integer $N$, i.e., the nearest integer to $2x/\pi$, by evaluating individual terms in:
$$y = x - \left( \frac{N \times \mathrm{PI}_1}{2} + \frac{N \times \mathrm{PI}_2}{2} + \frac{N \times \mathrm{PI}_3}{2} + \frac{N \times \mathrm{PI}_4}{2} \right)$$
and adding the results. The reduced argument y may be obtained in a similar way. This technique for performing argument reductions is costly in terms of computer time. A method that can perform argument reductions with fewer bits of .pi., i.e., fewer multiple machine precision calculations, would use costly computer time more efficiently.
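The decomposition of $\pi$ into pieces can be sketched as follows. Note the differences from the text: this illustration uses 53-bit doubles rather than 64-bit registers and rounds each piece to nearest rather than truncating, so a piece may be negative. The constant `PI_50` and the function name are assumptions.

```python
import math
from fractions import Fraction

PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

def split_into_pieces(value, n_pieces):
    """Peel off doubles one at a time so that value ~= sum(pieces)."""
    pieces = []
    remainder = value
    for _ in range(n_pieces):
        piece = float(remainder)      # nearest double to what is left
        pieces.append(piece)
        remainder -= Fraction(piece)  # exact rational subtraction
    return pieces

pieces = split_into_pieces(PI_50, 3)
leading_piece_is_math_pi = (pieces[0] == math.pi)

# Error of the partial sums PI_1, PI_1 + PI_2, PI_1 + PI_2 + PI_3.
errors = [abs(PI_50 - sum(Fraction(p) for p in pieces[:k])) for k in (1, 2, 3)]
print(pieces, [float(e) for e in errors])
```

Each additional piece contributes roughly another machine word of accuracy, which is why a reduction that needs many bits of $\pi$ must process several pieces and pays for each of them in computer time.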
FIG. 1 is a flow chart of another prior-art method for evaluating $\cos(x)$ in a computer having 80-bit memory storage locations. The computer allocates 64 bits of a memory location to a floating-point number's mantissa, 15 bits to the number's exponent, and 1 bit to the number's sign. The method assumes that the original argument $x$ may be represented exactly by a 64-digit mantissa, i.e., a hardware-related restriction on the program of FIG. 1, and also that $|x| < 2^{63}$. The precisions of the blocks of FIG. 1 determine whether the result is accurate to machine precision.
Referring to FIG. 1, at block 10 the computer initializes values of variables of the program. At block 12, the computer determines $N$, the integer part of $2x/\pi$. At block 14, the computer subtracts $N\pi/2$ from the argument $x$ to obtain a reduced argument $y$ in the range of $[-\pi/4, +\pi/4]$. At block 16, the computer determines the number of terms of the series expansion of $\cos(y)$ (or $\sin(y)$ if appropriate) that will give a result to 64-bit machine precision. At block 18, the computer evaluates the truncated series for $\cos(y)$ (or $\sin(y)$ if appropriate) in an accurate manner. At block 20, the computer uses the above-mentioned relation between $\cos(x)$ and $\cos(y)$ (or $\sin(y)$ if appropriate) to determine $\cos(x)$.
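The term count of block 16 can be estimated with a short calculation. The text does not state how block 16 makes its decision, so the criterion below, which keeps terms until the first omitted term of the cosine series falls under $2^{-64}$ at the largest reduced argument, is an assumption, as is the function name.

```python
import math

def cosine_terms_needed(y_max, precision_bits):
    """Count terms of 1 - y^2/2! + y^4/4! - ... to keep so that the
    first omitted term is below 2**-precision_bits at |y| = y_max."""
    threshold = 2.0 ** -precision_bits
    term = 1.0    # magnitude of the k = 0 term
    n_terms = 0
    while term >= threshold:
        n_terms += 1  # keep the current term
        # Magnitude of the next term: multiply by y^2 / ((2n-1)(2n)).
        term *= y_max * y_max / ((2 * n_terms - 1) * (2 * n_terms))
    return n_terms

terms_for_64_bits = cosine_terms_needed(math.pi / 4, 64)
print(terms_for_64_bits)
```

Under this criterion, ten terms suffice over the whole reduced range, and fewer are needed when $|y|$ is smaller, which shows why the reduction to $[-\pi/4, +\pi/4]$ keeps the series evaluation short.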
Generally, the relative error of $\cos(y)$ or $\sin(y)$ is within machine precision, for a machine using mantissas having $P$ binary digits, if the error in the reduced argument $y$ satisfies $\mathrm{error}_y < 2^{-(P+3)} |y|$. Here, $\mathrm{error}_d \equiv |d - d_{\mathrm{computer}}|$, i.e., the absolute error of $d$. The $\mathrm{error}_y$ is ordinarily determined by the number of PI's that are employed in the argument reduction of block 12. For the reduced argument to be accurate to a relative error of less than $2^{-67}$, more PI's are used. If the magnitude of the original argument $x$ is close to $2^{63}$, the computer uses $\pi$ and performs argument reductions to quadruple machine precision through hardware or software implementations. These multiple machine precision methods may be very time intensive and thus undesirable in high speed computers.
These results clarify some problems with prior art methods for argument reduction. For example, the PENTIUM™ PRO processor of INTEL Corporation simply uses a 66-bit value of $\pi$ to do argument reductions for original arguments up to $2^{63}$ in magnitude. The use of a 66-bit value of $\pi$ is probably not sufficient to accurately evaluate a trigonometric function with a generic argument of this magnitude.
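The size of the effect can be estimated with exact rational arithmetic. The sketch below does not model any actual processor. It assumes only that $\pi$ is truncated to 66 significant bits and that every other step of the reduction is exact, and it compares the resulting error in $y$ with the tolerance $2^{-(P+3)}|y|$ for $P = 64$. The argument $2^{62}$, the constant `PI_50`, and the variable names are chosen for illustration.

```python
import math
from fractions import Fraction

PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

# pi has two integer bits, so 66 significant bits leave 64 fractional bits.
PI_66 = Fraction(math.floor(PI_50 * 2**64), 2**64)

P = 64
x = Fraction(2**62)       # a large, exactly representable argument
N = round(2 * x / PI_50)  # nearest integer to 2x/pi

y_reference = x - N * PI_50 / 2  # reduction with the 50-digit pi
y_with_pi66 = x - N * PI_66 / 2  # same reduction with the 66-bit pi

error_y = abs(y_with_pi66 - y_reference)
tolerance = abs(y_reference) / 2**(P + 3)
print(float(error_y), float(tolerance))
```

For this argument the error in the reduced argument comes out many orders of magnitude above the tolerance, because the truncation error of the constant is multiplied by an $N$ of about $3 \times 10^{18}$.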
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.