The present invention relates to integrated circuits, and more particularly to integrated circuits for performing arithmetic operations.
Many microprocessors, such as the popular Intel x86 series and its clones, contain apparatus for computing exponential and trigonometric functions. Such functions are useful in diverse areas of engineering, science, and mathematics, as well as in computer graphics applications. Using previously known methods, such functions may take a long time to compute. For example, computing the sine and cosine functions can take approximately 100 clock cycles for some operands on the Intel Pentium processor and more than 300 clock cycles on the Intel 486.[1]
[1] This information came from a document entitled "Everything You Always Wanted to Know about Math Coprocessors," published on the Internet by Norbert Juffa.
Several known types of circuits for computing exponential and trigonometric functions will now be reviewed.
1. Table lookup method. The pure table lookup method keeps a table of the values of the function to be computed for every possible argument x. This approach was seriously considered for 16 bits of precision, but it is no longer feasible for higher-precision processors such as today's high-end processors. Table lookup is now feasible only for approximating such functions, as disclosed in U.S. Pat. No. 5,224,064, entitled "Transcendental Function Approximation Apparatus and Method," to M. Henry and G. Martin.
2. Polynomial approximation method. For example, one could compute $e^x$ by using the first "few" terms of the infinite series
$$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}.$$
This series converges quickly for small values of x but too slowly for larger values of x. Series other than power series may converge better. However, it is not clear how to speed up such a procedure further; for example, it is not clear how to combine several iterations into one.
3. Combined method. The two aforementioned methods can be combined. Tang, for example (P. T. P. Tang, "Table-lookup algorithms for elementary functions and their error analysis," Proc. 10th Symp. Computer Arith.), designed such a method, which is used in the Intel Pentium processor. However, as with the previous method, it is not clear how to speed up such a procedure further.
4. Method of rational approximation. This method is efficient, but requires a very fast divider, which is expensive.
5. Digit-by-digit methods. This class of methods, which includes the new one discussed in this document, is very commonly used for hardware evaluation of exponential, trigonometric, and other transcendental functions. The methods in this class are based on simple iterative equations that involve only addition/subtraction and shift operations. Simple as these methods may be, they traditionally suffer from slow linear convergence.
These iterative methods were first discovered by Volder (J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electronic Computers, Vol. 8, pp. 330-334, 1959). Recent references include U.S. Pat. No. 4,956,799 (Nakayama, Sep. 11, 1990), in which the inventor calls these iterative methods "pseudo-division."
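To illustrate the shift-and-add character of these methods, here is a minimal sketch of Volder-style CORDIC in rotation mode, computing sine and cosine. It is a floating-point model under stated assumptions: `ITERATIONS`, `ANGLES`, `K`, and `cordic_sin_cos` are hypothetical names, multiplication by $2^{-i}$ stands in for the right shift a fixed-point circuit would use, and the input angle is assumed to lie in $[-\pi/2, \pi/2]$.

```python
import math

# Hypothetical parameters for this sketch; hardware would use fixed-point words.
ITERATIONS = 40
# Elementary rotation angles atan(2**-i); a circuit would hold these in a small ROM.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2**(-2i)); starting
# from x = K instead of 1 cancels the accumulated stretch in advance.
K = 1.0
for _i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_sin_cos(theta: float) -> tuple[float, float]:
    """Return (sin(theta), cos(theta)) for |theta| <= pi/2 by CORDIC rotation."""
    x, y, z = K, 0.0, theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0   # rotate in the direction that drives z to zero
        t = 2.0 ** -i                   # a right shift by i bits in hardware
        x, y = x - d * y * t, y + d * x * t
        z -= d * ANGLES[i]              # angle still left to rotate
    return y, x
```

Each iteration resolves roughly one more bit of the angle, which is the linear convergence described above; with 40 iterations the results agree with `math.sin` and `math.cos` to within about 1e-10.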
The principles behind these iterative methods for computing transcendental functions will now be described.
The basic, well-known method for computing $e^x$ for $x \in [0, \ln 2)$ involves two recurrences, as follows:
$$x_{i+1} = x_i - \ln b_i \qquad (1)$$
$$y_{i+1} = y_i \, b_i \qquad (2)$$
Here $x_0$ is the operand $x$, which can be limited to the range $[0, \ln 2)$ because any computation of $e^x$ with $x$ outside this range can be reduced to a computation of $e^x$ with $x$ inside it. (Israel Koren's book entitled Computer Arithmetic Algorithms, Prentice-Hall, 1993, explains this point as well as the entire traditional algorithm.)
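The range reduction mentioned above can rest on the identity $e^x = 2^k e^r$ with $x = k \ln 2 + r$ and $0 \le r < \ln 2$: the iterations compute $e^r$, and the factor $2^k$ is only an exponent adjustment. A minimal sketch, assuming double-precision floats; `LN2` and `reduce_exp_argument` are hypothetical names, and a careful implementation would carry $\ln 2$ to extra precision to keep $r$ accurate for large $|x|$.

```python
import math

LN2 = math.log(2.0)


def reduce_exp_argument(x: float) -> tuple[int, float]:
    """Split x as k*ln2 + r with integer k and 0 <= r < ln2.

    Then e**x == 2**k * e**r, so only e**r needs the iterative method;
    multiplying by 2**k is just an exponent adjustment.
    """
    k = math.floor(x / LN2)
    r = x - k * LN2
    # Rounding can push r just outside [0, ln2); nudge it back if so.
    if r < 0.0:
        k, r = k - 1, r + LN2
    elif r >= LN2:
        k, r = k + 1, r - LN2
    return k, r


k, r = reduce_exp_argument(5.0)
print(k, r, math.ldexp(math.exp(r), k), math.exp(5.0))
```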
We iterate according to (1) until $x_n = 0$ for some $n$. We then have $x_n = 0 = x_0 - \sum_{i=0}^{n-1} \ln b_i$, that is, $x_0 = \ln \prod_{i=0}^{n-1} b_i$. It follows that $\prod_{i=0}^{n-1} b_i = e^{x_0}$, which is $e^x$. Solving recurrence (2) for $y_n$ yields $y_n = y_0 \prod_{i=0}^{n-1} b_i = y_0 e^x$, which is slightly more general than $e^x$. Thus the task of computing $e^x$ reduces to finding a sequence $\{b_i\}$ and a number $n$ such that $\sum_{i=0}^{n-1} \ln b_i = x$, and then computing $y_0 \prod_{i=0}^{n-1} b_i$. To be useful, the $b_i$ must not merely exist; it must also be easy to multiply by each $b_i$. It turns out that choosing $b_i = 1 + s_i 2^{-i}$, where $s_i = 0$ or $1$, satisfies both requirements, since multiplying by such a $b_i$ takes only a shift and an addition.
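With $b_i = 1 + s_i 2^{-i}$, recurrence (2) needs no multiplier: in fixed point, multiplying by $b_i$ is one right shift and one addition, or nothing at all when $s_i = 0$. A minimal sketch, assuming unsigned fixed-point values with 32 fraction bits; `FRAC_BITS`, `times_b`, and `apply_factors` are hypothetical names.

```python
FRAC_BITS = 32  # hypothetical fixed-point format: value = integer / 2**FRAC_BITS


def times_b(y: int, i: int, s: int) -> int:
    """Return y * (1 + s * 2**-i) using only a shift and an add.

    Bits shifted out on the right are truncated, as in a simple datapath.
    """
    return y + (y >> i) if s else y


def apply_factors(y0: int, s_bits: list[int]) -> int:
    """Apply recurrence (2), y_{i+1} = y_i * b_i, for the given s_0, s_1, ..."""
    y = y0
    for i, s in enumerate(s_bits):
        y = times_b(y, i, s)
    return y


one = 1 << FRAC_BITS
# s = [0, 1, 0, 1, 1] selects (1 + 1/2)(1 + 1/8)(1 + 1/16) = 1.79296875.
y = apply_factors(one, [0, 1, 0, 1, 1])
print(y / one)  # 1.79296875
```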
The remaining question is how to choose $s_i$ at each step to guarantee that the $x_i$ converge to zero. It turns out that $s_i$ can be chosen by trial subtraction: first try $s_i = 1$, yielding $x_{i+1} = x_i - \ln(1 + 2^{-i})$. If $x_{i+1} \ge 0$, this choice of $s_i$ is correct.
Otherwise, choose $s_i = 0$ instead, yielding $x_{i+1} = x_i$. This process is akin to bit-by-bit division, hence the name "pseudo-division" for these iterations.
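Putting recurrences (1) and (2) together with trial subtraction gives the complete radix-2 procedure. The sketch below is a floating-point model for illustration only: `N_ITER`, `LOG_B`, and `exp_pseudo_division` are hypothetical names, Python floats stand in for fixed-point registers, and the constants $\ln(1 + 2^{-i})$ would come from a small table in hardware. It decides one $s_i$ per iteration, which is the slow linear convergence noted earlier.

```python
import math

N_ITER = 60  # hypothetical number of radix-2 (one-bit) iterations
# Table of ln(1 + 2**-i); log1p keeps these accurate for large i.
LOG_B = [math.log1p(2.0 ** -i) for i in range(N_ITER)]


def exp_pseudo_division(x: float, y0: float = 1.0) -> float:
    """Compute y0 * e**x for x in [0, ln 2) using recurrences (1) and (2)."""
    if not 0.0 <= x < math.log(2.0):
        raise ValueError("x must lie in [0, ln 2)")
    y = y0
    for i in range(N_ITER):
        trial = x - LOG_B[i]      # trial subtraction with s_i = 1
        if trial >= 0.0:          # keep it: s_i = 1
            x = trial             # recurrence (1)
            y += y * 2.0 ** -i    # recurrence (2): a shift and an add in hardware
        # otherwise s_i = 0, and both x and y are left unchanged
    return y


print(exp_pseudo_division(0.5), math.exp(0.5))
```

Because each $\ln(1 + 2^{-i})$ is no larger than the sum of all later table entries, the residual after iteration $i$ stays below roughly $2^{-i+1}$, so 60 iterations leave a residual far below double-precision rounding.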
It is not apparent how to combine several such iterations into one step. Ercegovac (M. Ercegovac, "Radix-16 Evaluation of Certain Elementary Functions," IEEE Trans. Comput., C-22: 561-566 (1973)) showed how to compute the logarithm and exponential functions in radix 16. However, each iteration is done at a fairly high cost, and it is not clear how Ercegovac's scheme can be adapted to the computation of sine and cosine.
Accordingly, notwithstanding the above-mentioned methods used in machinery for computing exponential and trigonometric functions, there remains a need for new types of machinery that compute such functions quickly and are not too large.
It is therefore an object of the present invention to provide circuits for computing exponential and trigonometric functions at high speed and reasonable cost.
This and other objects of the invention are provided by a circuit that uses a novel computational method in which eight radix-2 iterations (which we will also call "logical iterations") are combined into one larger iteration (which we will also call a "physical iteration"). In each of the logical (that is, radix-2) iterations, only low-precision (and therefore very fast) adders are used, which allows temporary error to accumulate. After each physical iteration (comprising eight logical iterations) is completed, fast and complete correction of this temporary error is performed. After eight physical iterations and corrections, all 64 logical iterations are therefore completed quickly and with no more error than if the 64 logical iterations had simply been performed one at a time in the first place.