1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a computer system providing an improved method for generating high precision estimates of the square root, reciprocal square root, and reciprocal functions.
2. Description of the Related Art
A computer processor can perform arithmetic operations on different types of numbers, or operands. For example, the simplest operations involve integer operands, which are represented using a "fixed-point" notation. Non-integers are typically represented according to a "floating-point" notation. In conventional computers, a "single-precision" floating-point number is represented using a 32-bit (one word) field, and a "double-precision" floating-point number is represented using a 64-bit (two-word) field.
Floating-point notation (also referred to as exponential notation) can be used to represent both very large and very small numbers. A floating-point number has three parts: a mantissa (or significand), an exponent, and a sign (positive or negative). The mantissa specifies the digits of the number, and the exponent specifies its magnitude, i.e., the power of the base by which the mantissa is multiplied to generate the number. For example, using base 10, the number 28330000 would be represented as 2833E+4 (a mantissa of 2833 and an exponent of 4), and the number 0.054565 would be represented as 54565E-6 (a mantissa of 54565 and an exponent of -6). Since processors use binary values in their computations, floating-point numbers in most computers use 2 as the base (radix). Thus, a floating-point number may generally be expressed in binary terms according to the form

$$n = (-1)^S \cdot 1.F \cdot 2^E,$$
where n is the floating-point number (in base 10), S is the sign of the number (0 for positive or 1 for negative), F is the fractional component of the mantissa (in base 2), and E is the exponent of the radix. In accordance with the Institute of Electrical and Electronics Engineers (IEEE) standard 754, a single-precision floating-point number uses the 32 bits as follows: the first bit indicates the sign (S), the next eight bits indicate the exponent offset by a bias amount of 127 (E+bias), and the last 23 bits indicate the fraction (F). So, for example, the decimal number ten would be represented by the 32-bit value
0 10000010 01000000000000000000000

as this corresponds to $(-1)^0 \cdot 1.01_2 \cdot 2^{130-127} = 1.25 \cdot 2^3 = 10$.
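The field layout described above can be checked with a short sketch. It assumes a normalized (non-zero, non-subnormal) single-precision value and uses Python's standard `struct` module to reinterpret the 32 bits of the value ten; the helper names `decode_single` and `reconstruct` are illustrative only.

```python
import struct

def decode_single(value: float) -> tuple[int, int, int]:
    """Split a float into IEEE 754 single-precision (sign, biased exponent, fraction) fields."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit: S
    biased_exp = (bits >> 23) & 0xFF  # 8 bits: E + 127
    fraction = bits & 0x7FFFFF        # 23 bits: F
    return sign, biased_exp, fraction

def reconstruct(sign: int, biased_exp: int, fraction: int) -> float:
    """Evaluate (-1)^S * 1.F * 2^E for a normalized number, with E = biased_exp - 127."""
    mantissa = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (biased_exp - 127)

s, e, f = decode_single(10.0)
print(f"{s} {e:08b} {f:023b}")  # 0 10000010 01000000000000000000000
print(reconstruct(s, e, f))      # 10.0
```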
Many processors handle floating-point operations with a floating-point unit (FPU). Floating-point processing is used in addition, multiplication, and division operations, and may also be used for other special mathematical functions, such as the square root ($\sqrt{x}$), reciprocal square root ($1/\sqrt{x}$), and reciprocal ($1/x$) functions. Many conventional processors use iterative approximation algorithms, such as Newton-Raphson iteration, to determine these values. To understand this particular algorithm, it is helpful first to understand Newton's iteration for zero finding, which is also used for division operations and is illustrated in FIG. 1. A generalized curve is represented by the function f(x), and it is desired to know the zero of this function (i.e., the value of its x-intercept). Basically, the function is approximated by its tangent at an arbitrary location ($x_0$) corresponding to an initial guess, and a new guess is then formulated based on the zero ($x_1$) of this tangent line. This simple procedure can then be repeated as many times as is required to achieve a desired precision. It can be shown that, generally, the zero of the tangent line corresponding to a given iteration i is given by the equation

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}. \qquad \text{(equ. 1)}$$
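As a sketch of the generic zero-finding step of equ. 1, the function below repeats the tangent-line update a fixed number of times. The caller supplies f and its derivative; the name `newton` and the example functions are illustrative assumptions, and no convergence test is attempted.

```python
from typing import Callable

def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, iterations: int) -> float:
    """Newton's zero-finding iteration x_{i+1} = x_i - f(x_i) / f'(x_i) (equ. 1)."""
    x = x0
    for _ in range(iterations):
        x = x - f(x) / df(x)  # move to the zero of the tangent line at x
    return x

# Zero of f(x) = x^2 - 2, starting from the initial guess x0 = 1.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 5))  # close to sqrt(2)
```

The same routine finds a reciprocal if it is given $f(x) = 1/x - r$, which is the starting point of the discussion that follows.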
The reciprocal operation can be cast as a zero-finding operation by considering the function $f(x) = 1/x - r$. Since the zero of this function is at $1/r$, Newton's iteration can be used to calculate the reciprocal function. It can be shown that, for the reciprocal function, equation 1 becomes

$$x_{i+1} = x_i\,(2 - x_i r). \qquad \text{(equ. 2)}$$
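Substituting $f(x) = 1/x - r$ and its derivative $f'(x) = -1/x^2$ into equ. 1 gives equ. 2 directly. The second line, a standard identity added here for illustration, shows that the error term $1 - r x_i$ is squared on every step, which is why each iteration roughly doubles the number of correct bits:

```latex
\begin{aligned}
x_{i+1} &= x_i - \frac{1/x_i - r}{-1/x_i^2} = x_i + \left(x_i - r x_i^2\right) = x_i\,(2 - x_i r), \\
1 - r\,x_{i+1} &= 1 - r x_i\,(2 - r x_i) = \left(1 - r x_i\right)^2 .
\end{aligned}
```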
This equation is commonly used for division. An initial value for $x_0$ is provided by a table lookup. Several iterations of equ. 2 are used to obtain the reciprocal of the divisor, which corresponds to r. The reciprocal value is then multiplied by the dividend to obtain the quotient.
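The division procedure above can be sketched as follows, for positive operands only. The 8-entry seed table (the reciprocal of the midpoint of each of eight equal intervals of [1, 2)) is a hypothetical stand-in for whatever table a real FPU uses; `math.frexp` and `math.ldexp` handle the exponent, so the table and equ. 2 only ever see a mantissa m in [1, 2).

```python
import math

TABLE_ENTRIES = 8  # hypothetical seed-table size, indexed by the 3 leading fraction bits
# Hypothetical seeds: 1 / (midpoint of each of 8 equal intervals of [1, 2)).
RECIP_TABLE = [1 / (1 + (i + 0.5) / TABLE_ENTRIES) for i in range(TABLE_ENTRIES)]

def reciprocal(r: float, iterations: int = 4) -> float:
    """Approximate 1/r for r > 0: table-lookup seed, then Newton-Raphson (equ. 2)."""
    if r <= 0:
        raise ValueError("r must be positive")
    f, e = math.frexp(r)        # r = f * 2**e with 0.5 <= f < 1
    m, e = 2 * f, e - 1         # renormalize so that 1 <= m < 2
    x = RECIP_TABLE[int((m - 1) * TABLE_ENTRIES)]  # initial guess x_0, roughly 1/m
    for _ in range(iterations):
        x = x * (2 - x * m)     # equ. 2: two multiplies and one subtract
    return math.ldexp(x, -e)    # 1/r = (1/m) * 2**(-e)

def divide(dividend: float, divisor: float) -> float:
    """Quotient formed as dividend * (1/divisor)."""
    return dividend * reciprocal(divisor)

print(divide(1.0, 3.0))  # close to 0.3333333333333333
```

Four iterations are used by default because this seed is good to about 4 bits, and 4, 8, 16, 32, 64 bits covers double precision; a real design would pick the table size and iteration count together.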
To compute a square root, a similar iterative process can be performed, and it can be shown that the square root can be computed according to the equation

$$x_{i+1} = 0.5\left(x_i + \frac{r}{x_i}\right). \qquad \text{(equ. 3)}$$
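A minimal sketch of equ. 3 makes its cost visible: every iteration contains a division `r / x`. The function name and the caller-supplied initial guess are illustrative assumptions.

```python
def sqrt_equ3(r: float, x0: float, iterations: int) -> float:
    """Square root by equ. 3: x_{i+1} = 0.5 * (x_i + r / x_i), for r > 0 and x0 > 0."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * (x + r / x)  # one divide per iteration
    return x

print(sqrt_equ3(2.0, 1.0, 5))  # close to 1.41421356...
```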
However, equ. 3 requires a divide operation for $r/x_i$, which is much slower than a multiply operation. Instead, the Newton-Raphson iteration for computing the reciprocal square root is often used:

$$x_{i+1} = 0.5\,x_i\,(3 - x_i^2 r). \qquad \text{(equ. 4)}$$
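Equ. 4 can be obtained the same way as equ. 2 by applying equ. 1 to $f(x) = 1/x^2 - r$, whose positive zero is $1/\sqrt{r}$. This choice of f is the standard one, but it is an assumption here since the text does not state it:

```latex
\begin{aligned}
f'(x) &= -\frac{2}{x^3}, \\
x_{i+1} &= x_i - \frac{1/x_i^2 - r}{-2/x_i^3}
         = x_i + \frac{x_i - r x_i^3}{2}
         = 0.5\,x_i\,(3 - x_i^2 r).
\end{aligned}
```

No division by $x_i$ remains, which is the reason equ. 4 is preferred over equ. 3.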
Several iterations are used to obtain a more precise estimate of the reciprocal square root of r. This value may then be multiplied by r to obtain the square root of r, as shown below:
$$\sqrt{r} = r \cdot \left(1/\sqrt{r}\right).$$
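The reciprocal-square-root route can be sketched as below, for r > 0. The input is rescaled to $r = m \cdot 2^{2k}$ with $1 \le m < 4$, so that $1/\sqrt{r} = (1/\sqrt{m}) \cdot 2^{-k}$; the 8-entry seed table over [1, 4) and the default of five iterations are hypothetical choices for illustration, not values from the text.

```python
import math

TABLE_ENTRIES = 8  # hypothetical seed-table size
# Hypothetical seeds: 1/sqrt(midpoint) of each of 8 equal intervals of [1, 4).
RSQRT_TABLE = [1 / math.sqrt(1 + 3 * (i + 0.5) / TABLE_ENTRIES) for i in range(TABLE_ENTRIES)]

def rsqrt(r: float, iterations: int = 5) -> float:
    """Approximate 1/sqrt(r) for r > 0 with a table seed and Newton-Raphson (equ. 4)."""
    if r <= 0:
        raise ValueError("r must be positive")
    f, e = math.frexp(r)  # r = f * 2**e with 0.5 <= f < 1
    # Make the exponent even: r = m * 2**two_k with 1 <= m < 4.
    m, two_k = (2 * f, e - 1) if e % 2 else (4 * f, e - 2)
    i = min(int((m - 1) * TABLE_ENTRIES / 3), TABLE_ENTRIES - 1)
    x = RSQRT_TABLE[i]  # initial guess x_0
    for _ in range(iterations):
        # equ. 4; in hardware the 0.5 factor is an exponent adjustment,
        # leaving three multiplies: x*x, (x*x)*m, and x*(3 - ...).
        x = 0.5 * x * (3 - x * x * m)
    return math.ldexp(x, -(two_k // 2))

def sqrt_via_rsqrt(r: float) -> float:
    """sqrt(r) = r * (1/sqrt(r)): one extra multiply instead of a divide."""
    return r * rsqrt(r)

print(sqrt_via_rsqrt(2.0))  # close to 1.4142135623730951
```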
Newton-Raphson iteration converges quadratically, so each iteration approximately doubles the number of correct digits. A table lookup is again used to start the procedure, avoiding the first few iterations and improving performance. Generating four correct bits via table lookup requires an 8-entry table, and each additional bit required doubles the number of table entries.
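The doubling of correct bits can be observed exactly by running equ. 2 in rational arithmetic with Python's `fractions.Fraction`. The seed table is a hypothetical midpoint table over [1, 2) (the same assumption as in the division sketch); with 8 entries its worst-case relative error is 1/17, reached at m = 1, i.e. a little over 4 correct bits.

```python
import math
from fractions import Fraction

def table_seed(m: Fraction, entries: int) -> Fraction:
    """Hypothetical table lookup: 1 / midpoint of the interval of [1, 2) containing m."""
    i = math.floor((m - 1) * entries)
    return 1 / (1 + (i + Fraction(1, 2)) / entries)

def correct_bits_per_step(m: Fraction, entries: int, steps: int) -> list[float]:
    """Correct bits (-log2 of the relative error) of the seed and of each equ. 2 iterate."""
    x, bits = table_seed(m, entries), []
    for _ in range(steps + 1):
        err = abs(x * m - 1)  # relative error of x as an estimate of 1/m
        bits.append(math.inf if err == 0 else -math.log2(float(err)))
        x = x * (2 - x * m)   # equ. 2 in exact rational arithmetic
    return bits

print([round(b, 2) for b in correct_bits_per_step(Fraction(1), 8, 3)])
```

The printed list should grow by a factor of two per step, roughly 4, 8, 16, 32 bits, because the error is squared exactly on each iteration.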
Newton-Raphson iterative algorithms are used in many cases since they are often faster than other algorithms. Nevertheless, iterative algorithms may present other problems. For example, rounding errors can be introduced, and for division operations the iteration provides no remainder. Furthermore, iterative algorithms still have performance limitations. Depending upon the precision required, the complete iterative process can take a considerable number of processor cycles. The corresponding delay may affect some procedures, particularly those involving multimedia presentations. Some multimedia extensions to processor architectures also specify reciprocal and reciprocal square root instructions that require increased (12-bit) precision. Generating 12 correct bits using conventional table lookup techniques requires a table of 2048 entries, which adds hardware cost. One alternative is to use a 64-entry table and one Newton-Raphson iteration, which requires at least two multiply operations, as indicated by equation 2 (for the reciprocal), or at least three multiply operations, as indicated by equation 4 (for the reciprocal square root).
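The table-size trade-off for the reciprocal can be checked numerically. The sketch below assumes the same hypothetical midpoint table as above and measures the worst-case correct bits over a uniform grid of mantissas in [1, 2), comparing a table used alone with a smaller table followed by one application of equ. 2.

```python
import math

def make_table(entries: int) -> list[float]:
    """Hypothetical reciprocal seed table over [1, 2): 1 / midpoint of each interval."""
    return [1 / (1 + (i + 0.5) / entries) for i in range(entries)]

def worst_case_bits(entries: int, newton_steps: int, samples: int = 1 << 16) -> float:
    """Worst-case correct bits of the estimate of 1/m over a uniform grid of m in [1, 2)."""
    table = make_table(entries)
    worst = 0.0
    for k in range(samples):
        m = 1 + k / samples
        x = table[int((m - 1) * entries)]
        for _ in range(newton_steps):
            x = x * (2 - x * m)          # equ. 2: two multiplies per step
        worst = max(worst, abs(x * m - 1))  # relative error of x versus 1/m
    return -math.log2(worst)

print(round(worst_case_bits(2048, 0), 2))  # table alone
print(round(worst_case_bits(64, 1), 2))    # 64 entries plus one iteration
```

Under these assumptions the 2048-entry table alone just reaches 12 bits, while 64 entries give about 7 bits, which one iteration of equ. 2 raises to about 14 bits at the cost of two multiplies.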
Another alternative involves the use of a linear approximation. See, e.g., U.S. Pat. No. 5,563,818, and "Efficient Initial Approximation and Fast Converging Methods for Division and Square Root," Proceedings 12th Symposium on Computer Arithmetic, pp. 2-9 (1995). However, these implementations require larger lookup tables and larger multipliers than the implementation that will be described. A system that performs a quadratic approximation is shown in U.S. Pat. No. 5,245,564. This technique is an extension of linear approximation, providing three times the precision of a table lookup for the same number of words at the cost of an additional multiply and add, but accordingly requiring a longer execution time.
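To illustrate the general idea behind the linear and quadratic approaches, the sketch below approximates 1/m on each of 8 segments of [1, 2) by a truncated Taylor series about the segment midpoint. This is a generic textbook construction, not the coefficient scheme of the cited patents or paper; the segment count and helper names are assumptions.

```python
import math

SEGMENTS = 8  # hypothetical number of segments (table words) over [1, 2)

def approx_reciprocal(m: float, degree: int) -> float:
    """Piecewise polynomial estimate of 1/m on [1, 2), expanded about each segment midpoint."""
    i = int((m - 1) * SEGMENTS)
    c = 1 + (i + 0.5) / SEGMENTS  # segment midpoint; hardware would store coefficients
    d = m - c
    terms = [1 / c, -d / c**2, d * d / c**3]  # Taylor series of 1/m about c
    return sum(terms[: degree + 1])           # degree 0, 1, or 2

def worst_case_bits(degree: int, samples: int = 1 << 14) -> float:
    """Worst-case correct bits over a uniform grid of m in [1, 2)."""
    worst = max(abs(approx_reciprocal(1 + k / samples, degree) * (1 + k / samples) - 1)
                for k in range(samples))
    return -math.log2(worst)

for degree, name in enumerate(["table only", "linear", "quadratic"]):
    print(name, round(worst_case_bits(degree), 1))
```

With the same 8 segments, each added term costs another multiply and add, and the worst-case precision in this construction grows from about 4 to about 8 and then about 12 bits.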
In light of the foregoing, it would be desirable to devise an improved method of estimating square roots, reciprocal square roots, and reciprocals with high precision to reduce the number of iterations. It would further be advantageous if the method could reduce both the size and the delay of the multiplier to achieve higher precision without increasing the time required to compute the reciprocal or reciprocal square root.