1. Field of the Invention
The present invention generally relates to floating point processors in high speed digital computers and, more particularly, to an improved method to compute an approximation of the reciprocal of the square root of a floating point number in the IEEE (Institute of Electrical and Electronics Engineers) format.
2. Description of the Prior Art
Floating point arithmetic units are commonly used in central processing units (CPUs) of digital computers, especially high performance superscalar processors such as reduced instruction set computers (RISC). An example of a RISC processor is the IBM PowerPC® used in the IBM RISC System/6000 computer. Such processors typically include an input/output (I/O) unit interfacing with an instruction/data cache and a branch processor, one or more fixed point processors and one or more floating point processors.
The semantics of floating point instructions have not been as clear cut as the semantics of the rest of the instruction set. To address this problem, the computer industry has standardized on the floating point format defined by IEEE standard 754-1985. The IEEE standard defines 32-bit and 64-bit floating point formats. Each consists of (from left to right) a sign bit, an e-bit exponent and an f-bit fraction. The exponent is assumed to be biased with bias 2^(e−1)−1 and it is assumed that an implicit 1 is to be prepended to the front (left) of the fraction. The IEEE 32-bit format has e=8 and f=23. The IEEE 64-bit format has e=11 and f=52. The present invention will work for these and similar formats (i.e. with different values for e and f). The IEEE standard also defines the result of arithmetic operations on floating point numbers in these formats.
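By way of illustration only, the field layout described above can be sketched in Python as follows. The function names decode_ieee and value_of_normal are hypothetical and do not appear in the IEEE standard. The sketch handles only the two formats named above and only normalized numbers (zeros, denormalized numbers, infinities and NaNs are not treated).

```python
import struct

def decode_ieee(x, e=11, f=52):
    """Split x into (sign, biased exponent, fraction) bit fields.

    Only the two formats named in the text are supported:
    e=8, f=23 (32-bit) and e=11, f=52 (64-bit).
    """
    if (e, f) == (8, 23):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
    elif (e, f) == (11, 52):
        bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    else:
        raise ValueError("unsupported format in this sketch")
    sign = bits >> (e + f)                    # leftmost bit
    exponent = (bits >> f) & ((1 << e) - 1)   # next e bits, still biased
    fraction = bits & ((1 << f) - 1)          # rightmost f bits
    return sign, exponent, fraction

def value_of_normal(sign, exponent, fraction, e=11, f=52):
    """Rebuild the value of a normalized number from its fields."""
    bias = 2 ** (e - 1) - 1                   # 127 for e=8, 1023 for e=11
    significand = 1 + fraction / 2 ** f       # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)
```

For example, decode_ieee(1.0) returns a sign of 0, a biased exponent equal to the bias (1023) and a fraction of 0, since 1.0 is 1.0 times 2 to the power 0.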
Of the four basic arithmetic operations of addition, subtraction, multiplication, and division, division is the most difficult to implement efficiently. Further, division is substantially complicated when the divisor is irrational, as is generally the case when the divisor is a root such as a square root, cube root, fourth root, etc., which cannot be expressed exactly in a number system having a given radix (e.g. decimal, binary, etc.). So-called numerical methods have thus been developed to make certain calculations such that the accuracy of an approximate result after a given amount of calculation can be evaluated. On the other hand, reductions in the cost of memory structures for computers have allowed tables of numbers to be stored and approximations to be made directly therefrom. Of course, the accuracy of any approximation made in such a way varies with the number of table entries. Even when a table entry precisely corresponds to a number for which an approximation is desired, the corresponding table entry will often be inexact (e.g. due to being an irrational number or a number which cannot be exactly expressed in a given radix). Between table entries the approximation will be less accurate, with the error increasing with the difference between a number and the nearest number for which there is a table entry. Therefore, there is a practical and unavoidable trade-off between the accuracy of an estimation taken directly from a table, or of an approximation derived from a table entry, and the number of table entries provided to cover a given range of numerical values.
Square roots and reciprocal square roots are among the types of operations for which approximations are often used. Other roots and reciprocal roots are also often derived by approximation but are encountered less frequently. Of square roots and reciprocal square roots, the latter are generally more useful. A reciprocal square root approximation can be used to generate more accurate approximations (to both the square root and the reciprocal square root) using only floating point multiply and add/subtract operations, whereas generating more accurate approximations from a square root approximation requires floating point divides, which are much slower on current hardware. Also, a square root approximation can be derived from a reciprocal square root simply by multiplying the reciprocal square root approximation by the input. For this reason the RS/6000 architecture alluded to above defines a reciprocal square root approximation instruction and not a square root approximation instruction.
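The contrast drawn above may be illustrated with the textbook Newton-Raphson iterations, sketched here in Python. This is a generic illustration under stated assumptions and is not asserted to be the refinement used by any particular processor or by the present invention. The function names are hypothetical.

```python
def refine_rsqrt(x, y, steps=1):
    """Improve y ~ 1/sqrt(x) using only multiplies and a subtract."""
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def refine_sqrt(x, s, steps=1):
    """Improve s ~ sqrt(x); note the divide x / s in every step."""
    for _ in range(steps):
        s = 0.5 * (s + x / s)
    return s

def sqrt_from_rsqrt(x, y):
    """A square root approximation is x times the reciprocal square root."""
    return x * y
```

Starting from the rough estimate 0.7 for the reciprocal square root of 2, a few calls to refine_rsqrt followed by sqrt_from_rsqrt yield a square root approximation without any divide being executed, whereas refine_sqrt executes one divide per step.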
The accuracy of an approximation of the reciprocal of the square root of a number can be improved by computing and adding a correction term to an initial approximation. This means that high accuracy square roots and reciprocal square roots can be (and often are) computed by starting with an initial lower accuracy approximation and refining it. The IBM PowerPC® architecture defines an optional reciprocal square root estimate instruction to assist in the implementation of this approach for computing square roots and reciprocal square roots. The overall efficiency of this approach depends on how rapidly and accurately the initial approximation can be made. Accordingly, the problem of how to compute fast and accurate reciprocal approximations is well-recognized and a variety of hardware and software solutions have been proposed including those disclosed in U.S. Pat. Nos. 5,563,818; 6,163,791 and 6,240,433, as well as the U.S. Patent Application incorporated by reference above.
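The dependence of overall efficiency on the quality of the initial approximation can be made concrete with a small experiment, sketched here in Python. The correction term used is the standard Newton-Raphson one, chosen only for illustration. The function name steps_to_converge, the tolerance of 1e-12 and the sample starting errors are assumptions of this sketch.

```python
def steps_to_converge(x, y0, tol=1e-12, max_steps=20):
    """Count correction steps until the relative error of y is within tol."""
    exact = x ** -0.5
    y = y0
    for n in range(max_steps + 1):
        if abs(y - exact) / exact <= tol:
            return n
        residual = 1.0 - x * y * y      # zero when y is exactly 1/sqrt(x)
        y = y + 0.5 * y * residual      # add the correction term to y
    raise RuntimeError("did not converge within max_steps")

exact = 2.0 ** -0.5
coarse = steps_to_converge(2.0, exact * (1 + 2 ** -4))   # about 4 good bits
fine = steps_to_converge(2.0, exact * (1 + 2 ** -12))    # about 12 good bits
```

Because each correction step roughly squares the relative error, the more accurate starting value needs fewer steps (here, fine is smaller than coarse), which is why a better initial approximation shortens the whole computation.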
Since the result of such a calculation is an approximation, all solutions necessarily involve a trade-off among simplicity, speed of execution and accuracy. However, no previously known solution has provided a particularly good balance among these three performance factors. For example, a technique is known for computing an approximation of a reciprocal of a square root of a number beginning with an approximation from a table, where the computation will yield a relative accuracy of about 1/2^(k+2), where 2^k is the number of table entries. However, while the error of such a calculation can, in theory, be made arbitrarily small by increasing k, for modest and practical table sizes, the calculated approximation may not be sufficiently accurate.
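The relationship between table size and accuracy can be explored with the following Python sketch. It is a hypothetical table, not the table of the known technique referred to above: it assumes inputs in the range from 1 (inclusive) to 2 (exclusive), uses the leading k fraction bits as the index, stores the reciprocal square root of each interval midpoint, and omits exponent handling entirely. All function names are illustrative.

```python
def build_table(k):
    """2^k entries, each the reciprocal square root of an interval midpoint."""
    n = 1 << k
    return [(1.0 + (i + 0.5) / n) ** -0.5 for i in range(n)]

def lookup(table, x):
    """Estimate 1/sqrt(x) for 1 <= x < 2 directly from the table."""
    n = len(table)
    index = int((x - 1.0) * n)      # leading k bits of the fraction
    return table[index]

def max_relative_error(k, samples=4096):
    """Largest relative error of the table estimate over sampled inputs."""
    table = build_table(k)
    worst = 0.0
    for j in range(samples):
        x = 1.0 + j / samples
        exact = x ** -0.5
        worst = max(worst, abs(lookup(table, x) - exact) / exact)
    return worst
```

For this particular midpoint table the measured worst-case relative error stays just below 1/2^(k+2) and roughly halves each time k is increased by one, so each additional bit of accuracy costs a doubling of the table.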