1. Field of the Invention
The present invention generally relates to digital computer calculation of reciprocals and square roots and, more particularly, to a method and apparatus that compute reciprocals and square roots by Chebyshev polynomial approximation. The mantissas of IEEE floating point numbers are scaled and split into 2^n intervals, so that a Chebyshev polynomial of only a few terms can approximate each interval, which results in high performance. In addition, the invention is directed to a method and apparatus that distinguish approximations that may require a one-bit error correction from those known to be correct.
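The interval-splitting idea can be illustrated with a short Python sketch. It is not the patented implementation. It assumes that the mantissa is scaled into [1, 2), that the leading n fraction bits select one of 2^n equal-width intervals, and that each interval has its own k-term Chebyshev interpolant evaluated with the Clenshaw recurrence. The function names and the choices n = 8 and k = 4 are hypothetical, and zero, infinity and NaN inputs are not handled.

```python
import math

def chebyshev_coeffs(f, a, b, k):
    """k-term Chebyshev interpolant of f on [a, b]: f(x) ~ sum(c[i] * T_i(t)), t in [-1, 1]."""
    nodes = [math.cos(math.pi * (j + 0.5) / k) for j in range(k)]
    fx = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    coeffs = []
    for i in range(k):
        s = sum(fx[j] * math.cos(math.pi * i * (j + 0.5) / k) for j in range(k))
        coeffs.append((2.0 if i else 1.0) * s / k)
    return coeffs

def chebyshev_eval(coeffs, t):
    """Clenshaw recurrence for sum(c[i] * T_i(t))."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return coeffs[0] + t * b1 - b2

def build_table(f, n, k):
    """Split [1, 2) into 2**n equal intervals and store one k-term fit per interval."""
    width = 1.0 / (1 << n)
    return [chebyshev_coeffs(f, 1.0 + i * width, 1.0 + (i + 1) * width, k)
            for i in range(1 << n)]

def approx(table, n, m):
    """Evaluate the table entry for a mantissa m in [1, 2)."""
    i = int((m - 1.0) * (1 << n))      # leading n fraction bits pick the interval
    a = 1.0 + i / (1 << n)
    b = a + 1.0 / (1 << n)
    t = (2.0 * m - a - b) / (b - a)    # map [a, b] onto [-1, 1]
    return chebyshev_eval(table[i], t)

def reciprocal(x, table, n):
    """Approximate 1/x for finite nonzero x by scaling its mantissa into [1, 2)."""
    mant, exp = math.frexp(x)          # x = mant * 2**exp with 0.5 <= |mant| < 1
    m = 2.0 * abs(mant)                # so |x| = m * 2**(exp - 1) with 1 <= m < 2
    r = approx(table, n, m)            # r ~ 1/m
    return math.copysign(math.ldexp(r, 1 - exp), x)

# Hypothetical parameters: 2**8 intervals, 4 Chebyshev terms per interval.
N, K = 8, 4
table = build_table(lambda m: 1.0 / m, N, K)
samples = [1.0 + j / 4096 for j in range(4096)]
worst = max(abs(reciprocal(x, table, N) * x - 1.0) for x in samples)
print(f"{len(table)} intervals x {K} terms, worst relative error {worst:.2e}")
```

In this sketch, the worst-case relative error stays below about 10^-11. That is well short of the 53 bits an IEEE double carries, so the parameters are only illustrative. For a fixed k, the error of each fit shrinks roughly as (interval width)^k. Each extra bit of n therefore buys accuracy but doubles the 2^n × k coefficient table, which is the storage trade-off discussed in the prior-art section.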
2. Description of the Prior Art
Workstations are now widely used for solving complex engineering problems. These machines are designed to comply with ANSI/IEEE (American National Standards Institute/Institute of Electrical and Electronics Engineers) Standard 754-1985, "IEEE Standard for Binary Floating-Point Arithmetic", published by IEEE, Inc., New York, August 1985. These machines typically use RISC (Reduced Instruction Set Computer) technology for increased computational speed. An example of one such workstation is described in IBM RISC System/6000 Technology, IBM product number SA23-2619 (1990).
One of the most common floating-point arithmetic constructs is the dot product. At the heart of the dot product are the multiply and accumulate functions, i.e., A×C+B. Many algorithms implemented on workstations also use divides and square roots. On a typical workstation, however, the performance of a double precision multiply exceeds that of a double precision divide or square root by an order of magnitude or more. For example, on the IBM RISC System/6000 workstation, a double precision A×C+B operation is completed in two cycles, and since this operation can be pipelined, it may be effectively performed in one cycle. In contrast, a divide takes nineteen cycles and a square root takes fifty-five cycles. Moreover, divides and square roots cannot be pipelined.
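Both the dot product and the evaluation of a polynomial reduce to chains of the A×C+B operation. This is why a polynomial approximation can run at multiply speed rather than divide speed. The Python sketch below writes each of them as repeated `acc = a * c + b` steps. The helper names are hypothetical. Python rounds the multiply and the add separately, whereas a fused multiply-add unit rounds once, so the sketch models only the operation count and not the hardware rounding.

```python
def dot(a, b):
    """Dot product as a chain of multiply-accumulate (A*C + B) steps."""
    acc = 0.0
    for ai, bi in zip(a, b):
        acc = ai * bi + acc          # one A*C+B per element pair
    return acc

def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[k]*x**k in Horner form.

    A degree-k polynomial costs k multiply-adds, each depending on the previous one.
    """
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c            # one A*C+B per remaining coefficient
    return acc
```

The multiply-adds in a single Horner evaluation are dependent. With the two-cycle A×C+B latency quoted above, one evaluation of a degree-k polynomial therefore needs on the order of 2k cycles. Independent evaluations, however, can be interleaved to keep the pipeline full, which is one reason polynomials with few terms are attractive.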
While polynomial approximations can boost the performance of square roots and divides, they suffer from two drawbacks that make them difficult to implement:
Intractable Precision Problems--Polynomial approximations, no matter how good, inevitably produce some results with a one-bit error in the last place. The IEEE 754 floating point standard requires square root and divide results to be correctly rounded, that is, accurate to the least significant bit of the floating point mantissa. None of these errors is therefore acceptable in an implementation that must meet the 754 standard. Error detection and correction steps are possible, but they usually erode the very performance gain that motivated the choice of a polynomial approximation in the first place.
Excessive Storage Requirements--To execute quickly, a polynomial must have few terms. With only a few terms, a polynomial can accurately approximate only a narrow interval. The input argument must therefore be split into many intervals, each approximated by a different polynomial. The coefficient storage required for all these polynomials quickly adds up.
Reciprocal
Square Root
1/Square Root
Non-pipelined speeds of about 9 cycles
Pipelined speeds of about 4 cycles
ROM sizes of about 225 Kbits per function
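The precision drawback can be made concrete with an exact, deliberately slow reference check in Python 3.9 or later (for `math.nextafter`). The check assumes that an approximation y is either the correctly rounded value of 1/x or one of its immediate neighbors. In that case, the correct result must be y or one of the two adjacent doubles, and exact rational arithmetic with `fractions.Fraction` can select the nearest one. This is not the detection scheme of the invention, and the helper names are hypothetical. Running such a check on every result is the kind of overhead that erodes the speed advantage of a polynomial method.

```python
import math
from fractions import Fraction

def correctly_rounded_reciprocal(x, y):
    """Return the double nearest to 1/x.

    Assumed preconditions (hypothetical, for this sketch): x is finite and nonzero,
    1/x lies in the normal double range, and y is either the correctly rounded
    value of 1/x or one of its two neighboring doubles (a one-bit error).
    """
    exact = Fraction(1) / Fraction(x)          # exact rational value of 1/x
    neighbors = (math.nextafter(y, -math.inf), y, math.nextafter(y, math.inf))
    # The reciprocal of a double is never exactly halfway between two doubles,
    # so the nearest candidate is unique and equals the round-to-nearest result.
    return min(neighbors, key=lambda c: abs(Fraction(c) - exact))

def is_correctly_rounded(x, y):
    """Detection step: True when y needs no one-bit correction."""
    return correctly_rounded_reciprocal(x, y) == y

y = math.nextafter(1.0 / 3.0, 1.0)             # deliberately one ulp too high
fixed = correctly_rounded_reciprocal(3.0, y)    # fixed == 1.0 / 3.0
```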