The present invention pertains to a method and apparatus for implementing arithmetic functions, and more particularly, to a method and apparatus for implementing floating point functions using memory, such as Read Only Memory (ROM), to assist in mathematical computations.
Hardware devices, such as microprocessors, typically include dedicated circuitry for performing mathematical computations on operands. For example, circuitry may be dedicated to calculating the square root of a single operand. In environments such as Reduced Instruction Set Computer (RISC) designs, space may not be available for circuitry dedicated to certain mathematical computations. One method for reducing the area occupied by circuitry for computing specific functions is to store pre-calculated approximations of the functional results, over a particular range of operand values, in a memory such as a ROM on the microprocessor chip. These approximations are then made more precise by a highly optimized software routine such as the one known in the art as the Newton-Raphson algorithm. The Newton-Raphson algorithm is a successive approximation scheme in which each iteration roughly doubles the number of bits of precision of the previous approximation. The more accurate the first approximation, the fewer iterations are required to achieve the end result to a desired precision. An example of such a method is described in chapter 8 of Computer Arithmetic Algorithms by Israel Koren (Prentice-Hall, Inc., 1993) and is also shown in FIG. 1.
The IEEE standard for the representation of normalized floating point numbers includes a mantissa of $m+1$ bits and an exponent $e$, such that the normalized, floating point representation of the value $x$ is $x = 1.b_0 b_1 b_2 b_3 \ldots b_m \times 2^e$, where $b_0$ represents $2^{-1}$ or 0.5, $b_1$ represents $2^{-2}$ or 0.25, etc. Thus, $x$ can represent a number greater than or equal to 1 and less than 2 when the exponent $e$ is ignored (or set to 0). As an example, an interval $[1, 2)$ for $x$, where the exponential portion of $x$ is ignored, can be subdivided into $2^q$ intervals, namely $[x_i, x_{i+1})$, for $i = 0, 1, \ldots, 2^q - 1$ (as used throughout, "[" indicates an inclusive boundary and ")" indicates an exclusive boundary). Each subinterval $[x_i, x_{i+1})$ has a length of $2^{-q}$.
Referring to FIG. 1, a single ROM 1 is shown having $q$ address lines (reference number 3) and $p$ data lines (reference number 5). The first $q$ bits of the mantissa of $x$ (i.e., the most significant $q$ bits of the $n$-bit mantissa, where $n = m+1$) are supplied to the address lines of the ROM, causing the ROM to output the value $f_i$ as a $p$-bit value. These $q$ bits represent the index $i$ of the interval $[x_i, x_{i+1})$. The $p$ bits output by the ROM are typically the mantissa of the value $f_i$, which can serve as a first approximation for the value $f(x)$, ignoring the exponent. As an example, for the function $f(x) = 1/x$, the exponent value (which can be stored in a register 8) is easily computed using external logic 6 rather than being stored in memory. For instance, the exponent for $f(x) = 1/x$ is easily computed as the negative of the exponent for $x$, minus 1 (i.e., $-e - 1$), when $x$ and $f(x)$ are represented in normalized, floating point formats. The exponent for the operand $x$ can be stored in a register 4.
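The addressing step described above can be illustrated with a minimal Python sketch. The function names (`split_normalized`, `rom_index`, `reciprocal_exponent`) are hypothetical and do not appear in the source; the sketch assumes positive, finite double-precision inputs and uses floating point arithmetic in place of the bit-level wiring of FIG. 1.

```python
import math
from typing import Tuple


def split_normalized(x: float) -> Tuple[float, int]:
    """Return (mantissa, exponent) with x = mantissa * 2**exponent and 1 <= mantissa < 2."""
    if not (x > 0.0) or math.isinf(x):
        raise ValueError("this sketch handles positive finite values only")
    frac, exp = math.frexp(x)  # x = frac * 2**exp with 0.5 <= frac < 1
    return frac * 2.0, exp - 1


def rom_index(mantissa: float, q: int) -> int:
    """Index i of the subinterval [x_i, x_{i+1}) holding the mantissa (its q leading fraction bits)."""
    return int((mantissa - 1.0) * (1 << q))


def reciprocal_exponent(e: int) -> int:
    """Exponent of 1/x in normalized form, assuming the mantissa of x is strictly greater than 1."""
    return -e - 1
```

For example, `split_normalized(6.0)` gives a mantissa of 1.5 and an exponent of 2, and `rom_index(1.5, 4)` selects entry 8 of a 16-entry table. When the mantissa is exactly 1, the reciprocal is already normalized and its exponent is simply $-e$, a case the sketch leaves to the caller.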
In the above example, the value $f_i$ is an approximation for the function $f(x) = 1/x$ over the entire range $[x_i, x_{i+1})$. After the first approximation is obtained from ROM 1, each iteration of the Newton-Raphson scheme roughly doubles the number of bits of precision as compared to the previous approximation. In FIG. 1, the Newton-Raphson scheme is implemented in optimizing element 7, which can include a processing unit executing software instructions stored in a memory to produce a Mantissa result 9. Reducing the number of iterations performed by the Newton-Raphson scheme requires an increase in the precision of the approximation stored in the ROM 1. In implementing such a system, the entire $x$ domain (or the desired portion thereof) for the function $f(x)$ can be partitioned into $n$ equally sized adjacent partitions $[x_0, x_1), [x_1, x_2), \ldots, [x_{n-1}, x_n)$, where $n = 2^q$ here denotes the number of partitions. For each interval $i$, the value $f_i$, which is an approximation for the function $f(x)$ over the entire interval $[x_i, x_{i+1})$, is determined. The length of $f_i$, represented as a number of bits $p$, is predetermined based on a desired precision for that value. Each value $f_i$ is stored permanently in the $i$-th location of the ROM 1, which has a width of $p$ bits. In calculating a value for $f(x)$, the partition $[x_i, x_{i+1})$ in which $x$ lies is determined. Then, the value stored in the ROM 1 for the selected partition is retrieved as a first approximation to the value $f(x)$.
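The lookup-then-refine flow can be sketched in Python as follows. This is an illustration only, not the circuit of FIG. 1: the names `build_reciprocal_rom` and `reciprocal` are hypothetical, the table entries are rounded to $p$ fraction bits (an assumption about how the $p$-bit word is formed), and the refinement uses the standard Newton-Raphson step for the reciprocal, $y \leftarrow y(2 - xy)$.

```python
from typing import List


def build_reciprocal_rom(q: int, p: int) -> List[float]:
    """Table of 2**q entries; entry i approximates 1/x on [x_i, x_{i+1}) within [1, 2)."""
    n = 1 << q
    scale = 1 << p
    rom = []
    for i in range(n):
        lo = 1.0 + i / n
        hi = 1.0 + (i + 1) / n
        mid = (1.0 / lo + 1.0 / hi) / 2.0        # average of max and min of 1/x on the interval
        rom.append(round(mid * scale) / scale)   # assumption: keep p bits after the binary point
    return rom


def reciprocal(mantissa: float, rom: List[float], q: int, iterations: int) -> float:
    """First approximation from the table, refined by Newton-Raphson iterations."""
    i = int((mantissa - 1.0) * (1 << q))  # q leading fraction bits select the entry
    y = rom[i]
    for _ in range(iterations):
        y = y * (2.0 - mantissa * y)      # each step roughly squares the relative error
    return y
```

With `q = 4` and `p = 8`, the table value for a mantissa of 1.3 is 0.78125, and two iterations bring the result to within about 5e-8 of 1/1.3. A more precise table entry would reach the same accuracy in fewer iterations.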
Several options are available as to what value $f_i$ should be selected. An optimum choice would be the average of the maximum value for $f(x)$ and the minimum value for $f(x)$ over a given interval $[x_i, x_{i+1})$. By choosing such a value for $f_i$, no matter where $x$ falls in the interval, the maximum error $d_i$ for the interval is less than or equal to the difference between the maximum of $f(x)$ and the minimum of $f(x)$ divided by 2. No other choice for $f_i$ will yield a lower value for $d_i$.
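The effect of this choice can be checked numerically with a short Python sketch. The helpers `max_error` and `midrange` are hypothetical names; the sketch samples the closed interval, so it estimates the supremum of the error over the half-open interval, and `midrange` equals the average of the maximum and minimum only when the function is monotonic on the interval.

```python
from typing import Callable


def max_error(f: Callable[[float], float], lo: float, hi: float,
              approx: float, samples: int = 1000) -> float:
    """Largest |f(x) - approx| over evenly spaced sample points in [lo, hi]."""
    return max(abs(f(lo + (hi - lo) * k / samples) - approx)
               for k in range(samples + 1))


def midrange(f: Callable[[float], float], lo: float, hi: float) -> float:
    """Average of f at the endpoints; equals (max + min) / 2 for a monotonic f."""
    return (f(lo) + f(hi)) / 2.0


def inv(x: float) -> float:
    return 1.0 / x


lo, hi = 1.0, 1.125
err_mid = max_error(inv, lo, hi, midrange(inv, lo, hi))  # half the spread of 1/x
err_left = max_error(inv, lo, hi, inv(lo))               # full spread of 1/x
```

On this interval the midrange choice gives a maximum error of about 0.0556, while using the left endpoint value gives about 0.1111, twice as large.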
Using a single ROM to calculate an approximation for a function such as $f(x) = 1/x$ unavoidably introduces an error in the final result. Specifically, such an error has two components: 1) an approximation error representing the error between the actual value for $f(x)$ and the approximation $f_i$ for a given interval; and 2) a truncation error that results from the ROM having a finite width of $p$ bits. In the worst case, the total error is the sum of the approximation error and the truncation error, but it is typically a lesser value. The approximation error is controlled by the parameter $q$, which has an exponential effect on the size of the ROM (i.e., a ROM having $q$ address lines can have as many as $2^q$ addressable locations). The truncation error is controlled by the value $p$, which has a linear effect on the size of the ROM (i.e., each addressable location of the ROM must have $p$ bits).
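The two error components can be separated in a small Python sketch for $f(x) = 1/x$. The function name `error_components` is hypothetical, and the stored word is formed by rounding to $p$ fraction bits, which is an assumption about how the finite ROM width is applied.

```python
from typing import Tuple


def error_components(q: int, p: int, i: int, x: float) -> Tuple[float, float, float]:
    """Return (approximation_error, truncation_error, total_error) for 1/x, x in subinterval i."""
    n = 1 << q
    lo, hi = 1.0 + i / n, 1.0 + (i + 1) / n
    ideal = (1.0 / lo + 1.0 / hi) / 2.0           # f_i before it is fitted to p bits
    stored = round(ideal * (1 << p)) / (1 << p)   # assumption: f_i as held in a p-bit ROM word
    exact = 1.0 / x
    return abs(exact - ideal), abs(ideal - stored), abs(exact - stored)
```

For `q = 4`, `p = 8` and subinterval 0, the components add up at `x = 1.01` (both errors have the same sign), whereas at `x = 1.06` they partly cancel and the total is smaller than the approximation error alone.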
The function $f(x) = 1/x$ is a monotonically decreasing function of $x$, so the maximum error is $d_i = (f(x_i) - f(x_{i+1}))/2$ for all $i = 0, 1, 2, \ldots, 2^q - 1$. Furthermore, the second derivative of $f(x)$ (i.e., $f''(x)$) is greater than 0, thus $d_i$ decreases as $x$ increases. The maximum value for $d_i$ is then $d_0$. By replacing the value $1/x$ with $f_i = (f(x_i) + f(x_{i+1}))/2$, the value $d_0$ is equal to $(f(1) - f(1 + 2^{-q}))/2 < 2^{-(q+1)}$, which is the bound on the approximation component of the error. As to the truncation error, carrying $p$ bits of accuracy ensures an (absolute) precision of $2^{-p} \times 2^{e}$. Because $x$ is between 1 and 2 (ignoring the exponent), the value of $1/x$ falls between 0.5 and 1; hence the exponent $e$ above equals $-1$, the bound on the truncation error is $2^{-(p+1)}$, and the total error (which is at most the approximation error added to the truncation error) is at most $2^{-(q+1)} + 2^{-(p+1)}$.
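The bound on the approximation error can be worked out step by step, taking the first subinterval to run from $x_0 = 1$ to $x_1 = 1 + 2^{-q}$ (the subinterval length stated earlier). The following is a sketch of that arithmetic rather than a derivation quoted from the source.

```latex
\begin{align*}
d_0 &= \frac{f(x_0) - f(x_1)}{2}
     = \frac{1}{2}\left(1 - \frac{1}{1 + 2^{-q}}\right)
     = \frac{1}{2}\cdot\frac{2^{-q}}{1 + 2^{-q}}
     < \frac{2^{-q}}{2} = 2^{-(q+1)}, \\
\text{truncation bound} &= 2^{-p} \times 2^{-1} = 2^{-(p+1)}, \\
\text{total error} &\le 2^{-(q+1)} + 2^{-(p+1)}.
\end{align*}
```

The strict inequality in the first line holds because $1 + 2^{-q} > 1$ for every finite $q$.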
If the total error is required to be less than a predetermined number $E$, then the relationship between parameters $p$ and $q$ can be determined. For the value $q$, the maximum approximation error $2^{-(q+1)}$ must be less than $E$. Thus $q$ must be an integer greater than $\log_2(1/E) - 1$. Once the value for $q$ has been determined, the value for $p$ can be calculated accordingly. Given the values for $q$ and $p$, the size of the ROM is determined. The lower the total error that is allowed for a given function, the higher the values for $p$ and $q$, which leads to a larger ROM. As stated above, space in certain environments such as RISC architectures may be either too costly or not available; thus it may become necessary to decrease the size of the ROM, which results in a sacrifice of accuracy. Accordingly, there is a need for a method and apparatus for performing these types of computations that decreases the size of memory needed without sacrificing accuracy.
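One way to turn an error budget into ROM dimensions is sketched below in Python. The function `choose_rom_parameters` is a hypothetical helper, and its policy (take the smallest admissible $q$, then the smallest $p$ that keeps the combined bound $2^{-(q+1)} + 2^{-(p+1)}$ under $E$) is only one possible selection rule, not one prescribed by the source.

```python
from typing import Tuple


def choose_rom_parameters(E: float) -> Tuple[int, int, int]:
    """Return (q, p, rom_bits) such that 2**-(q+1) + 2**-(p+1) < E."""
    if not (E > 0.0):
        raise ValueError("the error budget E must be positive")
    q = 0
    while 2.0 ** -(q + 1) >= E:                       # approximation bound must fall below E
        q += 1
    p = 0
    while 2.0 ** -(q + 1) + 2.0 ** -(p + 1) >= E:     # then fit the truncation bound
        p += 1
    return q, p, (1 << q) * p                         # 2**q locations of p bits each
```

For `E = 2**-8` this yields `q = 8`, `p = 9` and a ROM of 2304 bits. Tightening the budget by one bit roughly doubles the number of locations while adding only one bit to each word, which is the exponential versus linear cost noted above.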