1. Field of the Invention
The present invention relates to circuits that perform arithmetic operations. More specifically, the present invention relates to a method and an apparatus that efficiently perform an accuracy-check computation for Newton-Raphson divide and square-root operations.
2. Related Art
Several techniques can be used to perform divide, r/s, and square-root, √s, operations. One popular technique is to use the Newton (sometimes called the Newton-Raphson) method. As typically implemented for the division operation, the Newton-Raphson method first finds an approximation to a zero of the function
$$f(x) = 1 - \frac{1}{sx}$$
for the reciprocal of the denominator, 1/s, in the division operation. Similarly, for the square-root operation, the Newton-Raphson method first finds an approximation to a zero of the function
$$f(x) = 1 - \frac{1}{sx^{2}}$$
for the reciprocal of the square root, 1/√s.
The Newton-Raphson method starts with an initial estimate, a, of the zero to the function. Better estimates are obtained by iterating using the formula
$$a_{\text{next}} = a - \frac{f(a)}{f'(a)}$$
This computation can be accomplished with only the following operations: add, subtract, multiply, and divide-by-two (needed only for square root, and achievable with a shift). To simplify the method, the denominator, s, (for divide) is normalized to be between one half and one, and the value, s, whose square root is sought is normalized to be between one fourth and one. Thus, the zero of the function (for both cases) is between one and two. A property of the Newton-Raphson method that may be maintained when used with these functions is that, independent of the initial estimate a, all succeeding estimates are less than the zero of the function.
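As an illustration only, the iteration can be sketched in exact rational arithmetic. Substituting the two functions into the update formula gives a(2 − s·a) for the reciprocal and a(3 − s·a²)/2 for the reciprocal square root. The function names, the value s = 3/4, and the initial estimate 3/2 are assumptions chosen for the example and do not come from the source.

```python
from fractions import Fraction

def recip_step(a, s):
    # f(x) = 1 - 1/(s*x), f'(x) = 1/(s*x*x)
    # a - f(a)/f'(a) simplifies to a*(2 - s*a)
    return a * (2 - s * a)

def rsqrt_step(a, s):
    # f(x) = 1 - 1/(s*x*x), f'(x) = 2/(s*x*x*x)
    # a - f(a)/f'(a) simplifies to a*(3 - s*a*a)/2 (the /2 is the shift)
    return a * (3 - s * a * a) / 2

def iterate(step, a, s, n):
    """Apply n Newton-Raphson steps and return every estimate produced."""
    estimates = []
    for _ in range(n):
        a = step(a, s)
        estimates.append(a)
    return estimates

s = Fraction(3, 4)                                   # hypothetical normalized operand
recips = iterate(recip_step, Fraction(3, 2), s, 4)   # estimates of 1/s
rsqrts = iterate(rsqrt_step, Fraction(3, 2), s, 4)   # estimates of 1/sqrt(s)
```

In this sketch the initial estimate 3/2 lies above both zeros, yet every estimate after the first step lies below the zero and the sequence then increases toward it, which matches the property stated above.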
Sufficient Newton-Raphson iterations are carried out to obtain the desired internal accuracy. Then a multiplication is performed. For divide, the result of the Newton-Raphson iterations is multiplied by the numerator, r, to obtain an internal result, m. For square root, the result of the Newton-Raphson iterations is multiplied by s (because s*(1/√s)=√s) to obtain an internal result, m. In both cases, m is an approximation to the exact result, e. Also, the internal result, m, needs to have more accuracy than the accuracy of the final result for the method described below. Furthermore, we ensure that m is less than the exact result. The internal result, m, is also called the “result of the Newton-Raphson method”.
The result of the Newton-Raphson method contains only a finite amount of accuracy. Thus, it is not exact. The desired result is the exact result rounded according to one of the following three rounding modes: (1) round towards zero (truncate), (2) round towards infinity (round up), or (3) round to nearest (if exactly half way between representable results, round to “even” to make the least-significant bit (LSB) zero).
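The three rounding modes can be sketched as follows for a positive exact value. This is a minimal sketch under the assumption that values are scaled so that one LSB of the external result equals 1; the function name and the mode strings are hypothetical.

```python
from fractions import Fraction
import math

def round_exact(e, mode):
    """Round a positive exact value e (in LSB units) to an integer."""
    lo = math.floor(e)
    if mode == "down":          # round towards zero (truncate)
        return lo
    if mode == "up":            # round towards infinity
        return math.ceil(e)
    if mode == "nearest":       # ties go to the value with an even LSB
        frac = e - lo
        if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and lo % 2 == 1):
            return lo + 1
        return lo
    raise ValueError(mode)

# The two halfway cases show the round-to-even rule.
ties = [round_exact(Fraction(5, 2), "nearest"),
        round_exact(Fraction(7, 2), "nearest")]
```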
The internal estimate, m, is rounded to produce a rounded result, t, which functions as a proposed answer. However, no matter how much extra accuracy has been achieved, the extra accuracy is finite, so the value of t may not be the same as the rounded exact result. For example, consider FIG. 1, which illustrates a segment of the real number line, with vertical lines representing values that can be represented with the finite external accuracy. Note that in the expression t+1, the +1 represents 1 added to the LSB of t (with external accuracy).
The internal value, m, is the current (best obtained) estimate. The example in FIG. 1 is for rounding towards zero (down), so that m is truncated to t. However, the exact result is the value e. This means that no matter how much accuracy is obtained in computing m, it is always possible that a representable value (in FIG. 1, the value is t+1) lies between m and e. In order to produce the correct result, this situation must be evaluated and t must be replaced with t+1 if appropriate.
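The situation can be illustrated with a small sketch. It assumes values scaled so that one LSB equals 1, and an internal estimate m that falls short of e by exactly 2^-k; the function name and the specific numbers are hypothetical. For every k, an exact result e can be chosen just above a representable value so that truncating m yields t while truncating e yields t+1.

```python
from fractions import Fraction
import math

def truncation_miss(k):
    """Return (t, correct) for an estimate that is 2**-k below e."""
    e = 7 + Fraction(1, 2 ** (k + 1))   # exact result, just above 7
    m = e - Fraction(1, 2 ** k)         # internal estimate, below 7
    t = math.floor(m)                   # proposed answer from truncating m
    correct = math.floor(e)             # truncation of the exact result
    return t, correct

# More internal accuracy (larger k) does not remove the problem.
misses = [truncation_miss(k) for k in (8, 32, 128)]
```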
The system performs an “accuracy-check” computation to determine whether t must be replaced with t+1. Each combination of a rounding mode and an operation requires a different formula to be evaluated to make this determination. (The derivations of these formulae are discussed in following sections of this application.) There are three rounding modes (round down, round to nearest, and round up) and two operations (divide, r/s, and square root, √s). Thus, to make determinations for all combinations of the three possible rounding modes and the two possible operations, the following six formulae need to be computed.
In the following, a pair of equations is given when a modification is needed. The left side shows the computation as perceived without any modification to the Booth encoding. The right side shows the computation as perceived with the modification to the Booth encoding, the modification being to add ½, 1, or 2 to t. (Booth encoding is explained in a later section.)
1. divide round down: z = (t * s) − r + s = ((t + 1) * s) − r
2. divide round nearest: z = (t * s) − r + s/2 = ((t + (1/2)) * s) − r
3. divide round up: z = (t * s) − r
4. square-root round down: z = (t * t) − s + 2t + 1 = ((t + 2) * t) − s + 1
5. square-root round nearest: z = (t * t) − s + t + ¼ = ((t + 1) * t) − s + ¼
6. square-root round up: z = (t * t) − s
In all cases, if z<0, then t+1 is the correct answer. If z>0, then t is the correct answer. The case where z=0 is discussed below.
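The six formulae can be evaluated directly, as in the following sketch. It assumes exact arithmetic and values scaled so that one LSB of t equals 1; the function names, mode strings, and sample operands are hypothetical. The case z=0 is deliberately left undecided here.

```python
from fractions import Fraction

def accuracy_check_z(op, mode, t, s, r=None):
    """Evaluate z for one of the six (operation, rounding mode) pairs."""
    if op == "divide":
        base = t * s - r
        extra = {"down": s, "nearest": Fraction(s) / 2, "up": 0}[mode]
    elif op == "sqrt":
        base = t * t - s
        extra = {"down": 2 * t + 1, "nearest": t + Fraction(1, 4), "up": 0}[mode]
    else:
        raise ValueError(op)
    return base + extra

def corrected(t, z):
    """Apply the sign test; None marks the z == 0 case, not handled here."""
    if z < 0:
        return t + 1
    if z > 0:
        return t
    return None

# Divide 57/8 = 7.125, round down, proposed t = 6 (too small by one LSB).
z_div = accuracy_check_z("divide", "down", 6, 8, 57)
# Square root of 50 (about 7.07), round up, proposed t = 7.
z_sqrt = accuracy_check_z("sqrt", "up", 7, 50)
```

Here both z_div and z_sqrt are negative, so the sign test selects t+1 in each case (7 for the divide and 8 for the square root).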
If a multiply-add pipeline is available, as is often the case, the system can easily compute (t*s)−r or (t*t)−s in one pass through the multiply-add pipeline. Thus, the third and sixth formulae may be computed in one pass through the multiply-add pipeline. For the remaining four cases, after the multiply-add computation, one can make an additional pass through the pipeline to add the additional term. (Note that 2t+1 and t+¼ are considered to be single terms.)
It is also possible to take another approach. Since we are concerned with the sign of z instead of its actual value (except when it is zero), the system can instead compare the result of the multiply-add to the additional term to see if the result is greater-than, equal-to, or less-than the additional term. This approach also involves another pass through the multiply-add pipeline, and is hence similar to doing the addition.
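The two-pass addition and the comparison approach can be contrasted in a short sketch for the divide case. The function multiply_add below is a hypothetical software stand-in for one pass through a multiply-add pipeline, not a model of any particular hardware. The comparison is made against the negated additional term, which is one reading of the text: since z equals the multiply-add result plus the additional term, the sign of z follows from that comparison.

```python
def multiply_add(a, b, c):
    """Stand-in for one pass through a multiply-add pipeline: a*b + c."""
    return a * b + c

def sign(x):
    return (x > 0) - (x < 0)

def z_sign_two_pass(t, s, r, extra):
    p = multiply_add(t, s, -r)        # first pass: (t*s) - r
    z = multiply_add(1, p, extra)     # second pass: add the additional term
    return sign(z)

def z_sign_compare(t, s, r, extra):
    p = multiply_add(t, s, -r)        # first pass: (t*s) - r
    # z = p + extra, so comparing p with -extra gives the sign of z
    return (p > -extra) - (p < -extra)

# Divide round down (additional term s) for 57/8 with t = 6, 7 and 28/4 with t = 6.
cases = [(6, 8, 57, 8), (7, 8, 57, 8), (6, 4, 28, 4)]
two_pass = [z_sign_two_pass(*c) for c in cases]
compared = [z_sign_compare(*c) for c in cases]
```

Both functions return the same sign for every case, which reflects why the comparison approach costs about as much as doing the addition.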
Another approach is to add an additional “partial-product row” to the carry-save adder portion of the multiply-add pipeline to add the additional term to the result. This produces the value of z directly without an additional pass through the multiply-add pipeline. Although this approach is more efficient in time, it requires additional hardware for the extra partial-product row.
Hence, what is needed is a method and an apparatus for performing an accuracy-check computation for Newton-Raphson division and square-root operations without requiring an additional pass through the multiply-add pipeline, and without requiring an additional partial-product row in the carry-save adder portion of the multiply-add pipeline.