This invention relates to an apparatus for performing division operations in computer processors. More particularly, the apparatus relates to a two-stage multiply pipeline apparatus for implementing a divide operation by utilizing a derivative of the Newton-Raphson formula. The apparatus produces a result which has reliable precision beyond the least significant bit of the result term which is developed in each of the iterations associated with the multiply pipeline.
The fastest techniques for performing division in a computer processor involve the use of multiply hardware or multiply-add hardware which is designed to implement a Newton-Raphson formula or algebraic equivalent. The Newton-Raphson formula is defined as: EQU X.sub.i+1 =X.sub.i *(2-X.sub.i *D)
The foregoing formula describes how an approximation for the reciprocal of a divisor can be multiplied by the dividend to produce an approximation for the quotient. The formula is used repeatedly to obtain a desired precision for a reciprocal, and every application of the formula results in a doubling of the precision of the reciprocal. Each iteration involves a multiplication step and a subtraction step which must wait until the previous iteration is finished, followed by another multiplication step which must wait for the result of the previous subtraction. Therefore, implementation of the conventional Newton-Raphson formula is impractical for multiply pipeline operations.
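The serial dependence described above can be seen in a minimal sketch of the Newton-Raphson reciprocal iteration, with floating-point Python standing in for the multiply hardware (the divisor 3.0 and seed value 0.3 are illustrative, not from the patent):

```python
# Sketch of the Newton-Raphson reciprocal iteration:
#   X_{i+1} = X_i * (2 - X_i * D)
# Each iteration is a multiply, then a subtract, then another multiply,
# and each step must wait for the previous one: the serial dependence
# that makes this formula hard to pipeline.

def newton_raphson_reciprocal(d, x0, iterations):
    """Refine an initial reciprocal estimate x0 of divisor d."""
    x = x0
    for _ in range(iterations):
        t = x * d          # first multiply
        e = 2.0 - t        # subtract: must wait for the multiply
        x = x * e          # second multiply: must wait for the subtract
    return x

d = 3.0
x = newton_raphson_reciprocal(d, 0.3, 5)  # 0.3 is a crude seed for 1/3
quotient = 10.0 * x                       # approximates 10/3
```

Because the number of correct bits roughly doubles per iteration, a few-bit seed reaches full machine precision after a handful of iterations.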
A quotient convergence formula which is algebraically equivalent to the Newton-Raphson formula provides iterative calculations for the quotient rather than the reciprocal. One extra multiplication is required per iteration, but each calculation is pipelineable, which means that, in a two-stage pipeline, no calculation depends on the result of the immediately preceding calculation as one of its input operands. Two-stage multiply-add pipelines require one fewer machine cycle per iteration using the quotient convergence algorithm as opposed to the unaltered Newton-Raphson formula. The three terms which make up the quotient convergence formula are: EQU N.sub.i+1 =N.sub.i *R.sub.i EQU R.sub.i+1 =2-D.sub.i *R.sub.i EQU D.sub.i+1 =D.sub.i *R.sub.i
To provide the initial values, R.sub.0 is given by a look-up table, commonly implemented with combinatorial logic or a Read Only Memory (ROM), and this initial value provides the "seed" value for the reciprocal of the divisor out to a few bits of precision. This "seed" is identical to the "seed" used as X.sub.0 for the first Newton-Raphson iteration in the Newton-Raphson formula. D.sub.0 is the divisor or denominator, and N.sub.0 is the dividend or numerator.
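As an illustration of how the seed R.sub.0 might be supplied, the following sketch models a small ROM-style look-up table indexed by the leading fraction bits of a divisor normalized to [1, 2). The 4-bit table width is an assumption for illustration, not the patent's design:

```python
# Hypothetical sketch of a reciprocal seed table, as a hardware ROM or
# combinatorial logic block might implement it. Each entry holds the
# reciprocal of the midpoint of the divisor interval it covers, so the
# returned seed is good to a few bits of precision.

TABLE_BITS = 4  # illustrative table width

def build_seed_table():
    table = []
    for idx in range(1 << TABLE_BITS):
        # Midpoint of the divisor interval covered by this entry.
        d_mid = 1.0 + (idx + 0.5) / (1 << TABLE_BITS)
        table.append(1.0 / d_mid)
    return table

SEED_TABLE = build_seed_table()

def seed_reciprocal(d):
    """Return a seed R_0 for a divisor d normalized to [1.0, 2.0)."""
    assert 1.0 <= d < 2.0
    idx = int((d - 1.0) * (1 << TABLE_BITS))  # top 4 fraction bits
    return SEED_TABLE[idx]
```

With 4 index bits the seed is accurate to roughly 5 bits, which the iteration then doubles each pass.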
The R term in the quotient convergence formula is sometimes called an error term because the magnitude by which a particular R.sub.i deviates from unity equals the precision of the corresponding quotient for that iteration. The quotient convergence formula, together with its pipelineability and its machine cycle-saving advantages, is itself known in the art. The pipelining process for using the quotient convergence formula in computer processor hardware generally comprises the following steps:
1) Look up the seed value for the reciprocal R.sub.0, of the divisor D.sub.0. PA1 2) Solve for N.sub.1 =R.sub.0 *N.sub.0, where N.sub.0 is the dividend. PA1 3) Solve for D.sub.1 =R.sub.0 *D.sub.0. PA1 4) Solve for R.sub.1 =2-R.sub.0 *D.sub.0. PA1 5) Repeat steps 2, 3 and 4, producing a more precise N which approaches the quotient, and D and R terms whose deviations from unity are of opposite sign and which both approach the constant value of 1.
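Steps 1 through 5 can be sketched as follows, a floating-point model that assumes a seed R.sub.0 is already available. Note that within each iteration all three products read only values from the previous iteration, which is the property that allows the calculations to be pipelined:

```python
# Sketch of the quotient convergence iteration:
#   N_{i+1} = N_i * R_i
#   D_{i+1} = D_i * R_i
#   R_{i+1} = 2 - D_i * R_i
# N converges to the quotient while D and R both converge to 1.

def quotient_convergence_divide(n0, d0, r0, iterations):
    n, d, r = n0, d0, r0
    for _ in range(iterations):
        n_next = n * r        # N_{i+1}: uses previous-iteration values only
        d_next = d * r        # D_{i+1}: likewise
        r = 2.0 - d * r       # R_{i+1}: likewise
        n, d = n_next, d_next
    return n

q = quotient_convergence_divide(10.0, 3.0, 0.3, 5)  # approximates 10/3
```

Because none of the three updates consumes a result produced earlier in the same iteration, a two-stage multiply-add pipeline can keep issuing them back to back.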
The problem with the foregoing pipelining solution is that it produces a loss of precision near the least significant bit of each iterative result term. This loss of precision is a consequence of having to round or truncate the full-precision result to the precision of the source operands of each iterative expression. To overcome this problem, precision must be maintained beyond the least significant bit of the result in order to provide a correct rounding for the final result. Under existing IEEE standards for calculations involving floating point numbers, a result must be rounded as if an infinite number of precision bits were maintained; the only way of actually meeting this requirement is to maintain precision at least a few bits to the right of the least significant bit. Therefore, the hardware implementation requires extra-width multiply logic for carrying the extra bits which are necessary to preserve the precision of the result.
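The effect of the extra bits can be illustrated by running the quotient convergence iteration with every intermediate result truncated to a fixed number of fraction bits, modeling a finite-width datapath. The 24-bit target width and 6 guard bits below are illustrative assumptions, not the patent's dimensions:

```python
# Sketch: quotient convergence with every intermediate truncated to a
# fixed number of fraction bits. Running the datapath a few "guard" bits
# wider than the target precision keeps the accumulated truncation error
# small enough that the final result can be rounded correctly.

def truncate(x, bits):
    """Truncate x to `bits` fraction bits (models a narrow datapath)."""
    scale = 1 << bits
    return int(x * scale) / scale

def goldschmidt_truncated(n, d, r, iterations, bits):
    for _ in range(iterations):
        n_next = truncate(n * r, bits)
        d_next = truncate(d * r, bits)
        r = truncate(2.0 - d * r, bits)  # uses previous d and r
        n, d = n_next, d_next
    return n

exact = 10.0 / 3.0
narrow = goldschmidt_truncated(10.0, 3.0, 0.3, 6, 24)  # target width only
wide = goldschmidt_truncated(10.0, 3.0, 0.3, 6, 30)    # 6 extra guard bits
```

The narrow run can accumulate truncation error of up to several units in its last bit, whereas the guard-bit run keeps the error well below one unit in the 24th bit, so rounding the wide result back to 24 bits yields a reliably rounded quotient.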