So-called digit-by-digit division (or on-line division) is an iterative process that follows the formula:
$$P_{j+1} = rP_j - q_{j+1}D$$
$$Q_{j+1} = Q_j + r^{-(j+1)}q_{j+1},$$
where
$P_j$ represents the remainder at the j-th iteration,
$r$ represents the radix (e.g., r=10 for decimal digits, 2 for binary digits, and 4 for radix-4 digits),
$q_{j+1}$ represents the next quotient digit,
$D$ represents the divisor, and
$Q_j$ represents the quotient at the j-th iteration.
For example, the following manual computations obey the above formula for division: ##STR1##
In the above example, a person would immediately know what quotient digits to select in incrementally forming the quotient. That is, a person would select "3" as the first digit in the quotient, and then compute the second remainder $rP_1$ as 34.57 - 3 × 9 = 7.57; and then select "8" as the next quotient digit, and so on, to form the final quotient (3.84) to a desired degree of precision. However, a computer would determine the respective quotient digits by looking them up in a table of some sort or would employ "hard-wired" logic circuitry for determining the respective digits. In addition, computers represent numbers in a prescribed format, e.g., binary digits with one digit to the left of the binary point. Because each radix-r quotient digit contributes $\log_2 r$ bits, a computer requires approximately $n/\log_2(r)$ iterations to compute an n-bit quotient using a digit-by-digit technique.
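The recurrence can be illustrated with a minimal Python sketch of restoring (non-negative remainder) digit-by-digit division. The function name `restoring_divide`, the use of exact fractions, and the digit selection by floor division are assumptions made for illustration; they are not taken from the cited art, which selects digits by table lookup or hard-wired logic. The sketch also assumes the dividend has been scaled so that it is smaller than the divisor, so the example 34.57/9 is run as 3.457/9 and the quotient digits 3, 8, 4 appear after the radix point.

```python
from fractions import Fraction


def restoring_divide(p0, d, r, steps):
    """Digit-by-digit division of p0 by d in radix r (assumes 0 <= p0 < d).

    Implements P[j+1] = r*P[j] - q[j+1]*D and Q[j+1] = Q[j] + r**-(j+1) * q[j+1],
    choosing each digit so that the remainder stays non-negative.
    Returns (list of quotient digits, accumulated quotient, final remainder).
    """
    p = Fraction(p0)
    d = Fraction(d)
    quotient = Fraction(0)
    digits = []
    for j in range(steps):
        shifted = r * p                      # r * P[j]
        q = shifted // d                     # largest digit with a non-negative remainder
        p = shifted - q * d                  # P[j+1]
        quotient += Fraction(q, r ** (j + 1))
        digits.append(q)
    return digits, quotient, p


digits, quotient, remainder = restoring_divide(Fraction("3.457"), 9, 10, 3)
print(digits, quotient, remainder)
```

After three iterations the digits are 3, 8 and 4, and the intermediate remainder after the first step is 7.57, matching the manual computation described above.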
The two main classes of digit-by-digit division are restoring division, in which the remainders are always positive, and non-restoring division, in which the remainders may be positive or negative. Hand division is almost always performed with restoring division because the concept of negative remainders is foreign to most humans. However, computers typically perform non-restoring digit-by-digit division. The Sweeney, Robertson, Tocher (SRT) algorithm is one example of a known method for performing non-restoring digit-by-digit division in a computer.
For example, in the case of radix-4 arithmetic, the digit set {-3, -2, -1, 0, 1, 2, 3} could be used to represent all possible quotients. However, it can be shown that the digit set {-2, -1, 0, 1, 2} is sufficient to represent all possible quotients if negative remainders are employed. For example, the radix-2 number "10011100" is equivalent to the radix-4 number "2130" but is also equivalent to the signed-digit radix-4 number "22-10". That is, four borrowed from the second digit position in "22-10" (i.e., borrowed from the second 2) could be added to the third digit position (added to -1) to obtain "2130". Accordingly, the set {-2, -1, 0, 1, 2} is referred to as the "minimally redundant" digit set for radix-4 arithmetic. Generally, the minimally redundant digit set for radix-r arithmetic is {-r/2, ..., 0, ..., r/2}.
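The equivalence of the conventional and signed-digit representations can be checked with a short Python sketch. The helper name `signed_digit_value` is hypothetical; it simply evaluates a most-significant-digit-first list of digits, which may be negative, in a given radix.

```python
def signed_digit_value(digits, radix):
    """Evaluate a digit list (most significant digit first) in the given radix.

    Digits may be negative, as in a signed-digit (redundant) representation.
    """
    value = 0
    for digit in digits:
        value = value * radix + digit
    return value


conventional = signed_digit_value([2, 1, 3, 0], 4)    # radix-4 "2130"
redundant = signed_digit_value([2, 2, -1, 0], 4)      # signed-digit radix-4 "22-10"
print(conventional, redundant, 0b10011100)
```

Both digit strings evaluate to the same integer as the radix-2 number "10011100".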
FIG. 1 is a block diagram of conventional circuitry for performing radix-4 SRT division. As shown, a divider loop iteratively generates the $rP_j$ and $q_{j+1}$ terms while a quotient builder operates in parallel with the divider loop to assemble a quotient. The divider loop generates a series of radix-4 quotient digits (2, -1, 0, 1, 1, -1, 2), which are assembled by the quotient builder into the signed-digit radix-4 quotient "2.-101-12". The quotient builder (or another circuit element) then converts the radix-4 quotient into a conventional binary (positive digit) number.
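The conversion from signed digits to conventional digits can be sketched in Python as follows. This is only an arithmetic illustration under stated assumptions: the function name `to_conventional` is hypothetical, and the sketch converts by evaluating the whole digit string and re-expanding it, which corresponds to a full carry-propagate operation rather than to any particular circuit of FIG. 1.

```python
def to_conventional(digits, radix):
    """Convert signed digits (most significant first) to non-negative digits.

    The result has the same number of digits and the same value. Raises
    ValueError if the value is negative or does not fit in that many digits.
    """
    value = 0
    for digit in digits:
        value = value * radix + digit
    if value < 0:
        raise ValueError("negative values are not handled by this sketch")
    converted = []
    for _ in digits:
        value, low_digit = divmod(value, radix)
        converted.append(low_digit)
    if value != 0:
        raise ValueError("value does not fit in the given number of digits")
    return converted[::-1]


# Digits of the signed-digit radix-4 quotient "2.-101-12" (radix point after the first digit).
radix4 = to_conventional([2, -1, 0, 1, 1, -1, 2], 4)
binary = "".join(format(d, "02b") for d in radix4)
print(radix4, binary)
```

For the digit series above, the conventional radix-4 digits are 1, 3, 0, 1, 0, 3, 2, i.e., "1.301032", and each radix-4 digit maps directly to two binary digits.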
It is well known that digital computers perform division and square root computations in accordance with similar formulas. For example, the above formula for digit-by-digit division is very similar to the following digit-by-digit square root formula:
$$P_{j+1} = rP_j - q_{j+1}(2Q_j) - q_{j+1}^2\,r^{-(j+1)}$$
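The square root recurrence can be exercised with a minimal Python sketch. The function name `restoring_sqrt`, the starting values (partial root 0 and initial remainder equal to the operand), the restriction of the operand to the range 0 to 1, and the trial-based restoring digit selection are all assumptions chosen for illustration.

```python
from fractions import Fraction


def restoring_sqrt(x, r, steps):
    """Digit-by-digit square root of x in radix r (assumes 0 <= x < 1).

    Implements P[j+1] = r*P[j] - q[j+1]*(2*Q[j]) - q[j+1]**2 * r**-(j+1),
    picking the largest digit that keeps the remainder non-negative.
    Returns (partial root, final remainder).
    """
    p = Fraction(x)          # P[0] = x, since Q[0] = 0
    root = Fraction(0)
    for j in range(steps):
        shifted = r * p
        weight = Fraction(1, r ** (j + 1))
        for q in range(r - 1, -1, -1):
            candidate = shifted - q * 2 * root - q * q * weight
            if candidate >= 0:   # always true for q = 0
                break
        p = candidate
        root += q * weight
    return root, p


print(restoring_sqrt(Fraction("0.1444"), 10, 2))
```

For the operand 0.1444 in radix 10, two iterations select the digits 3 and 8, giving the root 0.38 with a zero remainder.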
Therefore, although it is described herein in the context of methods and apparatus for performing binary floating point division, the present invention is also applicable to methods and apparatus for performing square root computations. (As described below, the present invention provides apparatus that computes both quotients and square roots with the same circuitry.)
In carrying out a division or square root operation in a digital computer, it is necessary to perform a rounding operation such that the final result can be represented in the computer. The IEEE Standard 754-1985 specifies a standard for performing binary floating point arithmetic. According to the IEEE standard, binary floating point numbers are formatted as:
$$1.x_2 x_3 x_4 \cdots x_n\; y_1 y_2 y_3 y_4 \ldots$$
The leading "1" to the left of the binary point plus the binary digits (bits) "$x_2$" to "$x_n$" constitute the final n-digit number represented in the computer; "$x_n$" is the least significant bit (LSB); "$y_1$" is the "guard" bit; and the remaining bits to the right of the guard bit are ORed together to form a "sticky" bit.
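The roles of the LSB, guard bit, and sticky bit can be illustrated with a small Python sketch. The function name `split_result` and the representation of the result as a string of binary digits with the binary point removed are assumptions made for this example only.

```python
def split_result(bits, n):
    """Split a result bit string into (n-bit mantissa, guard bit, sticky bit).

    `bits` holds the digits 1 x2 ... xn y1 y2 ... with the binary point removed.
    The guard bit is y1; the sticky bit is the OR of all bits after y1.
    """
    if len(bits) < n or set(bits) - {"0", "1"}:
        raise ValueError("expected at least n binary digits")
    mantissa = int(bits[:n], 2)
    guard = int(bits[n]) if len(bits) > n else 0
    sticky = 1 if "1" in bits[n + 1:] else 0
    return mantissa, guard, sticky


print(split_result("10111001", 4))
```

Here the 4-bit mantissa is 1011, the guard bit is 1, and the sticky bit is 1 because at least one of the bits following the guard bit is set.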
The IEEE standard specifies four alternative rounding modes: (1) round to zero (truncate to n bits); (2) round to positive infinity; (3) round to negative infinity; and (4) round to nearest even. The round to nearest mode, the most commonly employed, is described by the following code: ##STR2##
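As a hedged illustration of the four rounding modes (this sketch is not the code referred to above, which is not reproduced here), the decision whether to increment the truncated n-bit magnitude can be written in Python as follows. The function name `needs_increment`, the mode names, and the sign-magnitude convention (a separate `negative` flag with the bits describing the magnitude) are assumptions.

```python
def needs_increment(mode, negative, lsb, guard, sticky):
    """Return True if the truncated n-bit magnitude should be incremented.

    Assumes a sign-magnitude result: `negative` is the sign, and `lsb`,
    `guard`, `sticky` are bits of the magnitude.
    """
    inexact = bool(guard or sticky)
    if mode == "zero":
        return False                           # truncate
    if mode == "positive_infinity":
        return inexact and not negative        # grow magnitude of positive results
    if mode == "negative_infinity":
        return inexact and bool(negative)      # grow magnitude of negative results
    if mode == "nearest_even":
        # More than half an ulp, or exactly half with an odd LSB.
        return bool(guard and (sticky or lsb))
    raise ValueError("unknown rounding mode: " + mode)


print(needs_increment("nearest_even", False, lsb=1, guard=1, sticky=0))
```

In the round to nearest even mode, an exact tie (guard set, sticky clear) increments only when the LSB is 1, so that the rounded result ends in 0.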
FIG. 2 schematically depicts the conventional IEEE rounding process for non-restoring division. The process includes the following steps: Forming an n-bit quotient plus two additional bits, a guard bit (G) and an extra bit (R); decrementing the quotient if the remainder is less than zero; normalizing the quotient by shifting all bits to the left one position if the most significant bit (MSB) is zero, resulting in a normalized quotient with a guard bit; inputting the extra bit R and the guard bit G to a rounding logic block that conditionally increments the quotient in accordance with the selected rounding mode; renormalizing the quotient by shifting all bits to the right if the increment causes an overflow; and storing the final mantissa of the quotient. It should be noted that the bit R is used in cases where the quotient is to be shifted left. In these cases, G becomes R. Otherwise, R is ORed into the sticky bit.
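The sequence of steps described above can be sketched in Python for the round to nearest mode. Several details are assumptions made for illustration and are not stated in the description of FIG. 2: the function name `conventional_round`, the packing of the n quotient bits, G, and R into one integer (R in the lowest position), the derivation of the sticky bit from a nonzero remainder, and the reporting of normalization as an exponent adjustment.

```python
def conventional_round(q_bits, n, rem_negative, rem_zero):
    """Round an (n+2)-bit quotient [n bits][G][R] to n bits, nearest even.

    Returns (n-bit mantissa, exponent adjustment).
    """
    if rem_negative:
        q_bits -= 1                          # decrement: the quotient was one unit too large
    sticky = 0 if rem_zero else 1            # nonzero remainder means the result is inexact
    msb = (q_bits >> (n + 1)) & 1
    if msb == 0:
        # Normalize by shifting left one position: G joins the mantissa, R becomes the guard.
        mantissa = (q_bits >> 1) & ((1 << n) - 1)
        guard = q_bits & 1
        exponent_adjust = -1
    else:
        mantissa = q_bits >> 2
        guard = (q_bits >> 1) & 1
        sticky |= q_bits & 1                 # R is ORed into the sticky bit
        exponent_adjust = 0
    if guard and (sticky or (mantissa & 1)):
        mantissa += 1                        # conditional increment
    if mantissa >> n:
        mantissa >>= 1                       # renormalize after overflow
        exponent_adjust += 1
    return mantissa, exponent_adjust


print(conventional_round(0b011111, 4, rem_negative=False, rem_zero=False))
```

In this example the quotient is first shifted left, the increment then overflows the 4-bit mantissa, and the renormalizing right shift restores the exponent, giving the mantissa 1000 with no net exponent adjustment. The decrement, the increment, and the zero and sign detection of the remainder are each performed in sequence, which is the source of the delays discussed below.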
There are a number of problems with the conventional rounding process summarized above. For example, since the remainder is in the carry save form (see M. Ercegovak and T. Lang, "On-the-fly Rounding for Division and Square Root," fully cited below, for a description of carry save addition), determining whether the remainder is zero or less than zero requires $\log_2 n$ gate delays. In addition, the decrement and the increment each require $\log_2 n$ delays. Further, the round logic is in the critical path, causing further delay.
M. Ercegovak and T. Lang, in "On-the-fly Rounding for Division and Square Root," IEEE Transactions on Computers, Vol. C-36, No. 7, July 1987, pp. 895-897, hereby incorporated by reference into the present specification, disclose a method for converting signed digits of a quotient to conventional (unsigned) digits such that the result will be rounded. The conversion/rounding process is said to be performed "on-the-fly" and is purportedly faster than conventional operations because it does not employ carry-propagate addition.
Referring to FIG. 3, the Ercegovak-Lang method (hereinafter referred to as the "Ercegovak method") essentially involves maintaining three quotient registers Q+1, Q, Q-1, respectively containing the quotients Q+1, Q, and Q-1, and selecting the correct quotient via a multiplexor. A selection control signal is generated by a rounding block that employs a "remainder sign" bit (which is "1" if the partial remainder is negative and "0" otherwise), a "remainder=0" bit (which is "1" if the partial remainder is zero and "0" otherwise), and the last quotient digit ($q_{n+1}$) as inputs. The content of each quotient register is set in accordance with the following algorithm: ##STR3##
The above rounding algorithm is summarized in the rounding table below, which shows the selected quotient for the minimally redundant radix-4 digit set and for positive and negative remainders. This table is for the round to nearest mode, which is the only rounding mode discussed in the above-cited paper.
______________________________________
Ercegovak Rounding Table
last digit    remainder sign    round to nearest Q select
______________________________________
+2            +                 Q+1
+1            +                 Q
 0            +                 Q
-1            +                 Q
-2            +                 Q
+2            -                 Q
+1            -                 Q
 0            -                 Q
-1            -                 Q
-2            -                 Q-1
______________________________________
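The selection step summarized in the table above can be sketched in Python as a lookup followed by a three-way choice, standing in for the multiplexor. The names `ROUND_TO_NEAREST_SELECT` and `select_quotient` are hypothetical, and the three candidate quotients are passed in as plain integers for illustration; the table entries themselves are copied from the table above.

```python
# (last quotient digit, remainder sign) -> which precomputed quotient to select
ROUND_TO_NEAREST_SELECT = {
    (+2, "+"): "Q+1",
    (+1, "+"): "Q",
    (0, "+"): "Q",
    (-1, "+"): "Q",
    (-2, "+"): "Q",
    (+2, "-"): "Q",
    (+1, "-"): "Q",
    (0, "-"): "Q",
    (-1, "-"): "Q",
    (-2, "-"): "Q-1",
}


def select_quotient(last_digit, remainder_sign, q_plus_1, q, q_minus_1):
    """Pick one of the three precomputed quotients, as a multiplexor would."""
    choice = ROUND_TO_NEAREST_SELECT[(last_digit, remainder_sign)]
    return {"Q+1": q_plus_1, "Q": q, "Q-1": q_minus_1}[choice]


print(select_quotient(-2, "-", q_plus_1=11, q=10, q_minus_1=9))
```

Because all three candidates are available before the remainder sign is known, the selection replaces the post-increment or post-decrement of the conventional process.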
In sum, the rounding process for IEEE division and square root computations is typically very costly since, to properly generate the sticky and guard bits, it requires detecting the sign of the partial remainder and comparing the partial remainder to zero. This operation is then followed by a post-increment or post-decrement of the quotient to properly generate the final result. Ercegovak discloses a method involving pre-computing the possible quotient values and replacing the increment/decrement step with a selection step. However, there are disadvantages in using the Ercegovak rounding method. For example, the cited paper lacks disclosure of how the algorithm can be modified to cover rounding modes other than the round to nearest mode. Moreover, the disclosed method appears not to handle formats with an unnormalized mantissa (i.e., where Q is outside the range $1/2 \le Q < 2$), nor does it deal with complications caused by employing different number formats, such as IEEE single precision (23 bits) versus IEEE double precision (52 bits). Further, the method assumes the sticky bit is set, which may not be true in all cases.