This invention relates to the field of high-speed division hardware for general purpose computer systems. In particular, it relates to the class of S.R.T. dividers capable of producing multiple bits of quotient per clock cycle through cascaded divider stages.
Classical binary (radix-2) restoring, nonperforming, and nonrestoring dividers typically require one iteration or cycle, or one full divider stage, per bit of quotient generated. With these dividers, 32 cycles are required for division of a 64-bit dividend by a 32-bit divisor to produce a 32-bit quotient.
Dividers that operate in a radix greater than two, such as radix 4 or radix 8, offer the possibility of performing division in fewer cycles or stages than radix-2 dividers. A radix-4 divider can divide a 64-bit dividend by a 32-bit divisor to produce a 32-bit quotient in 16 cycles or stages, plus overhead, by producing two bits of quotient in each cycle. A radix-8 divider can perform this division in 11 cycles or stages, plus overhead, by producing three bits of quotient per cycle or divider stage.
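The cycle counts above follow from dividing the quotient width by the number of quotient bits retired per cycle, rounded up. The following is a minimal sketch that assumes a power-of-two radix and ignores the setup and correction overhead mentioned above. The function name `cycles_needed` is hypothetical.

```python
def cycles_needed(quotient_bits: int, radix: int) -> int:
    """Iterations needed to retire quotient_bits, one radix-`radix` digit per cycle.

    Assumes a power-of-two radix; setup and final-correction overhead is not counted.
    """
    bits_per_cycle = radix.bit_length() - 1  # log2(radix) for a power of two
    if radix < 2 or (1 << bits_per_cycle) != radix:
        raise ValueError("radix must be a power of two >= 2")
    return -(-quotient_bits // bits_per_cycle)  # ceiling division


for radix in (2, 4, 8):
    print(f"radix {radix}: {cycles_needed(32, radix)} cycles")
```

For a 32-bit quotient this gives 32, 16 and 11 cycles, matching the figures above. The radix-8 case rounds 32/3 up to 11.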
Dividers that implement two or more cascaded divider stages can produce more than one quotient bit per cycle. These dividers can be challenging to build because of the amount of logic required.
SRT division has been in the news because of a look-up table with an incorrect entry in early Pentium processors. This division method, named after D. Sweeney, J. Robertson, and K. Tocher, is a nonrestoring division algorithm using a signed quotient digit set.
Prabhu, et al., describe an effectively radix 8 SRT divider in U.S. Pat. No. 5,870,323. Radix 8 SRT dividers like that of Prabhu, et al., may be used in high speed processors to produce more than one quotient bit per clock cycle.
SRT division is performed by iterating the following sequence of steps:
a. estimating one or more digits of quotient, based on the most significant bits, including sign, of the dividend or partial remainder and the divisor. The quotient digit may represent one or more bit positions in the eventual quotient.
b. subtracting a product of the quotient digit times the divisor from the dividend or partial remainder to form a new partial remainder. This subtraction is often performed in carry-save form in the least significant bits, but carry must be propagated in the most significant bits during either the subtraction or during the estimation of the next one or more digits of quotient.
c. shifting the quotient digit into a quotient register.
d. shifting the new partial remainder by one or more bit positions, and iterating steps a, b, and c until sufficient digits of quotient have been obtained.
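The steps above can be traced in software. The following is a minimal radix-2 SRT sketch that rests on several stated assumptions. The divisor is normalized to n bits, and the dividend is smaller than the divisor. The partial remainder is an ordinary signed integer rather than a carry-save pair. The digit estimate compares the whole shifted remainder against the constant 1/2 instead of a few truncated most significant bits. The name `srt_radix2_divide` and its parameters are hypothetical.

```python
def srt_radix2_divide(x: int, d: int, n: int, k: int) -> tuple[int, int]:
    """Radix-2 SRT sketch: returns (q, r) with x * 2**k == q * d + r and 0 <= r < d.

    Assumes d is normalized to n bits (2**(n-1) <= d < 2**n) and 0 <= x < d.
    """
    assert (1 << (n - 1)) <= d < (1 << n), "divisor must be normalized"
    assert 0 <= x < d, "dividend must be smaller than the divisor"
    half = 1 << (n - 1)          # the constant 1/2 in this fixed-point scaling
    r, q_reg = x, 0
    for _ in range(k):
        two_r = 2 * r            # step d: shift the partial remainder left one place
        # step a: estimate a digit in {-1, 0, +1} from the shifted remainder
        if two_r >= half:
            q = 1
        elif two_r < -half:
            q = -1
        else:
            q = 0
        r = two_r - q * d        # step b: subtract q * D; r stays in [-d, d)
        q_reg = 2 * q_reg + q    # step c: shift the digit into the quotient register
    if r < 0:                    # final correction, as in any nonrestoring divider
        q_reg -= 1
        r += d
    return q_reg, r


print(srt_radix2_divide(150, 200, n=8, k=8))
```

The result should agree with `divmod(150 << 8, 200)`, which is `(192, 0)`. Because each digit may be negative, the quotient register holds a signed-digit value. In hardware it would be recoded (for example, from separate positive and negative digit registers) rather than accumulated with an adder as in this sketch.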
The divider of Prabhu, et al., has several, preferably three, overlapped stages of radix-2 SRT division to provide the effect of a high radix, preferably radix-8, divider. Three bits of quotient are generated in each clock cycle, one bit from each of the overlapped stages.
In each stage, a quotient selection logic look-up table, which may be implemented as logic gates, ROM or PLA, generates each estimate of quotient bits. Multiple quotient bit estimation logic circuits operating in sequence are provided to produce several quotient digits in each clock cycle. In parallel with the estimation of a first, a second, and a third digit, the divisor is multiplied by all possible values of the digit estimates, and these values are subtracted from the dividend or partial remainder to form a set of differences in carry-save form. A multiplexor, controlled by the estimates, then selects a new partial remainder from the set of differences. This computation of several possible differences, followed by selection of the difference corresponding to the digit generated, is speculative execution. In Prabhu's divider, the partial remainder is recycled in carry-save form, and speculative execution is used to achieve high-speed execution at the cost of many more carry-save adders than would be required without speculative execution.
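The cost of speculative execution can be seen in a simplified model. This is a hypothetical sketch, not a reproduction of Prabhu's circuit. Every candidate partial remainder for every reachable combination of digits is formed before any digit is known. The digit chain then only steers multiplexors through that table. Remainders are plain integers rather than carry-save pairs. The radix-2 digit rule with the constant 1/2 is an illustrative assumption, and all names are hypothetical.

```python
def select_digit(two_r: int, half: int) -> int:
    """Radix-2 SRT digit in {-1, 0, +1}; `half` encodes the constant 1/2."""
    if two_r >= half:
        return 1
    if two_r < -half:
        return -1
    return 0


def speculative_cascade(r: int, d: int, half: int, stages: int = 3):
    """Model one clock cycle of `stages` cascaded radix-2 SRT stages, speculatively.

    Returns (digits, new_partial_remainder, number_of_differences_formed).
    """
    # Phase 1: form every candidate 2**j * r - m * d up front. None of these
    # depends on a digit, so in hardware they would all be parallel adders.
    candidates = {}
    for j in range(stages + 1):
        span = (1 << j) - 1              # |m| <= 2**j - 1 after j digits
        for m in range(-span, span + 1):
            candidates[(j, m)] = (r << j) - m * d
    # Phase 2: the digit chain only steers multiplexors through the table.
    digits, m = [], 0
    for j in range(stages):
        q = select_digit(2 * candidates[(j, m)], half)
        digits.append(q)
        m = 2 * m + q                    # combined digit value after j + 1 stages
    differences = len(candidates) - 1    # entry (0, 0) is r itself, not a subtraction
    return digits, candidates[(stages, m)], differences


print(speculative_cascade(150, 200, half=128))
```

With three stages, this model forms 3 + 7 + 15 = 25 differences per cycle to retire three quotient bits. The growth in adders is the price the text attributes to speculative execution.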
It is known that SRT division can be performed with less speculative execution than in the divider of Prabhu, et al. In this technique, quotient digit estimates are computed as described above. Each digit estimate controls a multiplexor that selects the divisor multiple corresponding to the digit; the selected divisor multiple is then subtracted from the dividend or partial remainder to form a new partial remainder.
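The multiplexor-then-subtract technique can be modeled the same way. This hypothetical sketch again uses plain integers and the illustrative radix-2 rule with constant 1/2. Each stage uses one subtractor instead of one per candidate digit. The cost is that the subtraction cannot begin until the digit estimate has settled, so the estimate lies on the critical path of every stage.

```python
def select_digit(two_r: int, half: int) -> int:
    """Radix-2 SRT digit in {-1, 0, +1}; `half` encodes the constant 1/2."""
    if two_r >= half:
        return 1
    if two_r < -half:
        return -1
    return 0


def mux_cascade(r: int, d: int, half: int, stages: int = 3):
    """Per stage: estimate the digit, mux the divisor multiple, subtract once.

    Returns (digits, new_partial_remainder, number_of_subtractors).
    """
    digits, subtractors = [], 0
    for _ in range(stages):
        two_r = 2 * r                        # shift by one place (wiring only)
        q = select_digit(two_r, half)        # estimate from the current remainder
        multiple = {-1: -d, 0: 0, 1: d}[q]   # multiplexor picks -D, 0 or +D
        r = two_r - multiple                 # the single subtraction in this stage
        subtractors += 1
        digits.append(q)
    return digits, r, subtractors


print(mux_cascade(150, 200, half=128))
```

For three stages, this model uses three subtractors to produce the same digits and remainder as a fully speculative arrangement would.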
One-hot encoding is known to be an alternative method of representing numbers or parts of numbers. One-hot encoding requires a number of lines equal to two raised to the power of the number of equivalent binary bits of the number or part of a number to be represented; hence one-hot encoding three binary bits requires eight lines, one-hot encoding four bits requires sixteen lines, etc. One-hot encoding is therefore rarely used to represent large numbers.
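The line-count growth can be made concrete with a small sketch. It models the lines as the bits of a Python integer, where line i is active when the encoded value is i. The helper names are hypothetical.

```python
def lines_required(bits: int) -> int:
    """A one-hot representation of a `bits`-bit field needs 2**bits lines."""
    return 1 << bits


def to_one_hot(value: int, bits: int) -> int:
    """Encode value as an integer whose bit `value` is the single active line."""
    if not 0 <= value < lines_required(bits):
        raise ValueError("value does not fit in the field")
    return 1 << value


def from_one_hot(lines: int) -> int:
    """Decode a one-hot code; exactly one line must be active."""
    if lines <= 0 or lines & (lines - 1):
        raise ValueError("not a valid one-hot code")
    return lines.bit_length() - 1


for bits in (3, 4, 8, 16):
    print(bits, "bits ->", lines_required(bits), "lines")
print(format(to_one_hot(2, 3), "08b"))
```

Three bits need 8 lines and four need 16, but sixteen bits already need 65,536 lines. This is why one-hot coding suits only a narrow field, such as the most significant bits of a partial remainder.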
It is known that adding a constant to a one-hot encoded number is equivalent to shifting the one-hot encoded number by a number of bit positions equal to the constant added. For example, two in eight-line one-hot encoded form is 0000 0100. Adding three to it is equivalent to shifting left by three places, producing 0010 0000, or five in one-hot form.
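The example above can be reproduced as a shift. This minimal sketch uses the same integer-as-lines model, with line i stored as bit i. A negative addend (subtraction) shifts toward lower-numbered lines. The function name is hypothetical. In hardware, a shift by a constant is only rewiring, and a shift by a variable amount is a set of multiplexors. Neither needs a carry chain.

```python
def one_hot_add(lines: int, addend: int, width: int) -> int:
    """Add a signed constant to a one-hot value of `width` lines by shifting.

    No carry is propagated: the single active line simply moves.
    """
    shifted = lines << addend if addend >= 0 else lines >> -addend
    if shifted == 0 or shifted >> width:
        raise OverflowError("result falls outside the one-hot field")
    return shifted


two = 0b0000_0100                        # two, in eight-line one-hot form
five = one_hot_add(two, 3, width=8)
print(format(five, "08b"))               # 00100000
print(format(one_hot_add(five, -3, width=8), "08b"))  # 00000100
```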
It has been found that, if the most significant bits of the partial remainder are generated initially in one-hot encoded form, it is possible to reduce the number of logic levels, and hence the time required for generation of each successive partial remainder. The one-hot encoded form of the most significant bits of the partial remainder is then recoded into binary form when carry is propagated to produce the final remainder.
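Recoding a one-hot field into binary needs only OR gates. Each binary output bit is the OR of every line whose index has that bit set. This sketch assumes the lines are numbered from zero and map directly to unsigned values. A biased or signed line assignment would change which lines feed each gate. The function name is hypothetical.

```python
def one_hot_to_binary(lines: int, width: int) -> int:
    """Recode a one-hot field of `width` lines to binary using only ORs.

    For eight lines: bit0 = L1|L3|L5|L7, bit1 = L2|L3|L6|L7, bit2 = L4|L5|L6|L7.
    """
    out_bits = (width - 1).bit_length()
    result = 0
    for b in range(out_bits):
        bit = 0
        for i in range(width):
            if (i >> b) & 1:
                bit |= (lines >> i) & 1  # line i is one input of the OR gate for bit b
        result |= bit << b
    return result


print([one_hot_to_binary(1 << v, 8) for v in range(8)])
```

Because each output is a single OR gate, the recoding adds roughly one level of logic and can be deferred until the final remainder is assembled.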
The reduction of logic levels occurs in part because one-hot encoded addition or subtraction is equivalent to a shift operation, with no need to separately propagate a carry signal, and in part because with a one-hot encoded partial remainder, few levels of logic are necessary to estimate each quotient digit.
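A hypothetical model helps illustrate why digit estimation from a one-hot field is shallow. Suppose the exact top field of the shifted remainder, s = 2r >> (n-1), is held one-hot on eight lines, and the radix-2 rule with constant 1/2 is used. Then each nonzero digit output is a single three-input OR. This model also assumes a normalized n-bit divisor and a partial remainder with -d <= r < d. A real divider working from carry-save bits would see only an estimate of this field, and its line assignment would differ. All names and line numbers here are assumptions.

```python
# Assumed line numbering: line i is active when s + 4 == i, where
# s = 2r >> (n - 1) is the signed top field of the shifted remainder.
PLUS_ONE_LINES = (5, 6, 7)    # s in {1, 2, 3}:    2r >= 1/2
MINUS_ONE_LINES = (0, 1, 2)   # s in {-4, -3, -2}: 2r < -1/2
# Lines 3 and 4 (s in {-1, 0}) select the digit 0.


def msb_field_one_hot(two_r: int, n: int) -> int:
    """One-hot code (8 lines) of the top field of 2r; assumes |r| < d < 2**n."""
    s = two_r >> (n - 1)      # arithmetic shift keeps the sign; s lies in [-4, 3]
    return 1 << (s + 4)


def select_digit_one_hot(lines: int) -> int:
    """Digit selection as two 3-input OR gates over the one-hot lines."""
    plus = any((lines >> i) & 1 for i in PLUS_ONE_LINES)
    minus = any((lines >> i) & 1 for i in MINUS_ONE_LINES)
    return 1 if plus else (-1 if minus else 0)


print(select_digit_one_hot(msb_field_one_hot(300, 8)))  # 300 >> 7 == 2, line 6: digit 1
```

In this model, the selection path has no comparator and no carry chain. It consists only of the OR gates over fixed sets of lines.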
It has also been found that with the most significant bits of the partial remainder in one-hot encoded form, the quotient digit estimate can be computed quickly enough that it is possible, in some dividers, to avoid using speculative execution logic during computation of the binary-encoded less significant bits of each partial remainder.