1. Field of the Invention
The present invention relates to the field of floating point dividers in microprocessors. Specifically, the present invention relates to quotient digit selection rules in SRT division/square root implementations which prevent negative final partial remainders from occurring when results are exact, and which provide support for correct rounding in all rounding modes.
2. The Background Art
The SRT algorithm provides one way of performing non-restoring division. See J. E. Robertson, “A new class of digital division methods,” IEEE Trans. Comput., vol. C-7, pp. 218-222, Sep. 1958, and K. D. Tocher, “Techniques of multiplication and division for automatic binary computers,” Quart. J. Mech. Appl. Math., vol. 11, pt. 3, pp. 364-384, 1958. Digital division takes a divisor and a dividend as operands and generates a quotient as output. The quotient digits are calculated iteratively, producing the most significant quotient digits first. In SRT division, unlike other division algorithms, each successive quotient digit is formulated based only on a few of the most significant partial remainder digits, rather than by looking at the entire partial remainder, which may have a very large number of digits. Since it is not possible to ensure correct quotient digit selection without considering the entire partial remainder in any given iteration, the SRT algorithm occasionally produces incorrect quotient digit results. However, the SRT algorithm provides positive, zero, and negative quotient digit possibilities. If the quotient digit in one iteration is overestimated, then that error is corrected in the next iteration by selecting a negative quotient digit. In SRT division, quotient digits must never be underestimated; quotient digits must always be overestimated or correctly estimated. By never underestimating any quotient digits, the partial remainder is kept within prescribed bounds so as to allow the correct final quotient to be computed. Because the SRT algorithm allows negative quotient digits, the computation of the final quotient output usually involves weighted adding and subtracting of the quotient digits, rather than merely concatenating all the quotient digits as in normal division.
The higher the radix, the more digits of quotient developed per iteration, but at a cost of greater complexity. A radix-2 implementation produces one digit per iteration, whereas a radix-4 implementation produces two digits per iteration. FIG. 1 illustrates a simple SRT radix-2 floating point implementation. The simple SRT radix-2 floating point implementation shown in FIG. 1 requires that the divisor and dividend both be positive and normalized; therefore, ½ ≤ D, Dividend ≤ 1. The initial shifted partial remainder, 2PR[0], is the dividend. Before beginning the first quotient digit calculation iteration, the dividend is loaded into the partial remainder register 100; thus, the initial partial remainder is the dividend. Subsequently, the partial remainders produced by iteration are developed according to the relationship
$$PR[i+1] = 2PR[i] - q_{i+1} D \tag{R.1}$$
In relationship (R.1), $q_{i+1}$ is the quotient digit, and has possible values of −1, 0, or +1. This quotient digit $q_{i+1}$ is solely determined by the value of the previous partial remainder and is independent of the divisor. The quotient selection logic 102 takes only the most significant four bits of the partial remainder as input, and produces the quotient digit. In division calculations, the divisor remains constant throughout all iterations. However, square root calculations typically involve adjustments to the divisor stored in the divisor register 101 after each iteration. Therefore, the independence of the quotient digit selection on the divisor is an attractive feature for square root calculations.
The partial remainder is typically kept in redundant carry-save form so that calculations of the next partial remainder can be performed by a carry-save adder instead of slower and larger carry-propagate adders. The partial remainder is converted into non-redundant form after all iterations have been performed and the desired precision has been reached. Because the SRT algorithm allows overestimation of quotient digits resulting in a negative subsequent partial remainder, it is possible that the last quotient digit is overestimated, so that the final partial remainder is negative. In that case, since it is impossible to correct for the overestimation, it is necessary to maintain Q and Q−1, so that if the final partial remainder is negative, Q−1 is selected instead of Q. The quotient digits are normally also kept in redundant form and converted to non-redundant form at the end of all iterations. Alternatively, the quotient and quotient minus one (Q and Q−1) can be generated on the fly according to rules developed in M. D. Ercegovac and T. Lang, “On-the-fly rounding,” IEEE Trans. Comput., vol. 41, no. 12, pp. 1497-1503, December 1992.
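The on-the-fly maintenance of Q and Q−1 can be sketched in Python as follows. This is a minimal illustration of radix-2 update rules in the style of the cited on-the-fly conversion work, not the register-level circuit of FIG. 1. The names `on_the_fly`, `q_reg`, and `qm_reg` are hypothetical, and plain integers stand in for the bit registers.

```python
def on_the_fly(digits):
    """Accumulate Q and Q-1 from signed radix-2 digits (+1, 0, -1), MSB first.

    Appending a bit b to a register is modeled as 2*register + b, so no
    carry-propagate subtraction is needed when a negative digit arrives.
    """
    q_reg, qm_reg = 0, -1  # invariant: qm_reg == q_reg - 1
    for d in digits:
        if d == 1:
            q_reg, qm_reg = 2 * q_reg + 1, 2 * q_reg
        elif d == 0:
            q_reg, qm_reg = 2 * q_reg, 2 * qm_reg + 1
        else:  # d == -1
            q_reg, qm_reg = 2 * qm_reg + 1, 2 * qm_reg
    return q_reg, qm_reg

# Digits 1, 1, -1, -1 encode 8 + 4 - 2 - 1 = 9.
print(on_the_fly([1, 1, -1, -1]))  # (9, 8)
```

Each update only appends a bit to one of the two existing registers, which is why both candidates are available as soon as the last digit has been selected.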
The SRT algorithm has been extended to square root calculations, allowing the utilization of existing division hardware. The simplified square root equation looks surprisingly similar to that of division. See M. D. Ercegovac and T. Lang, “Radix-4 square root without initial PLA,” IEEE Trans. Comput., vol. 39, no. 8, pp. 1016-1024, August 1990. The iteration equation for square root calculations is as follows:
$$PR[i+1] = 2PR[i] - q_{i+1}\left(2Q_i + q_{i+1}\,2^{-(i+1)}\right) \tag{R.2}$$
In relationship (R.2), the terms in parentheses are the effective divisor. For square root calculations, the so-called divisor is a function of $Q_i$, which is a function of all the previous root digits $q_1$ through $q_i$. The root digits hereinafter will be referred to as “quotient digits” to maintain consistency in terminology. Therefore, in order to support square root calculation using the same hardware as used for division, on-the-fly quotient generation is required in order to update the divisor after each iteration.
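The role of the effective divisor can be checked numerically. The sketch below applies the square root recurrence, read as PR[i+1] = 2PR[i] − q(2Q_i + q·2^−(i+1)) with q the newly selected digit, and confirms that the partial remainder always equals the scaled residual 2^i (x − Q_i²). The starting values Q_0 = 0 and PR[0] = x, and the name `sqrt_remainders`, are assumptions made for illustration; the hardware's initialization may differ.

```python
from fractions import Fraction

def sqrt_remainders(x, digits):
    """Apply the square root recurrence for a given list of digits q_1, q_2, ...

    Assumes Q_0 = 0 and PR[0] = x (illustrative choices).
    Returns the list of (Q_i, PR[i]) pairs for i = 1..len(digits).
    """
    root, pr = Fraction(0), Fraction(x)
    history = []
    for i, q in enumerate(digits):
        weight = Fraction(1, 2 ** (i + 1))         # 2^-(i+1)
        pr = 2 * pr - q * (2 * root + q * weight)   # effective divisor in parentheses
        root += q * weight                          # Q_{i+1} = Q_i + q_{i+1} * 2^-(i+1)
        history.append((root, pr))
    return history

x = Fraction(9, 16)                                 # sqrt(9/16) = 3/4
for i, (root, pr) in enumerate(sqrt_remainders(x, [1, 1, 0, 0]), start=1):
    assert pr == 2 ** i * (x - root * root)         # PR[i] tracks the scaled residual
```

Because the effective divisor depends on Q_i, the running root must be available in non-redundant form after every iteration, which is the reason on-the-fly generation is needed.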
Binary division algorithms are analogous to standard base 10 long division. In R/D=Q, each quotient digit for Q is guessed. In order to determine the first quotient digit, a guess for the proper quotient digit is multiplied by the divisor, and that product is subtracted from the dividend to produce a remainder. If the remainder is greater than the divisor, the guess for the quotient digit was too small; if the remainder is negative, the guess for the quotient digit was too large. In either case, when the guess for the quotient digit is incorrect, the guess must be changed so that the correct quotient digit is derived before proceeding to the next digit. The quotient digit is correct when the following relation is true: 0 ≤ PR ≤ D, in which PR stands for the partial remainder after subtraction of the quotient digit multiplied by the divisor.
The key to the SRT division algorithm is that negative quotient digits are permitted. For example, in base 10, in addition to the standard digits 0 through 9, quotient digits may take on values of −1 through −9. Consider the division operation 600/40. If the correct quotient digits are selected for each iteration, the correct result is 15. However, assume for the moment that during the first iteration, a quotient digit of 2 was incorrectly guessed instead of the correct digit of 1. The partial remainder after 2 has been selected as the first quotient digit is 600 − (2*40*10^1) = −200. According to SRT division, this error can be corrected in subsequent iterations, rather than having to back up and perform the first iteration again. According to SRT division, assume that the second quotient digit is correctly guessed to be −5. The partial remainder after that iteration will be −200 − (−5*40*10^0) = 0. When the partial remainder after an iteration is zero, the correct values for all the remaining digits are zeros. Thus, the computed result is 2*10^1 + (−5)*10^0 = 15, which is the correct result. The SRT algorithm thus allows an overestimation of any given quotient digit to be corrected by the subsequent selection of one or more negative quotient digits. It is worth noting that the estimated quotient digit must not be more than one greater than the correct quotient digit in order to subsequently reduce the partial remainder to zero, thus computing the correct result. If errors greater than positive one were allowed in estimating quotient digits, then quotient digits less than −9 (for example −10, −11, etc.) would be required in base 10.
Similarly, since the range of quotient digits is not expanded in the positive direction at all according to the SRT algorithm, underestimation of the correct quotient digit is fatal, because the resulting partial remainder will be greater than the divisor multiplied by the base, and a subsequent quotient digit higher than 9 (for example 10, 11, etc.) in base 10 would be required. Therefore, in order to keep the partial remainder within prescribed bounds, the quotient digit selection must never underestimate the correct quotient digit, and if it overestimates the quotient digit, it must do so by no more than one.
It is possible to guarantee that the above criteria for keeping the partial remainder within prescribed bounds will be satisfied without considering all the partial remainder digits. Only a few of the most significant digits of the partial remainder must be considered in order to choose a quotient digit which will allow the correct result to be computed.
SRT division requires a final addition after all quotient digits have been selected to reduce the redundant quotient representation into standard nonredundant form having only non-negative digits.
In binary (base 2), which is utilized in modern electrical computation circuits, SRT division provides quotient digits of +1, 0, or −1. The logic 102 which generates quotient selection digits is the central element of an SRT division implementation. The selection rules according to the prior art can be expressed as in the following equations, in which PR represents the most significant four bits of the actual partial remainder, and in which the binary point appears between the third and fourth most significant digits. The partial remainder is in two's complement form, so that the first bit is the sign bit.
$$q_{i+1} = \begin{cases} +1, & \text{if } 0 \le 2PR[i] \le 3/2, \\ 0, & \text{if } 2PR[i] = -1/2, \\ -1, & \text{if } -5/2 \le 2PR[i] \le -1. \end{cases}$$
Because the partial remainder is stored in register 100 in carry-save form, the actual most significant four bits are not available without performing a full carry propagate addition of the carry and sum portions of the partial remainder. Because it is desirable to avoid having to perform a full carry propagate addition during each iteration in order to compute the most significant four bits of the partial remainder, quotient digit selection rules can be developed using an estimated partial remainder.
The estimated partial remainder (PRest) is computed using only a four-bit carry propagate adder that adds the most significant four bits of the carry and sum portions of the actual partial remainder. This simplification represents a significant savings of latency because the equivalent of a full 59 bit carry propagate addition would otherwise be required to compute the actual most significant four bits of the partial remainder. The estimated partial remainder PRest does not reflect the possibility that a carry might propagate into the bit position corresponding to the least significant bit position of the estimated partial remainder if a full 59-bit carry propagate addition had been performed. The truth table below describes the quotient selection rules according to the prior art where the most significant four bits of the estimated partial remainder are used to select the correct quotient digit. Thus, the truth table below takes into consideration the fact that the most significant four bits of the true partial remainder may differ from the most significant four bits of the estimated partial remainder.
In the above truth table, the four bits representing 2PRest are a non-redundant representation of the most significant four carry and sum bits of the partial remainder. The fourth bit is the fraction part, so that the resolution of the most significant four bits of the partial remainder is ½.
The quotient selection logic is designed to guess correctly or overestimate the true quotient result, e.g. predicting 1 instead of 0, or 0 instead of −1. The SRT algorithm corrects itself later if the wrong quotient digit has been chosen.
The prior art truth table for SRT radix-2 quotient selection logic has several don't care inputs because the partial remainder is constrained to −5/2 ≤ 2PR[i] ≤ 3/2. The estimated partial remainder is always less than or equal to the true most significant bits of the partial remainder because the less significant bits are ignored. Therefore, there is a single case (marked with an asterisk in the above table) where the estimated partial remainder appears to be out of bounds. By construction, the real partial remainder is within the negative bound because the SRT algorithm as implemented will never produce an out of bounds partial remainder, so −1 is the appropriate quotient digit to select. There are two other cases (those corresponding to the entries for 111.0 and 111.1 in Table I) in which the quotient digit selected based on the estimated partial remainder differs from what would be chosen based on the real partial remainder. However, in both of these instances of “incorrect” quotient digit selection, the quotient digit is not underestimated and the partial remainder is kept within prescribed bounds, so that the final result will still be generated correctly.
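The relationship between the estimated and true most significant bits can be sketched as follows. The model treats the carry and sum portions as 59-bit unsigned words whose sum is taken modulo 2^59; the names `estimated_top_bits` and `true_top_bits` are hypothetical.

```python
WIDTH = 59                     # remainder width mentioned in the text
TOP = 4                        # bits seen by the quotient selection logic
MASK = (1 << WIDTH) - 1

def estimated_top_bits(carry, sum_):
    """Four-bit add of the top four carry and sum bits; lower bits are ignored."""
    return ((carry >> (WIDTH - TOP)) + (sum_ >> (WIDTH - TOP))) & 0xF

def true_top_bits(carry, sum_):
    """Top four bits after a full-width carry-propagate addition."""
    return ((carry + sum_) & MASK) >> (WIDTH - TOP)

# A carry out of the ignored low bits makes the estimate one unit (1/2) too small.
carry = (0b0000 << 55) | (1 << 54)
sum_ = (0b0001 << 55) | (1 << 54)
print(bin(estimated_top_bits(carry, sum_)), bin(true_top_bits(carry, sum_)))  # 0b1 0b10
```

The two results differ by at most one unit in the last place of the four-bit field, because the ignored low-order bits can contribute at most a single carry.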
The following Table II illustrates the quotient selection logic described in Table I in a simplified form. In the table below, an “x” represents a “don't care” logic variable. Note that the third case, in which 1xx.x produces a −1 quotient digit, does not apply when the estimated partial remainder is 111.1; in that case the second entry applies, and the correct quotient digit is 0.
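The simplified selection logic, as the surrounding text describes it (0xx.x selects +1, 111.1 selects 0, and any other 1xx.x pattern selects −1), can be written as a small function. The encoding of the estimate as a four-bit integer with the sign in bit 3 and the fraction bit in bit 0 is an assumption of this sketch, as is the name `select_digit_table2`.

```python
def select_digit_table2(est):
    """Quotient digit from the four-bit estimate 2PRest (sign bit is bit 3)."""
    if est & 0b1000 == 0:      # 0xx.x: estimate is zero or positive
        return 1
    if est == 0b1111:          # 111.1: estimate is -1/2
        return 0
    return -1                  # remaining 1xx.x patterns

for pattern in (0b0000, 0b0011, 0b1111, 0b1110, 0b1011):
    print(format(pattern, "04b"), select_digit_table2(pattern))
```

The order of the tests matters only for 111.1, which must be recognized before falling through to the general negative case.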
Floating point operations generate a sticky bit along with the result in order to indicate whether the result is inexact or exact. When the result is inexact, the sticky bit is asserted; conversely, when the result is exact, the sticky bit is deasserted. Essentially, the sticky bit indicates whether or not any of the bits of less significance are non-zero. The sticky bit is also used with the guard and round bits for rounding according to IEEE Standard 754. See “IEEE standard for binary floating-point arithmetic,” ANSI/IEEE Standard 754-1985, New York, The Institute of Electrical and Electronic Engineers, Inc., 1985.
For divide and square root operations, the sticky bit is determined by checking if the final partial remainder is non-zero. The final partial remainder is defined as the partial remainder after the desired number of quotient bits have been calculated. Since the partial remainder is in redundant form, a carry-propagate addition is performed prior to zero-detection. A circuit for computing the sticky bit is shown in FIG. 2. In FIG. 2, the carry 201 and sum 202 portions of the final partial remainder are added together by the carry propagate adder 200. The most significant bit output by the adder 200 is the sign bit 203 of the final partial remainder. As illustrated in FIG. 1, the division hardware accumulates the quotient Q and the quotient minus one Q−1. When the final partial remainder is negative, Q−1 is the proper quotient; when the final partial remainder is zero or positive, Q is the correct quotient. Thus, the sign bit 203 is used to select the correct quotient. Referring again to FIG. 2, the zero detector 204 determines if all bits of the non-redundant final partial remainder 205 are zeros and outputs the sticky bit 206. The zero detector 204 is logically equivalent to a large 59-input OR gate.
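A behavioral sketch of the FIG. 2 datapath is given below. It assumes a 59-bit two's complement remainder held as two unsigned words; the function name `final_remainder_flags` is hypothetical, and the reference numerals in the comments only indicate which element each line is meant to mimic.

```python
WIDTH = 59
MASK = (1 << WIDTH) - 1

def final_remainder_flags(carry, sum_):
    """Return (sign, sticky) for a final partial remainder in carry-save form."""
    total = (carry + sum_) & MASK          # carry-propagate adder 200
    sign = total >> (WIDTH - 1)            # sign bit 203
    sticky = int(total != 0)               # zero detector 204 (OR of all bits)
    return sign, sticky

# Carry = 5 and sum = -5 are both non-zero words, yet the remainder is zero.
print(final_remainder_flags(5, MASK - 4))  # (0, 0)
```

The example shows why the addition cannot be skipped: a zero remainder can be represented by non-zero carry and sum words, so zero-detection on the redundant form would give the wrong sticky bit.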
At first glance, the above solution seems perfectly reasonable for all final partial remainder possibilities, positive or negative. However, in the rare case in which the result is exact, the final partial remainder will be equal to the negative divisor. For example, consider a number divided by itself, as illustrated in the table below, in which PR[i] represents the partial remainder after the i-th quotient digit has been selected.
Since the dividend is always positive and normalized, the quotient digit from the first iteration is one. This is a consequence of the fact that a positive normalized number has a sign bit of zero and a most significant digit of one. When a positive normalized number is divided by two, presumably by right shifting by one bit position, the most significant bit necessarily becomes a zero. (If a negative number is divided by two, the most significant bit is one, because the most significant bits are sign extended so as to match the sign bit, giving the correct two's complement representation.) When the most significant bit is a zero, Table II above dictates that a quotient digit of one should be selected.
For the second iteration shown in Table III, the partial remainder PR[1] is zero, which causes the second quotient digit to be one. For all subsequent iterations, the partial remainder will equal the negative divisor and quotient digits of minus one will be selected. After the last iteration, performing a sign detect on the final partial remainder PR[n] determines that the final partial remainder is negative and indicates that Q−1 should be chosen. This is in fact the correct result. However, this same final partial remainder is non-zero, which erroneously suggests an inexact result and erroneously suggests that the sticky bit should be asserted.
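The number-divided-by-itself case can be reproduced with a small exact-arithmetic model of the prior art behavior: a zero or positive shifted remainder selects +1, a shifted remainder in [−1/2, 0) selects 0, and anything lower selects −1. The function name `divide_prior_art` is hypothetical, and the model works on exact values rather than on a four-bit estimate.

```python
from fractions import Fraction

def divide_prior_art(dividend, divisor, n_digits):
    """Exact-arithmetic model that guesses +1 for a zero shifted remainder."""
    shifted, divisor = Fraction(dividend), Fraction(divisor)
    digits, pr = [], None
    for _ in range(n_digits):
        if shifted >= 0:
            q = 1
        elif shifted >= Fraction(-1, 2):
            q = 0
        else:
            q = -1
        pr = shifted - q * divisor
        digits.append(q)
        shifted = 2 * pr
    return digits, pr

x = Fraction(3, 4)
digits, pr = divide_prior_art(x, x, 6)
q = sum(Fraction(d, 2 ** i) for i, d in enumerate(digits))
ulp = Fraction(1, 2 ** (len(digits) - 1))
print(digits)             # [1, 1, -1, -1, -1, -1]
print(pr, q - ulp)        # -3/4 1
print(pr != 0)            # True: zero-detect would flag an exact result as inexact
```

The remainder locks onto −D after the second iteration, so Q−1 is the right quotient while the non-zero remainder gives the wrong sticky bit.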
This problem extends to any division operation for which the result should be exact. Fundamentally, the problem is a consequence of the fact that the quotient selection logic is defined to guess positive for a zero partial remainder and correct for it later, as illustrated by Table II. The prior art dividers require one processor cycle for restoration of negative final partial remainders prior to sticky bit calculation. To ensure correct rounding, it is necessary to correctly compute the sticky bit. It would be advantageous to develop a divider which did not restore negative final partial remainders, but that could nevertheless guarantee correct computation of the sticky bit.
The present invention provides the ability to correctly and efficiently compute the sticky bit during floating point division and square root computations when the final partial remainder is negative. The present invention further provides an optimized quotient selection circuit implementing quotient selection rules according to the present invention with minimized latency and size.
The present invention provides an enhanced quotient digit selection function which prevents the working partial remainder from becoming negative if the result is exact. This creates a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit.
According to the present invention, the quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. To correctly and efficiently compute the sticky bit for exact division results, enhancing the quotient digit selection function so as to prevent the formation of a negative partial remainder equal to the negative divisor is an ideal solution because it saves hardware and improves latency. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because additional cycle time is not required to restore negative preliminary partial remainders.
According to the present invention, the quotient digit selection logic is altered so as to choose a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a four-bit estimated partial remainder where the upper three bits are zero, a possible carry propagation into the third most significant bit is detected. This can be accomplished by looking at the fourth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. In the alternative case in which one or both of the fourth most significant carry or sum bits of the redundant partial remainder are ones, a quotient digit of one is chosen.
According to an alternative embodiment of the present invention, the quotient digit selection logic is additionally altered so as to choose a quotient digit of negative one when the actual partial remainder is negative one half or less. Using the four-bit estimated partial remainder, a quotient digit of negative one is selected when the upper three bits are ones while the fourth most significant bit is zero. This can be accomplished by looking at the fourth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of negative one is chosen. In the alternative case in which one or both of the fourth most significant carry or sum bits of the redundant partial remainder are ones, a quotient digit of zero is chosen.
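The effect of the two modified rules can be illustrated with an idealized exact-arithmetic model. This is only a sketch under stated assumptions: the thresholds are applied to the exact shifted remainder 2PR rather than to the four-bit estimate and the sum and carry bits described above, a zero shifted remainder selects 0, and a shifted remainder of −1/2 or less selects −1. The function name `divide_modified` is hypothetical.

```python
from fractions import Fraction

def divide_modified(dividend, divisor, n_digits):
    """Exact-arithmetic model with the modified zero and -1/2 thresholds."""
    shifted, divisor = Fraction(dividend), Fraction(divisor)
    digits, pr = [], None
    for _ in range(n_digits):
        if shifted > 0:
            q = 1
        elif shifted > Fraction(-1, 2):
            q = 0                       # includes shifted == 0
        else:
            q = -1                      # shifted <= -1/2
        pr = shifted - q * divisor
        digits.append(q)
        shifted = 2 * pr
    return digits, pr

for x, d in ((Fraction(3, 4), Fraction(3, 4)), (Fraction(5, 8), Fraction(1, 2))):
    digits, pr = divide_modified(x, d, 6)
    q = sum(Fraction(t, 2 ** i) for i, t in enumerate(digits))
    print(digits, q, pr)                # final remainder is 0 for both exact cases
```

In this model an exact quotient leaves a final partial remainder of exactly zero, so the sign detect selects Q and the zero detect deasserts the sticky bit without any restoration step.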