Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever.
1. Field of the Invention
The invention relates generally to implementations of floating-point calculations involving digit-by-digit methods in which the digits generated are in redundant form and rounding can be performed by addition or subtraction of the last digit. More particularly, the invention relates to an improved on-the-fly rounding technique useful in connection with hardware implementations of floating-point digit-recurrence algorithms, such as division and square root calculations.
2. Description of the Related Art
FIG. 1 illustrates examples of floating-point number representations according to American National Standards Institute (ANSI)/Institute of Electrical and Electronics Engineers (IEEE) std 754-1985 (IEEE Standard for Binary Floating-Point Arithmetic, 1985). In general, the floating-point representation 110 defined by IEEE standard 754 divides the binary digits into three fields: a sign field 111, an exponent field 112, and a fraction field 113. In both single-precision and double-precision, the sign field 111 is a single bit. The exponent field 112 is treated as an unsigned number, while the exponent is a signed number represented using the bias method. The exponent bias for a single-precision floating-point number 115 is 127 and for a double-precision floating-point number 120 is 1023. The fraction field 113 represents a number less than one, but the significand of the floating-point number is one plus the fraction part. When performing the division of two floating-point numbers W and X, such as:
W=(xe2x88x921)Swwxc3x972Ewxe2x88x92117 and X=(xe2x88x921)Sxxxc3x972Exxe2x88x92127
three different operations have to be performed on the sign, exponent, and significand to produce the quotient. Described herein are only the operations to be performed on the fractional part of the significand.
Digit-recurrence algorithms are a well-known class of algorithms for hardware implementation of floating-point digit-recurrence algorithms, such as division and square root calculations. Typically, the algorithm has a radix, r, that is usually a power of 2, r=2a. At each iteration, j=1, 2, . . . , a new digit dj of a fixed digit set D is generated such that QJ=d1rxe2x88x921+d2rxe2x88x922+ . . . +dJrxe2x88x92J converges to the infinitely precise result as J tends to infinity. Notably, in most practical algorithms, the set D is not the set {0, 1, 2, . . . , rxe2x88x921}. For example, a typical radix-4, r=4, algorithm generates digits dj such that dj∈{xe2x88x922, xe2x88x921, 0, 1, 2}. In a specific precision, for example, IEEE single-precision floating-point arithmetic that requires 24 correctly rounded significant bits, an iteration number J=L is taken to be just large enough to satisfy the requirement, and a rounding procedure is then performed. In a straightforward implementation, the rounding procedure requires a carry-propagation addition to be performed which is time consuming as this carry-propagation lies in the critical path of the whole algorithm.
A method called xe2x80x9con-the-fly roundingxe2x80x9d (described further in M.D. Ercegovac and T. Lang, On-the-fly Rounding, IEEE Transactions on Computers, volume 41, number 12, pp. 1497-1503, December 1992) has been devised to eliminate the need for a carry-propagation addition during the rounding procedure. However, as illustrated in FIG. 2, this existing on-the-fly rounding procedure requires three variables, Q(J), QM(J), and QP(J) to be stored. Q(J) represents the sum of intermediate digits for the Jth iteration. QM(J) represents the intermediate sum less the unit in the last position (Q(J)xe2x88x92rxe2x88x92j) and QP(J) represents the intermediate sum plus the unit in the last position (Q(J)+rxe2x88x92j). Each of the three variables are represented by canonical digits djxe2x88x92, dj, dj+∈0,1, . . . rxe2x88x921) and must be updated during each iteration J. Numerically,
QM(J)=d1xe2x88x92rxe2x88x921+d2xe2x88x92rxe2x88x922 + . . . +dJxe2x88x92rxe2x88x92J
Q(J)=d1rxe2x88x921+d2rxe2x88x922 + . . . +dJrxe2x88x92J
QP(J)=d1+rxe2x88x921+d2+rxe2x88x922 + . . . +dJ+rxe2x88x92J
When the iteration reaches J=Lxe2x88x921, one iteration before the last necessary for the desired precision, the on-the-fly rounding methods generate the correctly rounded result based on rounding just the last digit, dL, and appending it to one of the three variables QM(Lxe2x88x921), Q(Lxe2x88x921), and QP(Lxe2x88x921). No carry-propagating addition is required.
The main cost of such an on-the-fly rounding method is two-fold. First, three variables must be stored whose sizes are as large as the number of bits for the precision in question. Second, each of the three variables must be updated during each iteration of the digit recurrence algorithm. This translates directly into power consumption, which is becoming increasingly important as a design consideration.