1. Field of the Invention
The present invention relates generally to arithmetic logic units and in particular to dividers.
2. Description of the Related Art
Dividers are used in microprocessors and data-processing devices to perform arithmetic division. Because floating-point numbers and integers are represented differently in binary, and because the operations differ as a result, separate floating-point and integer dividers are generally provided. Typically, floating-point division is considered more important for high-demand applications, such as graphics and multimedia applications. In addition, integer division is performed less frequently than other arithmetic operations. Consequently, many manufacturers save die real estate by providing only the most basic one-bit-per-cycle (radix-2) integer divider, at the cost of performance. While combined floating-point and integer dividers are known, they generally have not provided significant performance or space-efficiency improvements over separate dividers.
Integer division and floating point division are commonly performed using one of a variety of well-known subtractive algorithms. Subtractive algorithms each include a sequence of shift, subtract, and compare operations. Among subtractive algorithms, restoring, non-restoring, non-performing, and the Sweeney, Robertson, and Tocher (SRT) division algorithms are known.
Subtractive division works similarly to standard long division. Each digit of the dividend, starting with the most significant digit, is compared to the divisor, and a digit of the quotient is computed. In computers, the typical one-bit-per-cycle (radix-2) integer divider accomplishes this by aligning the most significant bit of the dividend with the least significant bit of the divisor, subtracting the aligned digits, shifting the partial remainder to the left, subtracting, shifting again, and so on. For a 64-bit number, the minimum number of cycles is 64, plus several cycles for setting up the computation. The entire process is performed even when the numbers have significantly fewer digits or when the dividend is smaller than the divisor (a case that always yields a zero quotient for integers). Thus, even radix-4 and radix-8 integer dividers, which process multiple bits per cycle, can be very inefficient.
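The shift, subtract, and compare sequence described above can be sketched in software. The following is a minimal illustration of a radix-2 restoring divider on unsigned operands, not a description of any particular circuit; the function name `restoring_divide`, the `width` parameter, and the returned cycle count are assumptions introduced for this example.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Radix-2 restoring division on unsigned width-bit integers.

    Returns (quotient, remainder, cycles). Hypothetical helper for
    illustration only.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")

    remainder = 0
    quotient = 0
    cycles = 0
    # One iteration per dividend bit, most significant bit first.
    for i in range(width - 1, -1, -1):
        # Shift the partial remainder left and bring in the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Compare; subtract only when the divisor fits (the "restoring" case
        # is modeled by simply skipping the subtraction).
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
        cycles += 1
    return quotient, remainder, cycles
```

Under these assumptions, `restoring_divide(3, 10)` still runs all 64 iterations even though the quotient is zero, which is the inefficiency the passage points out.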
An exemplary prior art non-restoring integer divider 20 is schematically represented in FIG. 1. The 64-bit dividend is right shifted by 63 bits using concatenation 22. The concatenated 128-bit word is stored in flip-flop 26. The left 65 bits, including one sign bit and the 64-bit partial remainder, are read out and added in adder 34. The divisor is stored in flip-flop 28, converted into its two's complement form using XOR 32, and added in carry-lookahead adder 34. The result from adder 34 is the partial remainder from which quotient digit 38 q(i) is calculated. Concatenation 36 combines the right 63 bits from split 29 with the result from adder 34 and q(i). The cycle repeats 64 times, and the final result is stored in the least significant 64 bits of flip-flop 26.
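As a rough software analogue of a non-restoring divider, the sketch below keeps a signed partial remainder alongside a register that initially holds the dividend and gradually fills with quotient bits, much like the combined word described above. It is a generic textbook formulation for unsigned operands, not a model of the specific elements of FIG. 1; the names `non_restoring_divide`, `quotient_reg`, and `width` are assumptions made for illustration.

```python
def non_restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Radix-2 non-restoring division on unsigned width-bit integers.

    Returns (quotient, remainder). Hypothetical helper for illustration.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")

    remainder = 0            # signed partial remainder (upper part of the word)
    quotient_reg = dividend  # dividend bits shift out, quotient bits shift in

    for _ in range(width):
        msb = (quotient_reg >> (width - 1)) & 1
        was_negative = remainder < 0
        # Shift the combined (remainder, quotient_reg) pair left by one bit.
        remainder = 2 * remainder + msb
        quotient_reg = (quotient_reg << 1) & mask
        # Add the divisor after a negative remainder, otherwise subtract it.
        if was_negative:
            remainder += divisor
        else:
            remainder -= divisor
        # The quotient bit is 1 when the new partial remainder is non-negative.
        if remainder >= 0:
            quotient_reg |= 1

    # Final correction step: a negative remainder is fixed up once at the end.
    if remainder < 0:
        remainder += divisor
    return quotient_reg, remainder
```

Unlike the restoring form, no partial remainder is ever restored inside the loop; a negative result is carried into the next iteration and compensated by adding the divisor instead of subtracting it.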
Floating-point numbers are generally stored in binary as $A = S_a \times r^{E_a}$, which includes a normalized significand $S_a$ multiplied by the radix $r$ raised to the power $E_a$. The significand, when normalized, has a 1 in the most significant position and a radix point immediately after the most significant position. Floating-point algorithms operate on the significand portions of the operands in a manner similar to the integer division algorithm described above, with the exponents being subtracted. However, with floating-point division, division stops when the remainder is zero.
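The split between significand division and exponent subtraction can be illustrated as follows. This is a minimal sketch for positive, finite, nonzero operands with radix 2; it uses Python's built-in division for the significands rather than an iterative subtractive algorithm, and the helper names `decompose` and `fp_divide` are assumptions introduced for this example.

```python
import math


def decompose(x: float):
    """Split x into (significand, exponent) with 1 <= significand < 2."""
    # math.frexp returns m in [0.5, 1) with x == m * 2**e for positive x.
    m, e = math.frexp(x)
    return m * 2.0, e - 1


def fp_divide(a: float, b: float):
    """Divide positive floats by dividing significands and subtracting exponents.

    Returns (significand, exponent) of the quotient, renormalized to [1, 2).
    """
    sa, ea = decompose(a)
    sb, eb = decompose(b)
    s = sa / sb   # lies in (0.5, 2) for normalized significands
    e = ea - eb   # exponents are subtracted
    if s < 1.0:   # renormalize so the leading digit is 1
        s *= 2.0
        e -= 1
    return s, e
```

For instance, `fp_divide(6.0, 4.0)` decomposes the operands into significands 1.5 and 1.0 with equal exponents, and `math.ldexp(s, e)` reassembles the returned pair into an ordinary float.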
The Prabhu/Zyner algorithm presented in “167 MHz Radix-8 Divide and Square Root Using Overlapped Radix-2 Stages,” 12th Symposium Computer Arithmetic, Bath, England, 1995, pages 155-162, J. Arjun Prabhu and Gregory B. Zyner, which is wholly incorporated herein by reference, shows an exemplary radix-8 floating point SRT algorithm. The Prabhu/Zyner algorithm uses carry-save adders (CSA) to perform multiple SRT division steps stacked in a single cycle, resulting in a low latency floating point divide.
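The idea of retiring three radix-2 SRT steps per cycle can be sketched in software. The example below is a simplified illustration only: it uses exact rational arithmetic instead of carry-save adders, evaluates the three steps sequentially rather than in overlapped hardware, and uses a textbook radix-2 selection rule with quotient digits in {-1, 0, 1}. It does not reproduce the selection logic of the cited paper, and the names `srt_radix2_step`, `srt_divide`, and `stages_per_cycle` are assumptions made for this example.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def srt_radix2_step(w: Fraction, d: Fraction):
    """One radix-2 SRT step: choose q in {-1, 0, 1}, return (q, 2*w - q*d)."""
    two_w = 2 * w
    if two_w >= HALF:
        q = 1
    elif two_w < -HALF:
        q = -1
    else:
        q = 0
    return q, two_w - q * d


def srt_divide(x, d, cycles: int, stages_per_cycle: int = 3) -> Fraction:
    """Approximate x / d for x and d in [1/2, 1).

    Each cycle applies stages_per_cycle radix-2 steps, so three steps
    per cycle retire three quotient bits (radix 8) per cycle.
    """
    x, d = Fraction(x), Fraction(d)
    if not (HALF <= x < 1 and HALF <= d < 1):
        raise ValueError("operands must be normalized to [1/2, 1)")

    w = x / 2              # initial partial remainder, |w| <= d
    acc = Fraction(0)      # running sum of signed quotient digits
    weight = Fraction(1)
    for _ in range(cycles):
        for _ in range(stages_per_cycle):
            q, w = srt_radix2_step(w, d)
            weight /= 2
            acc += q * weight
    # The recurrence started from x / 2, so scale the digit sum by 2.
    return 2 * acc
```

With `n = cycles * stages_per_cycle` steps, the result of this sketch differs from the exact quotient by at most `2 * 2**-n`, because the partial remainder stays bounded by the divisor in magnitude at every step.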
It would be desirable to improve the performance of integer division while leveraging hardware already present in a floating-point divider, such as the Prabhu/Zyner divider, to reduce real estate requirements on the die.