Digital devices perform a variety of arithmetic operations on binary numerical data. A processor within such digital devices includes a major subdivision called an arithmetic logic unit (ALU). The ALU performs a variety of data processing and arithmetic operations under the control of the processor. Although early processors had only one ALU, modern chips may have several ALUs, which may be classed into two types. One basic type of ALU is an integer unit which carries out simple integer mathematical operations including add, subtract, multiply, shift and logical instructions. More powerful processors also may include a second type of ALU, referred to as a floating-point unit, that handles advanced math operations on numbers with a wider range than simple integers (such as 1.03×10⁻¹⁹, for example). Floating-point units use separate, dedicated instructions for their advanced functions.
The basic integer unit may also include a dedicated divider to perform arithmetic division. Because floating-point numbers and integers are represented differently in binary, and because the operations differ as a result, separate floating-point and integer dividers are generally provided. Typically, floating-point division is considered to be more important for high-demand applications, such as graphics and multimedia applications. Integer division is also performed less frequently than other mathematical operations. Consequently, many manufacturers save die real estate by providing only the most basic single-bit-per-cycle (radix-2) integer divider, at the cost of reduced performance.
The integer divider commonly operates based on one of a variety of well-known subtractive algorithms. Subtractive algorithms each include a sequence of shift, subtract, and compare operations. Among subtractive algorithms, restoring, non-restoring, non-performing, and the Sweeney, Robertson, and Tocher (SRT) division algorithms are known. These division algorithms tend to be very slow in generating quotient values.
For example, one conventional subtractive division technique for binary numbers works similarly to standard long division in base-10 numbers. Each digit of the dividend, starting with the most significant digit, is compared to the divisor, and a digit of the quotient is computed. In computers, this is accomplished by the typical one bit per cycle (radix-2) integer divider by aligning the most significant bit of the dividend with the least significant bit of the divisor, subtracting the aligned digits, shifting the partial remainder to the left, subtracting, shifting again, and so on. For a 64-bit number, the minimum number of cycles is 64, plus several cycles for setting up the computation. Even in cases where the numbers have significantly fewer digits or the dividend is smaller than the divisor (a case which always results in zero for integer numbers) the entire process is performed. Thus, even radix-4 and radix-8 integer dividers, which process multiple bits per cycle, can be very inefficient.
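The shift, subtract, and compare sequence described above can be sketched in software. The following Python model is illustrative only and is not taken from any particular divider design: the function name `restoring_divide`, the `width` parameter, and the returned cycle count are assumptions introduced here. It handles unsigned operands and shows that a radix-2 divider of this kind runs one iteration per bit of the operand width regardless of the operand values.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Software model of radix-2 restoring division on unsigned integers.

    Returns (quotient, remainder, cycles). The name and signature are
    hypothetical; this is a sketch, not a description of real hardware.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not 0 <= dividend < (1 << width) or not 0 < divisor < (1 << width):
        raise ValueError("operands must fit in the given width")

    remainder = 0
    quotient = 0
    cycles = 0
    # One iteration per dividend bit, most significant bit first.
    for i in range(width - 1, -1, -1):
        # Shift the partial remainder left and bring in the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction of the divisor.
        remainder -= divisor
        if remainder < 0:
            # The subtraction went negative: restore and emit a 0 quotient bit.
            remainder += divisor
        else:
            # The subtraction succeeded: emit a 1 quotient bit.
            quotient |= 1 << i
        cycles += 1
    return quotient, remainder, cycles
```

Under these assumptions, `restoring_divide(100, 7)` and `restoring_divide(3, 10)` both report 64 cycles, even though the second quotient is zero as soon as the dividend is seen to be smaller than the divisor.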
An exemplary prior art non-restoring integer divider 20 is schematically represented in FIG. 1. The 64-bit dividend is right shifted by 63 bits using concatenation 22. The concatenated 128-bit word is stored in flip-flop 26. The left 65 bits, including one sign bit and the 64-bit partial remainder, are read out and added in adder 34. The divisor is stored in flip-flop 28, converted into its two's complement form using XOR 32, and added in carry-lookahead adder 34. The result from adder 34 is the partial remainder from which quotient digit 38 q(i) is calculated. Concatenation 36 combines the right 63 bits from split 29 with the result from adder 34 and q(i). The cycle repeats 64 times and the final result is stored in the least significant 64 bits of flip-flop 26.
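The arithmetic performed by a non-restoring divider of this general kind can be sketched as follows. This Python model is a simplified illustration for unsigned operands and does not reproduce the datapath of FIG. 1 (the flip-flops, concatenations, and XOR-based complementing are not modeled); the function name `non_restoring_divide` and the `width` parameter are assumptions introduced here. In each cycle the divisor is subtracted when the partial remainder is non-negative and added when it is negative, so no separate restore step is needed until a single correction at the end.

```python
def non_restoring_divide(dividend: int, divisor: int, width: int = 64):
    """Software model of radix-2 non-restoring division on unsigned integers.

    Returns (quotient, remainder). The name and signature are hypothetical;
    this is a sketch of the arithmetic, not of any specific circuit.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not 0 <= dividend < (1 << width) or not 0 < divisor < (1 << width):
        raise ValueError("operands must fit in the given width")

    remainder = 0  # signed partial remainder
    quotient = 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            # Non-negative partial remainder: shift in the bit, then subtract.
            remainder = 2 * remainder + bit - divisor
        else:
            # Negative partial remainder: shift in the bit, then add.
            remainder = 2 * remainder + bit + divisor
        # The quotient bit is 1 when the new partial remainder is non-negative.
        if remainder >= 0:
            quotient |= 1 << i
    # Single final correction if the last partial remainder is negative.
    if remainder < 0:
        remainder += divisor
    return quotient, remainder
```

As with the restoring form, the loop still executes once per bit of the operand width, so a 64-bit division takes 64 iterations whatever the operand values.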
It would be desirable to improve the performance of integer division while at the same time reducing real estate requirements on the integrated circuit die.