The present invention relates generally to methods and apparatus for performing SRT division, and more particularly to a novel division algorithm and associated processing unit for performing SRT division.
Modern microprocessors typically include hardware configured to handle floating-point divide and square-root operations. However, given the complexity of the divide and square-root operations, the performance of these operations is considerably lower than that of other basic mathematical operations. Division and square-root operations are critical to real applications, so it is important that their performance and hardware area requirements be balanced with those of other mathematical operations.
There are two major categories of divide and square-root algorithms, multiplicative and subtractive methods, and within each category there are a considerable number of design variables. Although the subtractive methods were once regarded as slow and excessively complicated to implement, advancements in technology have made them the algorithms of choice for division and square-root calculations.
The most common subtractive or digit recurrence division algorithm is the SRT algorithm. SRT stands for D. Sweeney, J. E. Robertson, and K. D. Tocher, each of whom developed division procedures using very similar techniques. With the SRT algorithm, as well as with other subtractive methods, quotients and square-roots are computed directly, one digit per iteration; for this reason, they are also known as digit recurrence algorithms. To reduce the number of iterations, it is advantageous to use the highest possible radix for the quotient-digit representation. However, the complexity of the quotient-digit selection function increases for higher radices, which can offset the advantage gained from the reduction in the number of iterations.
One method of simplifying the quotient-digit selection process for higher radices is by restricting the range of the divisor. Since the quotient-digit selection is most accurate, and thus, quickest as the divisor approaches 1, it is convenient to restrict the divisor to a range close to 1. This "range restriction" can be done by prescaling the divisor. Moreover, to preserve the value of the quotient, either the dividend has to be prescaled also or the quotient postscaled. Divisor and dividend prescaling is well known in the art and is commonly used for high-radix division. However, while prescaling is useful for simplifying quotient-digit selection in high-radix division units, the clock cycle time for these units still can be large, and the complexity and size of the hardware is great.
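By way of illustration only, the effect of prescaling can be sketched in Python as follows. The function name approx_reciprocal, the choice of a truncated reciprocal as the scaling factor, and the 8-bit precision are assumptions made for this sketch and are not taken from the specification; exact rational arithmetic is used so that the preservation of the quotient can be checked directly.

```python
from fractions import Fraction


def approx_reciprocal(divisor: Fraction, bits: int) -> Fraction:
    """Hypothetical helper: a scaling factor close to 1/divisor.

    The reciprocal is rounded to a multiple of 2**-bits, standing in for
    a low-precision lookup of the kind a prescaling stage might use.
    """
    scale = 1 << bits
    return Fraction(round(scale / divisor), scale)


# Example operands (arbitrary values chosen for illustration).
dividend = Fraction(3, 8)
divisor = Fraction(5, 8)

m = approx_reciprocal(divisor, 8)   # scaling factor
scaled_divisor = divisor * m        # lands close to 1
scaled_dividend = dividend * m      # dividend scaled by the same factor
```

Because the dividend and the divisor are multiplied by the same factor, the ratio of the scaled values equals the ratio of the original values, while the scaled divisor falls in a narrow range around 1.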
Thus, what is needed is a division unit and division algorithm that perform high-radix division quickly, but with a simplified hardware design.
According to the present invention, an SRT division unit for performing a novel division algorithm is presented. The novel division algorithm comprises a method for performing division using a radix r. As one skilled in the art will appreciate, the radix r dictates the number of quotient-bits k generated during a single iteration. The relationship between radix r and the number of quotient-bits k generated in a single iteration is r=2^k. The number of iterations needed to determine all quotient-digits is N, such that N=n/k, and n is the number of quotient-bits to be generated. For 64-bit floating point notation, n typically is 54.
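The relationship among r, k, n and N may be sketched as follows. The function name iterations_needed is hypothetical, and rounding N up when n is not a multiple of k is an assumption of this sketch; the text above states only N=n/k.

```python
import math


def iterations_needed(radix: int, n_bits: int) -> int:
    """Return N for a radix r = 2**k and n quotient-bits.

    Assumption: when n is not a multiple of k, N is rounded up so that
    at least n quotient-bits are produced.
    """
    k = radix.bit_length() - 1          # k such that radix == 2**k
    if radix < 2 or (1 << k) != radix:
        raise ValueError("radix must be a power of two, at least 2")
    return math.ceil(n_bits / k)


# Radix 512 gives k = 9, so n = 54 quotient-bits take 54 / 9 iterations.
n_iter_radix_512 = iterations_needed(512, 54)
```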
In accordance with one embodiment of the present invention, the SRT division unit generates a scaling factor M, which comprises scaling sub-factors M1 and M2 according to the relationship M=r*M1+M2. Next, the division unit generates a scaled divisor Y by multiplying a divisor DR by scaling factor M, such that said scaled divisor Y=DR*M=r(DR*M1)+DR*M2. In addition, the division unit generates a first scaled dividend value w[00] and a second scaled dividend value w[0] by multiplying a dividend DD by scaling sub-factor M1 and scaling factor M, respectively. First scaled dividend value w[00]=DD*M1, and second scaled dividend value w[0]=DD*M=r(DD*M1)+DD*M2. Scaled divisor Y and scaled dividend values w[0] and w[00] then are used to generate quotient-digits and additional partial remainders (w[1] to w[N]).
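The scaling relationships stated above can be checked with a short sketch. The function name prescale and the numeric values of DD, DR, M1 and M2 below are hypothetical and chosen only to exercise the algebra; how the division unit actually derives M1 and M2 is not shown here.

```python
from fractions import Fraction


def prescale(dd: Fraction, dr: Fraction, m1: Fraction, m2: Fraction, r: int):
    """Compute M, Y, w[00] and w[0] from the relationships in the text."""
    m = r * m1 + m2      # M = r*M1 + M2
    y = dr * m           # Y = DR*M
    w00 = dd * m1        # w[00] = DD*M1
    w0 = dd * m          # w[0] = DD*M
    return m, y, w00, w0


# Arbitrary illustrative operands (assumptions, not values from the text).
r = 512
dd, dr = Fraction(3, 8), Fraction(5, 8)
m1, m2 = Fraction(3, 1024), Fraction(13, 128)

m, y, w00, w0 = prescale(dd, dr, m1, m2, r)
```

In this sketch the two-term expansions r(DR*M1)+DR*M2 and r(DD*M1)+DD*M2 agree with the direct products DR*M and DD*M, and the ratio w[0]/Y equals DD/DR.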
In accordance with this aspect of the invention, the division unit performs a first iteration, which comprises generating a first quotient-digit q[1] using the first scaled dividend value w[00], and generating a partial remainder value w[1] using the first quotient-digit q[1], the scaled divisor Y and a shifted second scaled dividend value rw[0]. The shifted second scaled dividend rw[0] comprises the second scaled dividend w[0] multiplied by the radix r. The partial remainder value w[1] is generated according to the formula w[1]=rw[0]-q[1]*Y.
Next, the division unit performs a second iteration, which comprises generating a second quotient-digit q[2] using the second scaled dividend value w[0] and at least one bit from the first quotient-digit q[1]. In addition, the second iteration comprises generating a partial remainder value w[2] using the second quotient-digit q[2], the scaled divisor Y and a shifted partial remainder rw[1]. The shifted partial remainder rw[1] comprises the partial remainder w[1] multiplied by the radix r. The partial remainder value w[2] is generated according to the formula w[2]=rw[1]-q[2]*Y.
In accordance with the division algorithm of the present invention, the iterations continue until all quotient-digits are generated. As mentioned above, it typically takes N iterations to generate all quotient-digits, where N=n/k and r=2^k. For example, for a radix 512 division unit, k=9 and it takes 6 iterations to generate all the quotient-digits for a 64-bit floating point value. The division unit therefore performs subsequent iterations j (j=3 to N) until all N iterations are performed and all quotient-digits are generated. In performing the subsequent iterations, the division unit generates a quotient-digit q[j] for iteration j using a partial remainder value w[j-2] from iteration j-2 and at least one bit from a quotient-digit q[j-1] from iteration j-1. In addition, the division unit generates a partial remainder value w[j] using the quotient-digit q[j], the scaled divisor Y and a shifted partial remainder rw[j-1]. The shifted partial remainder rw[j-1] comprises the partial remainder w[j-1] multiplied by the radix r. The partial remainder value w[j] is generated according to the formula w[j]=rw[j-1]-q[j]*Y.
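The partial remainder recurrence w[j]=rw[j-1]-q[j]*Y may be illustrated with the following simplified sketch. It is not the quotient-digit selection described above: for clarity, each digit here is selected by exact rounding of rw[j-1]/Y, which is an assumption of this sketch, rather than from w[j-2] and bits of q[j-1]. The function name digit_recurrence_divide and the operand values are likewise hypothetical.

```python
from fractions import Fraction


def digit_recurrence_divide(w0: Fraction, y: Fraction, r: int, n_iter: int):
    """Run n_iter steps of w[j] = r*w[j-1] - q[j]*Y.

    Simplifying assumption: q[j] is the integer nearest to r*w[j-1]/Y,
    computed exactly. This keeps |w[j]| <= Y/2 and, provided that
    |w[0]| <= Y/2, keeps each signed digit within [-r/2, r/2].
    """
    w = w0
    digits = []
    for _ in range(n_iter):
        shifted = r * w              # rw[j-1]
        q = round(shifted / y)       # simplified digit selection
        w = shifted - q * y          # w[j] = rw[j-1] - q[j]*Y
        digits.append(q)
    return digits, w


# Illustrative operands: a scaled divisor near 1 and |w[0]| <= Y/2.
w0 = Fraction(1, 3)
y = Fraction(1001, 1000)
digits, w_final = digit_recurrence_divide(w0, y, 512, 6)
```

Under these assumptions, unrolling the recurrence gives w[0] = Y*(q[1]/r + q[2]/r^2 + ... + q[N]/r^N) + w[N]/r^N, so the digits form a radix-r expansion of w[0]/Y with an error bounded by the final partial remainder.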
As the quotient-digits q[1] to q[N] are being generated, the division unit accumulates the quotient-digits q[1] to q[N] into a final quotient value Q. In addition, if the division is a floating point division, the division unit will calculate a new exponent value by subtracting the exponent value of the divisor from the exponent value of the dividend. Finally, the division unit will perform post correction and rounding functions in accordance with IEEE Std. 754.
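The accumulation of quotient-digits and the exponent calculation may be sketched as follows. The function names, the use of signed digits held in a Python list, and the use of unbiased integer exponents are assumptions of this sketch; post correction and rounding are not shown.

```python
from fractions import Fraction


def accumulate_quotient(digits, r: int) -> Fraction:
    """Accumulate signed radix-r digits q[1]..q[N] into
    Q = q[1]/r + q[2]/r**2 + ... + q[N]/r**N."""
    acc = 0
    for d in digits:
        acc = acc * r + d            # shift previous digits, add the new one
    return Fraction(acc, r ** len(digits))


def quotient_exponent(dividend_exp: int, divisor_exp: int) -> int:
    """New exponent: dividend exponent minus divisor exponent
    (assumed unbiased; any bias handling is omitted)."""
    return dividend_exp - divisor_exp


# Example with radix 4 and signed digits 1, -2, 3.
q_value = accumulate_quotient([1, -2, 3], 4)
```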
A more complete understanding of the present invention may be derived by referring to the detailed description of preferred embodiments and claims when considered in connection with the figures, wherein like reference numbers refer to similar items throughout the figures.