This invention relates to the division of digits using a fast multiplier and quadratic convergence, and more particularly, to an improved divider that employs a complementation, multiplication and addition sequence to achieve reduced execution time.
The division operation has been implemented in the past using conventional algorithms, i.e., algorithms that produce the quotient via successive additions/subtractions. A principal difficulty of such division operations is that their rate of convergence is linear. Each execution of the iterated procedure adds approximately the same number of bits to the quotient digits already determined. See, for example, K. Hwang, "Computer Arithmetic", J. Wiley & Sons, Inc. (1979). Such a family of algorithms produces slow division implementations or, when a faster divider is desired, requires prohibitive hardware.
A divide algorithm based on quadratic convergence has been designed for the System/360 Model 91 computer from International Business Machines Corporation, the assignee of the present application. This algorithm, which shall be referred to herein as the "IBM" method, has resulted in faster divide hardware, especially for technologies that allow the design of fast multipliers. See S.F. Anderson, et al., "The IBM System/360 Model 91: Floating Point Execution Unit", IBM Journal, pp. 34-53 (January 1967). The overhead needed to implement such division with a fast multiplier is small, and the gain in performance has been comparable to traditional large-scale multipliers. Thus, by using quadratic algorithms to design dividers, both hardware and execution time can be saved when compared to the traditional algorithms that employ addition/subtraction. The savings are substantial.
The IBM quadratic convergence division algorithm may be developed by first considering the division operation:
$$Q = N/D$$
with Q being the quotient, N the dividend, and D the divisor. Assume that the quotient Q can be generated for the division, i.e., $N < D$ and $D \neq 0$. The division operation can be written as:
$$Q = \frac{N}{D} = \frac{N R_0 R_1 \cdots R_n}{D R_0 R_1 \cdots R_n}$$
If $R_k$ is found, for $0 \le k \le n$, such that the denominator $D R_0 R_1 \cdots R_n$ converges to 1, then the quotient, Q, is equal to:
$$Q = N R_0 R_1 \cdots R_n$$
Let N and D be two positive fractions and assume that N and D are normalized. It can be proven that the denominator $D R_0 R_1 \cdots R_n$ approaches 1 if:
$$R_0 = 1 + \delta \quad \text{for } k = 0$$
$$R_k = 1 + \delta^{2^{k}} = 2 - D_{k-1} \quad \text{for } k > 0$$
$$D_k = 1 - \delta^{2^{k+1}} = D R_0 R_1 \cdots R_k = D_{k-1} R_k$$
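The convergence claim can be checked by a short induction. The following is a sketch only, assuming the divisor is written as $D = 1 - \delta$ with $0 < \delta \le 1/2$ (the form a bit-normalized fraction takes):

```latex
\begin{aligned}
D_0 &= D R_0 = (1-\delta)(1+\delta) = 1-\delta^{2},\\
D_k &= D_{k-1} R_k
     = \left(1-\delta^{2^{k}}\right)\left(1+\delta^{2^{k}}\right)
     = 1-\delta^{2^{k+1}}, \qquad k > 0,\\
\lim_{k\to\infty} D_k &= 1 \quad \text{since } 0 < \delta \le \tfrac{1}{2}.
\end{aligned}
```

Each step is a difference of squares, so the exponent of $\delta$ doubles with every multiplication, which is the source of the quadratic convergence.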
The convention "**" is used throughout this specification to indicate double exponentiation. Thus, by way of example, the expression "x.sup.2**y" signifies x raised to the power "2 to the y" (in LaTeX notation, $x^{2^{y}}$), and so on.
It can further be proven with substitution of $R_0$, $R_k$ and $D_k$ that the quotient can be computed by:
$$Q = N(1+\delta)(1+\delta^{2})(1+\delta^{4}) \cdots (1+\delta^{2^{n}})$$
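The product form of the quotient can be checked numerically. The following is a minimal sketch, not the hardware method: it uses exact rationals from Python's `fractions` module, and the function name `quotient_by_product` and the operand values are hypothetical choices made for illustration.

```python
from fractions import Fraction


def quotient_by_product(n, d, terms):
    """Approximate n/d as n*(1+delta)*(1+delta^2)*(1+delta^4)*...

    `terms` is the number of factors used; delta = 1 - d.
    """
    delta = 1 - d
    q = n
    power = delta              # holds delta^(2^k) for the current factor
    for _ in range(terms):
        q *= 1 + power
        power *= power         # squaring doubles the exponent of delta
    return q


# Hypothetical operands: a normalized divisor (1/2 <= d < 1) and n < d.
n, d = Fraction(3, 8), Fraction(3, 4)
exact = n / d
for terms in range(1, 7):
    approx = quotient_by_product(n, d, terms)
    # The remaining error is exact * delta^(2^terms), so it shrinks quadratically.
    print(terms, float(exact - approx))
```

Because the partial product equals $Q(1-\delta^{2^{t}})$ after $t$ factors, the error after each additional factor is the square of the previous relative error.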
For example, consider a 56 bit fraction. Given that D is bit normalized, i.e., of the form 0.1***. . . *, with * representing either 0 or 1, its value satisfies $1/2 \le D < 1$, and D can be written as:
$$D = 1 - \delta, \quad \text{where } 0 < \delta \le 1/2$$
The IBM quadratic convergence algorithm states the following:
1. For the first iteration:
$$R_0 = 1 + \delta = 2 - D$$
As shown in Appendix A hereto, $R_0$ is obtained by two's complementation of the divisor D.
The value $D_0$ is determined by multiplying D by its two's complement $R_0$ to obtain:
$$D_0 = D R_0 = (1-\delta)(1+\delta) = 1 - \delta^{2}$$
Since D is bit normalized, and $\delta \le 1/2$, it can be stated that $\delta^{2} \le 1/4$ and $D R_0 \ge 3/4$, which implies that $D R_0$ is of the form 0.11** . . .
2. For the second iteration:
$$R_1 = 1 + \delta^{2} = 2 - D_0$$
Again, $R_1$ is found by two's complementation of the value $D_0$ calculated above.
The value $D_0$ is multiplied by its two's complement $R_1$ to obtain the next iteration of $D_k$:
$$D_1 = D R_0 R_1 = D_0 R_1 = (1-\delta^{2})(1+\delta^{2}) = 1 - \delta^{4}$$
The implication is that $D R_0 R_1$ is of the form 0.1111*** . . . *.
Successive iterations are similarly carried out. Each iteration doubles the number of leading ones, and $D R_0 R_1 \cdots R_n$ converges to 0.111...11, where there are 56 ones following the binary point (i.e., it converges to 1).
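The doubling of leading ones can be observed directly. The sketch below is illustrative only: it assumes exact rational arithmetic and truncation to 56 fractional bits, and the helper names `leading_ones` and `ones_per_iteration` are hypothetical. It models two's complementation of a fraction as `2 - dk`.

```python
from fractions import Fraction


def leading_ones(x, width=56):
    """Count leading 1 bits of x (0 <= x < 1) truncated to `width` fractional bits."""
    bits = (x.numerator << width) // x.denominator   # floor(x * 2**width)
    s = format(bits, f"0{width}b")
    return len(s) - len(s.lstrip("1"))


def ones_per_iteration(d, iterations=6, width=56):
    """Return the leading-ones count of D_k after each iteration k = 0..iterations-1."""
    counts = []
    dk = d
    for _ in range(iterations):
        r = 2 - dk          # R_k, the two's complement of the current denominator
        dk = dk * r         # D_k = D_{k-1} * R_k
        counts.append(leading_ones(dk, width))
    return counts


# Worst case for a bit-normalized divisor: D = 1/2, i.e. delta = 1/2.
print(ones_per_iteration(Fraction(1, 2)))
```

For the worst case $D = 1/2$, the denominator after iteration k equals $1 - 2^{-2^{k+1}}$, so the count doubles each time until it saturates at the 56-bit width on the sixth iteration.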
Given that the first iteration produces two leading 1's, the second iteration four leading 1's, and so on, six iterations are needed to converge 56 bits. That is, six iterations will produce:
$$D R_0 R_1 R_2 R_3 R_4 R_5 = 0.111\ldots11 \approx 1$$
where there are 56 ones following the binary point. To produce the quotient:
$$Q = N R_0 R_1 R_2 R_3 R_4 R_5$$
it might appear that six more multiplications will be needed in addition to the six multiplications needed to converge the denominator to 1.
However, it is noted that:
$$R_k = 1 + \delta^{2^{k}} \quad \text{and} \quad D_{k-1} = 1 - \delta^{2^{k}}, \quad k \ge 1$$
$$R_k = 2 - D_{k-1}$$
The implication of the previous statement is that while six multiplications are needed to converge the denominator to 1, the last multiplication will produce $D_k$ (with k=5), which is not required for the quotient. Thus, such a multiplication need not be performed. Referring to FIG. 1, this means that step 12 of the operation may be eliminated because only $D R_0 R_1 R_2 R_3 R_4$, and not $D R_0 R_1 R_2 R_3 R_4 R_5$, is required to compute the quotient Q. Consequently, in order to produce the quotient, eleven multiplications are required with two's complementation to create the desired result. However, as will be discussed in the "Comparison" section below, because of data dependency interlocks, eleven multiplications may not produce a faster divider.
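The multiplication count can be illustrated with a short sketch. This is not the hardware implementation: it assumes exact rational arithmetic instead of a 56-bit datapath, models two's complementation as `2 - dk`, and uses a hypothetical function name `quadratic_divide` and hypothetical operands.

```python
from fractions import Fraction


def quadratic_divide(n, d, iterations=6):
    """Return (approximate n/d, number of multiplications performed).

    Assumes 1/2 <= d < 1 and n < d, as for bit-normalized fractions.
    """
    mults = 0
    q = n
    dk = d
    for k in range(iterations):
        r = 2 - dk               # R_k = 2 - D_{k-1} (R_0 = 2 - D)
        q = q * r                # accumulate N * R_0 * ... * R_k
        mults += 1
        if k < iterations - 1:   # the final D_k feeds no further R, so skip it
            dk = dk * r
            mults += 1
    return q, mults


# Hypothetical operands chosen for illustration.
q, mults = quadratic_divide(Fraction(3, 8), Fraction(3, 4))
print(mults, float(q))
```

With six iterations the loop performs six multiplications on the numerator side and five on the denominator side, for eleven in total. The sketch does not model the data dependency between successive multiplications.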
It will also be appreciated that the cycle time of the divider is related to the speed of the multiplier. It is assumed that the multiplier is designed in a parallel fashion for high speed execution.
Notwithstanding the significant reduction in execution duration achievable with the IBM divider as compared to dividers employing linear convergence methods, additional reductions in execution cycle requirements would be desirable. It would be further advantageous to provide a divider utilizing less overhead hardware than previous efforts. For example, obviating the two's complementation hardware required for implementation of the prior art IBM divider would offer advantages of hardware simplification and cost reduction.