Of the four basic arithmetic operations (addition, subtraction, multiplication and division), circuits that can be pipelined at a bit level have been designed for addition, subtraction and multiplication. To date, however, division has been performed either by slow, time-consuming means, such as a DSP, or by long-latency and complex hardware implementations in VLSI technology. Many circuits have been proposed for implementing division in an effective manner. See, for example: O. Spaniol. Computer Arithmetic, Logic and Design. Wiley, 1981; O. Spaniol. Arithmetic in Rechenanlagen, volume 34 of Studienbucher Informatik. Teubner, 1976; V. Carl Hamacher. Computer Organization. McGraw-Hill, 1984; K. Hwang. Computer Arithmetic. John Wiley & Sons, 1979; and N. R. Scott. Computer number systems and arithmetic. Prentice Hall, Englewood Cliffs, 1988.
A fast divide circuit can find use in digital signal processing applications such as speech processing or cryptography. See, for example: T. E. Williams and M. A. Horowitz. A zero-overhead self-timed 160 ns 54-b CMOS divider. IEEE Journal Solid State Circuits, 26(11):1651-61, 1991; A. Vandemeulebroecke, E. Vanzieleghem, T. Denayer, and P. G. A. Jespers. A new carry-free division algorithm and its application to a single-chip 1024-b RSA processor. IEEE Journal Solid State Circuits, 25(3):748-56, 1990; and H. Edamatsu, T. Taniguchi, and S. Kuninobu. A 33 MFLOPS floating point processor using redundant binary representation. In Proceedings IEEE ISSCC'88, pages 152-153, 1988.
The basic algorithm for dividing a binary numerator value (N) by a denominator value (D) to produce a binary quotient value (Q) is given by the following equation: $N/D = Q = Q_J \ldots Q_0$, where the binary numbers N and D are both assumed to be of the same binary length W and Q has J significant bits. In most operations J=W. There are two basic algorithms for solving the above division problem.
The first is termed the restoring algorithm. It is as follows:
    DO I = W-1 to 0
        N = N - D
        IF (N > 0) THEN:
            Q_I = 1
        ELSE:
            Q_I = 0
            N = N + D
        D = D/2
    END
In the restoring algorithm, W identical iterations are performed, each iteration calculating one bit of the quotient Q. Thus, in this example, Q has W significant bits. In software form, the algorithm therefore consists of a DO loop. In the first step of each iteration, the denominator D is subtracted from the numerator N and the result is stored back in the location of N, replacing the numerator value. The resultant value N is then tested. If N is positive, the quotient bit is set to one. If N is negative, however, the quotient bit is set to zero and D is added back to N, restoring the value N held before the subtraction. This is the concept of "restoring". Thereafter, the denominator value D is divided by two and stored back in the denominator location. The process then continues for the next quotient bit.
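The restoring loop described above can be sketched in software as follows. This is a minimal Python illustration under stated assumptions, not the implementation of any cited circuit: the function name `restoring_divide` is hypothetical, the operands are assumed to satisfy 0 <= N < 2*D so that the quotient fits in W bits, both operands are pre-scaled so that D/2 stays exact in integer arithmetic, and a zero remainder is treated as non-negative (the listing tests N > 0).

```python
def restoring_divide(n, d, w):
    """Return the w quotient bits of n/d packed into an integer, MSB first.

    The top bit has weight 1 and each lower bit has half the weight of the
    bit above it, so the result equals floor(n * 2**(w-1) / d).
    Assumption of this sketch: 0 <= n < 2*d and d > 0.
    """
    if d <= 0 or n < 0 or n >= 2 * d:
        raise ValueError("sketch assumes d > 0 and 0 <= n < 2*d")
    # Pre-scale both operands so that halving d never loses a bit.
    n <<= w - 1
    d <<= w - 1
    q = 0
    for _ in range(w):           # one pass per quotient bit, Q_(W-1) first
        n -= d                   # trial subtraction: N = N - D
        if n >= 0:
            q = (q << 1) | 1     # quotient bit is one
        else:
            q = q << 1           # quotient bit is zero
            n += d               # restore: N = N + D
        d >>= 1                  # D = D/2
    return q
```

For example, `restoring_divide(3, 4, 4)` returns 6, i.e. the bit pattern 0110, which reads as 0.110 in binary or 0.75.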
The second is termed the non-restoring algorithm, and its method is similar. The algorithm is as follows:
    Q_W = 1
    DO I = W-1 to 0
        N = N - (2*Q_{I+1} - 1)*D
        IF (N > 0) THEN:
            Q_I = 1
        ELSE:
            Q_I = 0
        D = D/2
    END
Again, as in the restoring algorithm, there are W identical stages for the W significant bits of Q. Initially, the quotient bit $Q_W$ is set to one. In the first step of the operation, D is subtracted from N and the result is stored in N. N is then tested. If it is positive, the quotient bit is set to one. Otherwise, the quotient bit is set to zero. D is then shifted by one bit to derive D/2. The algorithm continues, whereby at the next level N is calculated based in part upon the quotient bit calculated at the prior level. If the quotient bit from the prior level is one, the resulting calculation is $N = N - D$.
On the other hand, if the prior quotient bit is zero, the resulting calculation is $N = N + D$.
Thus, the quotient bit calculated in a prior stage determines whether D is to be added to or subtracted from the current value of N.
As can be seen from the foregoing, the non-restoring algorithm differs from the restoring algorithm in that there is no separate step of N=N+D to add the subtracted value of D back to N. Instead, the quotient bit calculated in a prior stage is used to control whether the subsequent operation is an addition or a subtraction.
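The non-restoring loop can be sketched in the same way. The following Python illustration is again only a sketch under stated assumptions: the function name `nonrestoring_divide` is hypothetical, the operands are assumed to satisfy 0 <= N < 2*D, both operands are pre-scaled so that D/2 stays exact, and a zero remainder is treated as non-negative. The final remainder may be left negative; correcting it is outside the scope of this sketch.

```python
def nonrestoring_divide(n, d, w):
    """Return the w quotient bits of n/d packed into an integer, MSB first.

    Each pass either subtracts or adds d, chosen by the previous quotient
    bit; there is no separate restore step.
    Assumption of this sketch: 0 <= n < 2*d and d > 0.
    """
    if d <= 0 or n < 0 or n >= 2 * d:
        raise ValueError("sketch assumes d > 0 and 0 <= n < 2*d")
    # Pre-scale both operands so that halving d never loses a bit.
    n <<= w - 1
    d <<= w - 1
    q = 0
    prev_bit = 1                     # Q_W = 1 forces a subtraction first
    for _ in range(w):
        # N = N - (2*Q_{I+1} - 1)*D: subtract if prev_bit is 1, else add.
        n -= (2 * prev_bit - 1) * d
        prev_bit = 1 if n >= 0 else 0
        q = (q << 1) | prev_bit      # record Q_I
        d >>= 1                      # D = D/2
    return q
```

For example, `nonrestoring_divide(5, 3, 4)` returns 13, i.e. the bit pattern 1101, the same bits the restoring sketch would produce for these operands.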
A basic hardware divide circuit 10 to accomplish the foregoing non-restoring method of binary division is shown in FIG. 1 and is known from the prior art. In FIG. 1, the divide circuit 10 comprises a plurality of adder/subtracters 12 (a-c . . . ). The first stage 12a receives the numerator value N, the denominator value D and an initial quotient bit of +1. The result of the operation of the first stage 12a is the term N-D. This value N-D is passed to the second stage 12b. In addition, the denominator value D is shifted by one bit to produce the result D/2, which is also inputted into the second stage 12b. The sign bit of the result of the first stage 12a is used to control the operation of the second stage 12b, such that the second stage acts either as an adder or as a subtracter. If the sign of the term N-D is positive, the second stage 12b performs a subtraction. On the other hand, if the sign bit from the operation of the first stage 12a is negative, the second stage 12b operates as an adder.
The problem with the divide circuit 10 of the prior art is that each stage 12 cannot begin its operation until the prior stage has completed, because the sign bit of the prior stage's result decides whether the current stage is to be an addition or a subtraction. Since the sign bit of the result is used, a full carry generation needs to be completed from the LSB (least significant bit) to the MSB (most significant bit) in each stage. Therefore, pipelining at a bit level is inefficient.
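The dependence on the sign bit can be illustrated with a small software model of one adder/subtracter stage. The following Python sketch is an illustration under assumptions, not the circuit of FIG. 1: it assumes a ripple-carry structure and two's-complement operands of a fixed width, and the name `ripple_add_sub` is hypothetical. The point it shows is that the sign bit is the last bit produced, since the carry must pass through every lower bit position first.

```python
def ripple_add_sub(a, b, subtract, width):
    """Model a width-bit two's-complement adder/subtracter, LSB first.

    a and b are unsigned bit patterns of the given width. Returns the
    result bit pattern and its sign bit (the MSB of the result).
    """
    invert = 1 if subtract else 0
    carry = invert                       # a - b is computed as a + ~b + 1
    result = 0
    for bit in range(width):             # carry ripples from LSB to MSB
        x = (a >> bit) & 1
        y = ((b >> bit) & 1) ^ invert    # invert b's bits when subtracting
        s = x ^ y ^ carry                # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))
        result |= s << bit
    sign = (result >> (width - 1)) & 1   # only valid after the last cell
    return result, sign
```

For example, `ripple_add_sub(3, 5, True, 8)` returns `(254, 1)`: the pattern 11111110 is -2 in two's complement, and the sign bit of 1 would tell the next stage to add rather than subtract.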