1. Field of the Invention
This invention relates to a hardware divider used in an integrated floating-point unit (FPU), and more particularly to a high-radix divider.
2. Description of the Related Art
In recent years, as single-chip microcomputers with highly sophisticated functions have been developed, the capabilities of FPUs have been increasingly enhanced, and a hardware multiplier can now perform a double-precision multiplication in 1 to 3 clocks. For a hardware divider that operates in parallel with other operations, candidate algorithms include those based on the repetitive operation system, such as restoring, nonrestoring, SRT (Sweeney, Robertson, Tocher), and high-radix division, as well as convergence-type division algorithms based on the Newton-Raphson method.
FIG. 1 schematically shows a conventional hardware divider utilizing the nonrestoring algorithm based on the repetitive operation system. A partial remainder register 41 stores the dividend at the start of the operation and thereafter stores the partial remainder $R^{(j)}$. A partial remainder shifter 42 shifts the output of the partial remainder register 41 to the left by one digit, that is, multiplies it by the radix. A divisor register 43 stores a divisor $D$. An adder/subtracter 44 adds the output of the divisor register 43 to, or subtracts it from, the output of the partial remainder shifter 42 and outputs the next partial remainder $R^{(j+1)}$. A control counter 45 controls the operation timing of these circuits.
In general, with the radix set to 2, the conventional hardware divider utilizing the nonrestoring algorithm based on the repetitive operation system performs division according to the following recurrence, where $R^{(j)}$ is the partial remainder from the preceding cycle, $R^{(j+1)}$ is the partial remainder of the current cycle, $Q$ is the partial quotient (the quotient digit selected in the current cycle), and $D$ is the divisor:
$$R^{(j+1)} = 2R^{(j)} - QD$$
For $j = 0$, the partial remainder $R^{(0)}$ is the dividend.
This division system has the advantage that each pass through the operation loop takes one clock cycle and determines one bit of the quotient. When the partial quotient $Q$ is restricted to $\{-1, 1\}$, $Q$ can be selected from the sign of the preceding partial remainder $2R^{(j)}$.
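The recurrence and sign-based digit selection described above can be sketched in Python. This is a minimal behavioral model under stated assumptions, not the circuit of FIG. 1: it assumes integer operands already scaled so that 0 ≤ dividend < divisor, and the function name `nonrestoring_divide` and the final remainder-correction step are illustrative additions (the correction is the usual way to finish a nonrestoring division, but it is not described in the text).

```python
def nonrestoring_divide(dividend: int, divisor: int, n_bits: int) -> tuple[int, int, list[int]]:
    """Hypothetical sketch of radix-2 nonrestoring division.

    Returns (quotient, remainder, digits) with
    dividend * 2**n_bits == divisor * quotient + remainder and 0 <= remainder < divisor.
    Assumes normalized operands: 0 <= dividend < divisor.
    """
    if not (divisor > 0 and 0 <= dividend < divisor):
        raise ValueError("expected 0 <= dividend < divisor")
    r = dividend       # R^(0): the partial remainder register starts with the dividend
    q_acc = 0          # quotient assembled from digits in {-1, +1}
    digits = []
    for _ in range(n_bits):          # one loop pass per clock: one quotient bit per cycle
        q = 1 if r >= 0 else -1      # digit chosen from the sign of the preceding remainder
        r = 2 * r - q * divisor      # R^(j+1) = 2R^(j) - Q*D (shift, then add or subtract)
        q_acc = 2 * q_acc + q        # Horner form of sum(q_j * 2**(n_bits - j))
        digits.append(q)
    if r < 0:                        # assumed final correction: make the remainder non-negative
        r += divisor
        q_acc -= 1
    return q_acc, r, digits
```

For example, `nonrestoring_divide(1, 3, 4)` returns `(5, 1, [1, -1, 1, -1])`, consistent with $1 \cdot 2^4 = 3 \cdot 5 + 1$. Because $|R^{(j)}|$ stays below $D$ at every step, no digit ever has to be retried; this is what lets each cycle commit exactly one quotient bit.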
However, in double-precision operations performed by the FPU as defined by the IEEE 754 standard, the mantissa portion has as many as 54 bits (a 53-bit mantissa and a 1-bit sign), so at least 54 clocks are needed to perform a division. Consequently, the latency (the time from instruction input to operation output) of any operation that requires the result of the division becomes long, which significantly increases the processing time. Further, making effective use of the divider's ability to operate in parallel under this division system requires sophisticated arrangement of the operation instructions, and when the machine language is generated by a compiler, it is almost impossible to use the divider to its full extent.
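As a rough check of the clock count above, assume that each cycle retires $\log_2 r$ quotient bits for radix $r$, and ignore any setup, normalization, or rounding cycles (these are simplifying assumptions, not statements from the text). The number of iteration cycles for an $n$-bit result is then:

$$N_{\mathrm{cycles}} = \left\lceil \frac{n}{\log_2 r} \right\rceil, \qquad N_{\mathrm{cycles}}\Big|_{n=54,\ r=2} = 54, \qquad N_{\mathrm{cycles}}\Big|_{n=54,\ r=4} = 27$$

The radix-4 value only illustrates how raising the radix shortens the iteration loop.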
The convergence-type division algorithm based on the Newton-Raphson method can be effective in terms of operation speed, that is, latency. However, each iteration of its convergence equation requires two multiplications and one addition, and the initial approximation requires a read-only memory (ROM) of considerable capacity. Further, since divisions are less frequent than multiplications in typical computations, the return on the hardware invested in a convergence-type divider is small; moreover, overall performance suffers because multiplication and addition/subtraction cannot be performed while a division is in progress.
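The convergence iteration referred to above can be illustrated with the standard Newton-Raphson reciprocal step $x_{k+1} = x_k(2 - D x_k)$, which uses exactly two multiplications and one addition (a subtraction) per iteration. The sketch below is a floating-point model under stated assumptions: the 16-entry table `ROM` standing in for the initial-approximation ROM, its indexing by the top 4 fraction bits of $D$, and the names `reciprocal_newton` and `divide_newton` are all hypothetical.

```python
# Hypothetical initial-approximation "ROM": one reciprocal estimate per interval
# of D in [1, 2), indexed by the top ROM_BITS fraction bits of D.
ROM_BITS = 4
ROM = [1.0 / (1.0 + (i + 0.5) / 2**ROM_BITS) for i in range(2**ROM_BITS)]


def reciprocal_newton(d: float, iterations: int = 3) -> float:
    """Approximate 1/d for a normalized divisor d in [1, 2) (illustrative sketch)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("d must be normalized to [1, 2)")
    index = int((d - 1.0) * 2**ROM_BITS)  # top fraction bits select the ROM entry
    x = ROM[index]                        # initial approximation, relative error below 2**-5
    for _ in range(iterations):
        t = 2.0 - d * x                   # first multiplication, then the addition (subtraction)
        x = x * t                         # second multiplication; the relative error is squared
    return x


def divide_newton(a: float, d: float, iterations: int = 3) -> float:
    """Quotient a/d formed as a * (1/d), one extra multiplication after convergence."""
    return a * reciprocal_newton(d, iterations)
```

With this assumed table the initial relative error is below $2^{-5}$, and because each iteration squares the error, three iterations reach roughly $2^{-40}$; a double-precision result would need a fourth iteration or a larger table, which is the ROM-size versus iteration-count trade-off mentioned above. Each iteration also occupies a multiplier, which is why, in such a scheme, other multiplications cannot proceed during a division.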
As described above, because the conventional hardware divider utilizing the nonrestoring algorithm based on the repetitive operation system has a radix of 2, at least 54 clocks are needed when the FPU performs a double-precision division. As a result, the latency of any operation requiring the result of the division is long, and the processing time increases significantly. Further, it is difficult to make effective use of the divider's ability to operate in parallel with other operations.