Floating-point division and square root extraction are the operations that require the longest execution time among the arithmetic operations. Methods for realizing division are roughly classified into subtraction shift type division, methods that combine a table with an operation, and multiplication type division. Likewise, for square root extraction there are methods that combine subtraction, shifting and conditional judgment, methods that combine a table with a calculation, and the Newton-Raphson method.
In subtraction shift type division, the simplest method is a procedure in which a shift instruction, an addition or subtraction instruction and a conditional branch instruction are combined to calculate the quotient bit by bit. A procedure that performs the shifting and the subtraction or addition repetitively has also been implemented in practice, and it can perform the processing faster than the combination of the aforesaid instructions.
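The bit-by-bit procedure described above can be sketched as follows. This is a minimal illustration of restoring (shift, subtract, test) division on unsigned integers; the function name `shift_subtract_divide` and the 24-bit default width are assumptions for illustration and are not taken from any particular implementation.

```python
def shift_subtract_divide(dividend, divisor, bits=24):
    """Restoring division: one quotient bit per shift/subtract/test step.

    Hypothetical helper for illustration. Handles unsigned operands only.
    Returns (quotient, remainder).
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch expects dividend >= 0 and divisor > 0")
    if dividend >= (1 << bits):
        raise ValueError("dividend does not fit in the given bit width")

    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Subtract: try removing the divisor from the partial remainder.
        trial = remainder - divisor
        quotient <<= 1
        # Conditional branch: keep the subtraction only if it did not underflow.
        if trial >= 0:
            remainder = trial
            quotient |= 1
    return quotient, remainder


print(shift_subtract_divide(100, 7))
```

Each loop iteration depends on the remainder produced by the previous one, which is why the number of steps grows with the number of quotient bits.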
The SRT method is a faster form of subtraction shift type division. In this procedure, several upper bits of the divisor and of the dividend are taken, and a quotient of several bits is attained either by an operation on these bits or by retrieving a table with these bits applied as an index; this step is repeated until a quotient of the requisite precision is attained. The number of processing steps is thereby reduced as compared with the procedure that attains the quotient bit by bit as described above. According to comparisons between subtraction shift type division units and multiplication type division units actually implemented in LSIs, it is advantageous to apply an SRT method having a high radix.
A procedure that combines a table with an operation in order to perform a division has also been proposed. In this procedure, the table is stored in a ROM, a part of the bit string expressing the mantissa part of the divisor is extracted as a bit field, a calculation is performed on the value read from the table with this bit field applied as an index to attain a reciprocal, and this reciprocal is multiplied by the dividend to realize the division.
As the multiplication type division, there is the Newton-Raphson method (hereinafter called the N/R method). In this procedure, an approximate value of the reciprocal of the given divisor is attained, the iteration calculation $Y_{n+1} = Y_n \cdot (2 - Y_n \cdot R_m)$ [$R_m$ is the mantissa part of the divisor] is carried out to attain a reciprocal of predetermined precision, and this value is multiplied by the dividend to realize the division.
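The reciprocal iteration can be illustrated with a short sketch in ordinary floating-point arithmetic. The function name `nr_reciprocal` and the sample values are assumptions chosen for illustration; a hardware unit would work on fixed-width mantissas rather than Python floats.

```python
def nr_reciprocal(rm, y0, iterations):
    """Refine an approximate reciprocal y0 of rm by Y <- Y * (2 - Y * rm).

    Hypothetical helper for illustration only.
    """
    y = y0
    for _ in range(iterations):
        y = y * (2.0 - y * rm)
    return y


rm = 1.5     # mantissa part of the divisor, in [1, 2)
y0 = 0.66    # rough initial approximation of 1 / 1.5
for n in range(4):
    y = nr_reciprocal(rm, y0, n)
    # The residual 1 - y * rm is squared by every iteration.
    print(n, y, abs(1.0 - y * rm))

# Division is realized by multiplying the dividend by the refined reciprocal.
print(7.5 * nr_reciprocal(rm, y0, 4))
```

Writing the residual as $e_n = 1 - Y_n R_m$, the recurrence gives $e_{n+1} = e_n^2$, so the number of correct bits roughly doubles with each iteration.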
For the square root extraction calculation, an approximate value $Y_0'$ of the reciprocal $Y_\infty'$ of the square root of the given argument $R_m'$ is attained in a similar fashion.
The iteration calculation $$Y_{n+1}' = \frac{1}{2} Y_n' \cdot \left(3 - Y_n'^{2} \cdot R_m'\right)$$ [$R_m'$ is the mantissa part of the argument] is carried out to attain a reciprocal of the square root of predetermined precision, and this value is multiplied by the argument to realize the square root extraction.
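The square root recurrence can be sketched in the same way. The function name `nr_rsqrt` and the sample values are assumptions for illustration, and Python floats stand in for fixed-width mantissas.

```python
import math


def nr_rsqrt(rm, y0, iterations):
    """Refine an approximate 1/sqrt(rm) by Y <- 0.5 * Y * (3 - Y*Y * rm).

    Hypothetical helper for illustration only.
    """
    y = y0
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - y * y * rm)
    return y


rm = 2.0     # argument of the square root extraction
y0 = 0.7     # rough initial approximation of 1 / sqrt(2)
y = nr_rsqrt(rm, y0, 5)

# The square root is the argument multiplied by the reciprocal square root.
root = rm * y
print(root, math.sqrt(rm))
```

The final multiplication relies on the identity $\sqrt{R} = R \cdot (1/\sqrt{R})$, so no division is needed at any step.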
In the case of the subtraction shift type procedure, a condition must be judged every time one step is processed, so there remains a problem that much time is required when the division is realized in a program by combining a shift instruction, an addition or subtraction instruction and a conditional branch instruction. In addition, since there is a data dependency between the instructions, there is also a problem that it is difficult to improve the processing speed even on a computer whose architecture has an instruction pipeline.
When a subtraction shift type division is executed by repeating shifting and addition or subtraction through micro-programming, it can be processed at a higher speed than the procedure in which the aforesaid instructions are combined. However, this subtraction shift type division still requires an excessive amount of processing time as compared with a single addition instruction, subtraction instruction or multiplication instruction (hereinafter these instructions are collectively called addition-subtraction-multiplication instructions).
Although the SRT method having a high radix operates at a higher speed than the aforesaid simple subtraction shift type division, there still remains a problem that it requires a large number of steps. In addition, a practical implementation of the SRT method having a high radix requires either a circuit that generates the subtrahend through a multiplication unit and then performs the subtraction, or a selector that selects, in response to the tentative quotient attained, one of the subtrahends pre-calculated for each possible tentative quotient. This method is therefore not necessarily advantageous in view of the time required for the operation and the resources used.
In addition, in the case of the subtraction shift type division, since the circuit resource is used repeatedly while one division is being carried out, there remains a problem that the operation circuit is occupied, a subsequent division cannot be started, and the throughput cannot be improved.
The procedure that combines a table with an operation also has a problem of poor implementation efficiency on an LSI, because even a small-sized unit may require a memory with a capacity of several tens to several hundreds of kilobits to perform a single-precision operation.
For the N/R method, the prior art includes implemented circuits in which a table of initial values is stored in a memory, a requisite bit field is extracted from the MSB side of the bit string expressing the mantissa part of the given divisor, and the initial value is read out with this bit field applied as an index and then applied to the iteration calculation.
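The table lookup described above can be sketched as follows. Everything specific here is an assumption for illustration: the 8-bit index, the 24-bit mantissa with its hidden bit at bit 23, the choice of the interval midpoint for each entry, and the names `build_table` and `initial_value`. A real table would also store entries quantized to a fixed width rather than as Python floats.

```python
INDEX_BITS = 8        # assumed width of the bit field used as the table index
MANTISSA_BITS = 24    # assumed mantissa width, hidden bit included


def build_table(index_bits=INDEX_BITS):
    """Hypothetical initial-value table: one reciprocal per mantissa interval."""
    size = 1 << index_bits
    # Entry i covers mantissas in [1 + i/size, 1 + (i+1)/size); use the midpoint.
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def initial_value(mantissa, table, index_bits=INDEX_BITS):
    """Extract the bit field just below the hidden bit and use it as the index."""
    shift = MANTISSA_BITS - 1 - index_bits
    field = (mantissa >> shift) & ((1 << index_bits) - 1)
    return table[field]


table = build_table()
mantissa = 0xC00000                      # represents 1.5 (hidden bit set)
rm = mantissa / float(1 << (MANTISSA_BITS - 1))
y0 = initial_value(mantissa, table)
print(y0, abs(1.0 - y0 * rm))            # residual stays below 2**-8
```

With this layout an index of k bits yields an initial value whose residual is below 2 to the power of minus k, which is the sense in which the table is said to have a precision of about k bits.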
Many existing implementations use a table with a precision of about 8 bits. However, this procedure has a problem that the iteration calculation must be performed twice to attain a single-precision reciprocal, and that it takes much time when the iteration calculation is performed by combining addition, subtraction and multiplication instructions.
In order to attain a precision of 24 bits through a single iteration, an initial value with a precision of at least 12 bits is necessary, and when this is realized directly as a table, a memory of at least 4096 words with a width of 12 bits, i.e. 49,152 bits (6 kilobytes), is required.
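The arithmetic behind these figures can be checked with a short sketch. It assumes a table indexed by a 12-bit field (hence 4096 words) and an idealized iteration that exactly doubles the number of correct bits, ignoring rounding and guard bits; the function names are hypothetical.

```python
def table_bits(index_bits, width_bits):
    """Total storage for a table of 2**index_bits words of width_bits each."""
    return (1 << index_bits) * width_bits


def iterations_needed(initial_bits, target_bits):
    """Iterations required if each one doubles the precision (idealized)."""
    count = 0
    precision = initial_bits
    while precision < target_bits:
        precision *= 2
        count += 1
    return count


print(table_bits(12, 12), table_bits(12, 12) // 8)   # bits, bytes
print(iterations_needed(8, 24))    # 8-bit table: two iterations for 24 bits
print(iterations_needed(12, 24))   # 12-bit table: one iteration for 24 bits
```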
In practice, guard bits are also necessary for performing the operation, and a capacity several times larger is required, so there is a problem that the table occupies a large area on the LSI and its efficiency is poor. In addition, there is a problem that it cannot be implemented in a programmable logic device presently available on the market (hereinafter called a PLD; the PLD as defined herein includes programmable devices such as a CPLD, i.e. Complex Programmable Logic Device, or an FPGA, i.e. Field Programmable Gate Array).
In addition, when an initial value having a precision of 12 bits is attained and the iteration calculation is carried out once, the recurrence formula $Y_{n+1} = Y_n \cdot (2 - Y_n \cdot R_m)$ [$R_m$ is the mantissa part of the divisor], more specifically the operation $Y_1 = Y_0 \cdot (2 - Y_0 \cdot R_m)$, is performed. However, when this operation is realized by combining separate operation codes, there occurs a problem that a data dependency is produced between the plurality of instructions, so that efficient processing cannot be carried out in a processor having an instruction pipeline or an operation pipeline; conflicts over the operation device arise, and the operation device is occupied so that other instructions cannot be executed.
There has been provided an operation device, obtained by modifying an existing multiplication unit, that has two modes: one mode in which the conventional operation of 24 bits×24 bits is performed, and another mode in which the product of the initial value $Y_0$ and the mantissa $R_m$ of the divisor is subtracted from 2.0 while this product is being calculated. The latter mode is performed by applying the two's complement of the product in the operation on the upper bits, so that it can be realized with scarcely any additional circuitry. However, in this case too, there remains a problem that the iteration calculation cannot be carried out together with other operations, and performance deteriorates. Further, since the same multiplication unit is used repeatedly for the iterations of the N/R method, a subsequent division cannot be started until one division operation is completed, and the throughput cannot be improved.
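Why subtracting the product from 2.0 costs so little can be seen from a fixed-point identity, sketched below. The format (one integer bit plus 23 fraction bits) and the function name `two_minus` are assumptions for illustration and are not details of the device described above.

```python
FRAC_BITS = 23   # assumed number of fraction bits


def two_minus(p_fixed, frac_bits=FRAC_BITS):
    """Return 2.0 - p in fixed point, for 0 < p < 2, via two's complement.

    With one integer bit and frac_bits fraction bits, the value 2.0 is
    1 << (frac_bits + 1), i.e. exactly the modulus of the word. Negating
    p modulo that word size therefore yields 2.0 - p directly.
    """
    width = frac_bits + 1
    mask = (1 << width) - 1
    return (~p_fixed + 1) & mask


p = int(0.75 * (1 << FRAC_BITS))          # fixed-point encoding of 0.75
result = two_minus(p)
print(result / float(1 << FRAC_BITS))      # value of 2.0 - 0.75
```

Because the subtraction reduces to inverting bits and adding one, it can be folded into the final addition stage of a multiplier instead of requiring a separate subtractor.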
Additionally, when a reciprocal of high precision (here, for example, 48 bits, twice the single precision) is calculated by the N/R method, it is necessary either to apply a reciprocal with a precision of 24 bits as the initial value and perform the iteration calculation once, or to apply a reciprocal with a precision of 12 bits and perform the iteration calculation twice. The former case has a problem that the required table becomes large. The latter case has a problem that the circuit for performing the second iteration calculation becomes large and the operation time is extended.
In addition to this problem, there remains a further problem that a large table cannot be implemented in a PLD, whose circuit resources are highly restricted, so that a calculation of a reciprocal or a division by the N/R method cannot be implemented in practice.
The same problems exist in the implementation of a square root extraction calculation.