The present invention relates generally to integrated circuits, and more particularly to a high-speed circuit in an arithmetic unit which executes the addition, multiplication or division between two numerical values.
A prior-art floating-point arithmetic unit will be explained with reference to FIG. 64 of the accompanying drawings.
The floating-point arithmetic unit exchanges data with a memory by means of a register 6401 and a bus 6423. Data read out of the register 6401 are supplied to an adder 6402, a multiplier 6403 and a divider 6404 through source buses 6411 to 6414. The respective arithmetic elements execute the processes of addition/subtraction, multiplication and division on the two supplied numerical values, and they write the results of the processes into the register 6401 through result buses 6421 and 6422. This example has a construction in which the latter half of the multiplier 6403 is shared between the execution of the multiplication and that of the division. Alternatively, the arithmetic unit can have a construction in which the latter half of the multiplier is duplicated, whereby the multiplier and the divider operate independently of each other. To make the floating-point arithmetic unit high in performance, the operating speeds of the adder, the multiplier and the divider must each be raised.
Next, the multiplier 6403 will be explained.
A prior-art method of multiplying floating-point numbers will be explained by taking as an example an IEEE double-precision standard format (stipulated by The Institute of Electrical and Electronics Engineers) illustrated in FIGS. 30(a) to 30(c). As shown in FIG. 30(a), a floating-point number is composed of a sign, an exponent and a mantissa. In the IEEE double-precision standard format, 1 bit (64th bit) is allocated as a sign part, 11 bits (53rd to 63rd bits) as an exponent part, and 52 bits (1st to 52nd bits) as a mantissa part in a word length of 64 bits. Each of the bits is represented by a value "1" or "0", and the whole number is handled as a binary number. The mantissa lies within a range of at least 1 and less than 2, and the high-order digit or MSB (most significant bit) thereof is always 1. Therefore, only the fraction part, that is, the places below the binary point with the MSB value 1 excluded, is used for the representation of the mantissa part. The exponent of the exponent part has a bias value added to the actual value thereof beforehand so that it can be handled as a positive number.
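The field layout described above can be illustrated with a short sketch. It assumes Python and its standard `struct` module; the names `decode_double` and `mantissa_value` are hypothetical and do not appear in the source. Note that the text numbers the bits from 1, whereas the shifts below count from 0.

```python
import struct

EXP_BITS = 11    # width of the exponent part
FRAC_BITS = 52   # width of the stored mantissa (fraction) part
BIAS = 1023      # exponent bias of the IEEE double-precision format


def decode_double(x):
    """Split a float into (sign, biased exponent, 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    return sign, exponent, fraction


def mantissa_value(fraction):
    """Restore the unrepresented leading 1, giving the 53-bit mantissa."""
    return (1 << FRAC_BITS) | fraction


# -6.5 equals -1.625 * 2**2, so the stored exponent is 2 + BIAS.
sign, exponent, fraction = decode_double(-6.5)
print(sign, exponent - BIAS, mantissa_value(fraction) / 2 ** FRAC_BITS)
```

The sketch covers normalized numbers only; zeros, denormalized numbers, infinities and NaNs use reserved exponent values and are not handled here.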
As shown in FIG. 30(b), a multiplicand consists of a sign #1 (a1), an exponent #1 (a2) and a mantissa #1 (a3), while a multiplier factor consists of a sign #2 (b1), an exponent #2 (b2) and a mantissa #2 (b3).
The multiplication between the floating-point numbers includes the operations of the mantissa parts and the exponent parts. The operation of the mantissa parts will be explained first. As shown in FIG. 30(c), a partial product #1 (c1) to a partial product #53 (c53) are generated from the mantissas #1 (a3) and #2 (b3) in correspondence with the respective bits of these mantissas in a bit length of 53 bits containing the MSB value "1" which is not represented. The partial products #1 (c1) to #53 (c53) are added by a partial product adder being a carry save adder in which full adders are combined in an array. Thus, an intermediate product #1 (d1) and an intermediate product #2 (d2) are obtained.
The intermediate products #1 (d1) and #2 (d2) are subsequently added, thereby obtaining a product e which is 105 or 106 bits long and in which the position of a decimal point lies at the first or second place as reckoned from the MSB of this product. (In this regard, the position of the decimal point is determined depending upon the magnitudes of the multiplier factor and the multiplicand. Corresponding hardware, however, is constructed as 106 bits.)
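The mantissa operation described above may be sketched as follows, assuming Python integers in place of hardware bit vectors. The function names are hypothetical, and the partial products are accumulated one after another rather than in the array arrangement of full adders that an actual partial product adder would use; only the arithmetic is the same.

```python
def partial_products(a, b, width=53):
    """One partial product per multiplier bit: a shifted by i if bit i of b is 1."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def carry_save_add(x, y, z):
    """Bitwise full adders: three operands in, a sum word and a carry word out."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def reduce_partial_products(pps):
    """Fold the partial products into two intermediate products, with no carry propagation."""
    d1, d2 = 0, 0
    for pp in pps:
        d1, d2 = carry_save_add(d1, d2, pp)
    return d1, d2


def multiply_mantissas(a, b):
    """53-bit by 53-bit mantissa product via carry-save reduction."""
    d1, d2 = reduce_partial_products(partial_products(a, b))
    return d1 + d2  # the single carry-propagate addition of d1 and d2


smallest = 1 << 52            # mantissa 1.000...0
largest = (1 << 53) - 1       # mantissa 1.111...1
print(multiply_mantissas(smallest, smallest).bit_length())
print(multiply_mantissas(largest, largest).bit_length())
```

The two extreme cases show why the product is 105 or 106 bits long depending on the magnitudes of the two mantissas.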
Subsequently, the decimal position, integral part and bit length of the obtained product are conformed to the mantissa format of the IEEE double-precision standard format. More specifically, the position of the decimal point is shifted so that the integral part may have one digit and become the value "1" (normalization). Since a rounding position is determined by the normalization, a rounding process is executed. The 52nd decimal position is set as the LSB (least significant bit) in the rounding process because the mantissa of the format requires 52 bits below the decimal point. In a case where the integral part has become 2 digits on account of a carry generated by the addition of the rounding process, the normalizing process is executed again.
Next, the operation of the exponent parts will be explained. The exponent parts #1 (a2) and #2 (b2) are first added. Since each of the exponent parts has a bias value, double the bias value is contained in the result of the first addition between the exponent parts. In order to obtain a correct exponent, a correction needs to be made by subtracting the additional bias magnitude from the result of the addition. Further, in order to handle the mantissa with its decimal point shifted one place, "1" is added to the corrected exponent in the operation of the exponent parts here.
Incidentally, the signs of the multiplicand and the multiplier factor are so processed that the sign part of the product is set at "0" when the sign parts #1 (a1) and #2 (b1) are equal, whereas it is set at "1" when they are not.
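The mantissa, exponent and sign operations described above can be put together in a minimal sketch. It assumes Python, normalized operands, a normalized result with no overflow or underflow, and round-to-nearest-even as the rounding mode (the text does not name a mode). The helper names are hypothetical. Unlike the description, which adds "1" to the exponent in advance, the sketch adds it only when the product has two integral digits; the result is the same.

```python
import struct

BIAS, FRAC_BITS = 1023, 52


def unpack(x):
    """Return (sign, biased exponent, fraction) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> FRAC_BITS) & 0x7FF, bits & ((1 << FRAC_BITS) - 1)


def pack(sign, exponent, fraction):
    """Assemble a double from its three fields."""
    bits = (sign << 63) | (exponent << FRAC_BITS) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fp_multiply(x, y):
    s1, e1, f1 = unpack(x)
    s2, e2, f2 = unpack(y)
    # Mantissa parts: 53 bits by 53 bits gives a product of 105 or 106 bits.
    product = ((1 << FRAC_BITS) | f1) * ((1 << FRAC_BITS) | f2)
    # Exponent parts: the sum contains the bias twice, so subtract it once.
    exponent = e1 + e2 - BIAS
    # Normalization: keep one integral digit and correct the exponent.
    if product.bit_length() == 106:
        shift = 53
        exponent += 1
    else:
        shift = 52
    mantissa = product >> shift
    discarded = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Rounding at the 52nd place below the point (nearest, ties to even).
    if discarded > half or (discarded == half and mantissa & 1):
        mantissa += 1
        if mantissa >> 53:      # rounding carry: normalize again
            mantissa >>= 1
            exponent += 1
    # Sign part: 0 when the signs are equal, 1 when they differ.
    return pack(s1 ^ s2, exponent, mantissa & ((1 << FRAC_BITS) - 1))


print(fp_multiply(1.5, -2.5))
```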
Now, the adder 6402 will be explained.
A form for representing a floating-point number in a computer is, for example, a standard format stipulated by IEEE-754 as shown in FIG. 44. The prior-art unit which adds/subtracts two floating-point numbers represented by such a format, is disclosed in the official gazette of Japanese Patent Application Laid-open No. 232723/1990.
The known unit is constructed as illustrated in FIG. 45. An aligner circuit 2A is supplied with the two floating-point numbers to be added/subtracted, namely, operands #1 and #2. It shifts the mantissa part of one of the operands toward a low-order position so as to equalize the exponent parts of both the operands. Output data 108A and 10A delivered from the aligner circuit 2A are input to an adder-subtracter circuit 24A. The adder-subtracter circuit 24A adds or subtracts the aligned mantissa parts in accordance with the distinction between an addition type instruction and a subtraction type instruction, the signs of the operands, the relation between the magnitudes of the operands, etc. An output 112A from the adder-subtracter circuit 24A is input to a normalizer circuit 26A.
The normalizer circuit 26A seeks the MSB of the mantissa part of the added/subtracted result, that is, the digit at which "1" appears first as viewed from the high-order digit of the format. Then, it shifts the mantissa part so that a decimal point may come just to the right side of the sought digit. Simultaneously, it corrects the exponent part 106A delivered from the aligner circuit 2A, in accordance with the number of shift places produced in the above shift operation. Outputs 134A and 138A from the normalizer circuit 26A are input to a rounding circuit 28A.
In a case where the number of bits of the mantissa part of data after the normalization exceeds a limit which can be represented in the predetermined format, the rounding circuit 28A shortens the bit length to a representable number of bits. Concretely, those lower-order bits of the mantissa part which cannot be represented are subjected to a round-up process (requiring the addition of +1) or a round-down process (not requiring the addition of +1) in accordance with the value of the bits and a predetermined rounding mode.
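The four stages of the known unit (aligner, adder-subtracter, normalizer, rounding) can be followed in a minimal sketch. It is not the circuit of FIG. 45; it assumes Python, normalized nonzero operands, a normalized or zero result, and round-to-nearest-even as the predetermined rounding mode. Three extra low-order bits (guard, round, sticky) are an assumption of the sketch, and all names are hypothetical.

```python
import struct

FRAC_BITS, EXTRA = 52, 3   # EXTRA: guard, round and sticky bits


def unpack(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> FRAC_BITS) & 0x7FF, bits & ((1 << FRAC_BITS) - 1)


def pack(sign, exponent, fraction):
    bits = (sign << 63) | (exponent << FRAC_BITS) | fraction
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def shift_right_sticky(m, n):
    """Shift toward the low-order side, OR-ing any lost bits into bit 0."""
    return (m >> n) | (1 if m & ((1 << n) - 1) else 0)


def fp_add(x, y):
    sx, ex, fx = unpack(x)
    sy, ey, fy = unpack(y)
    mx = ((1 << FRAC_BITS) | fx) << EXTRA
    my = ((1 << FRAC_BITS) | fy) << EXTRA
    # Aligner: shift the smaller operand so that both exponents are equal.
    if (ex, mx) < (ey, my):
        sx, ex, mx, sy, ey, my = sy, ey, my, sx, ex, mx
    my = shift_right_sticky(my, ex - ey)
    # Adder-subtracter: add for equal signs, subtract otherwise.
    total = mx + my if sx == sy else mx - my
    if total == 0:
        return 0.0
    # Normalizer: bring the leading 1 to the MSB and correct the exponent.
    shift = total.bit_length() - (FRAC_BITS + 1 + EXTRA)
    total = shift_right_sticky(total, shift) if shift > 0 else total << -shift
    exponent = ex + shift
    # Rounding: round up (+1) or round down according to the dropped bits.
    mantissa, guard, rest = total >> EXTRA, (total >> 2) & 1, total & 3
    if guard and (rest or mantissa & 1):
        mantissa += 1
        if mantissa >> (FRAC_BITS + 1):   # carry out: normalize again
            mantissa >>= 1
            exponent += 1
    return pack(sx, exponent, mantissa & ((1 << FRAC_BITS) - 1))


print(fp_add(0.1, 0.2))
```

Subtraction is obtained by negating the second operand, for example `fp_add(1.0, -0.25)`.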
Now, a high radix division control method, which is one method of realizing the divider 6404, and a high radix divider will be explained. More concretely, they serve to quickly execute an iterative type high radix division in which a quotient as a binary number is calculated every n bits from higher-order digits on the basis of a dividend and a divisor represented in terms of binary numbers.
A high radix division algorithm will be first explained. Incidentally, a "high radix operation" signifies an operation which is executed in plural-bit units.
In a case where the dividend N, divisor D and radix r meet the condition of N < r·D, the first calculation of the division proceeds as stated below.
The quotient digit q1 of the first calculation is evaluated from the following formulae of relations: ##EQU1## The quotient Q1 and partial remainder P1 of the first calculation are respectively given by Eqs. 2: ##EQU2##
The second and subsequent calculations proceed as stated below.
The quotient digit qj+1 of the (j+1)th calculation is evaluated from the following formulae of relations: ##EQU3## The quotient Qj+1 and partial remainder Pj+1 of the (j+1)th calculation are respectively given by Eqs. 4 (4A and 4B): ##EQU4##
When the above calculations are iterated n times till the attainment of a required precision, the final quotient Q and remainder R are respectively obtained as indicated by Eqs. 5: ##EQU5##
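Because the equations themselves are not reproduced in this text, the following is only a sketch of a generic iterative radix-r division of the kind described, written in Python with exact rational arithmetic. It assumes the textbook recurrence in which each step selects the largest digit q with q·D not exceeding the shifted partial remainder; the exact form and indexing of Eqs. 1 to 5 may differ. The function name is hypothetical.

```python
from fractions import Fraction


def high_radix_divide(N, D, r, n):
    """Compute n radix-r quotient digits of N / D, assuming 0 <= N < r*D and D > 0.

    Returns (Q, R, digits) with N == Q*D + R.
    """
    N, D = Fraction(N), Fraction(D)
    assert D > 0 and 0 <= N < r * D
    P = N                    # value from which the first digit is taken
    Q = Fraction(0)
    digits = []
    for j in range(n):
        if j > 0:
            P = r * P        # shift the partial remainder by one radix-r digit
        q = P // D           # quotient digit, always within 0 .. r-1
        P = P - q * D        # new partial remainder, 0 <= P < D
        Q += Fraction(q, r ** j)
        digits.append(q)
    R = P / r ** (n - 1)     # undo the accumulated shifts
    return Q, R, digits


Q, R, digits = high_radix_divide(7, 3, 4, 6)
print(digits, Q, R)
```

With r = 4, each iteration yields 2 bits of the quotient, which is the sense in which the operation is executed in plural-bit units.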
As an improvement of the above algorithm for a higher operating speed, a high radix SRT division algorithm is known (SRT: Sweeney, Robertson and Tocher). This algorithm can shorten the calculating time period because it utilizes the redundancy of data, thereby calculating each quotient digit at a rough precision from only several higher-order bits, without using the exact values of a partial remainder and a divisor. A calculating method which employs a quaternary SRT division algorithm will be explained below.
It is assumed that a dividend N and a divisor D meet the condition of N < (8/3)·D. The first calculation of the division proceeds as stated below.
The quotient digit q1 of the first calculation is evaluated from the following formulae of relations: ##EQU6## The quotient Q1 and partial remainder P1 of the first calculation are respectively given by Eqs. 7: ##EQU7##
The second and subsequent calculations proceed as stated below.
The quotient digit qj+1 of the (j+1)th calculation is evaluated from the following formulae of relations: ##EQU8## The quotient Qj+1 and partial remainder Pj+1 of the (j+1)th calculation are respectively given by Eqs. 9 (9A and 9B): ##EQU9##
When the above calculations are iterated n times till the attainment of a required precision, the final quotient Q and remainder R are respectively obtained as indicated by Eqs. 10: ##EQU10##
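The quaternary SRT iteration can likewise be sketched in Python. The sketch assumes the common redundant digit set {-2, -1, 0, 1, 2}, under which the partial remainder stays within ±(2/3)·D; this is consistent with the condition N < (8/3)·D but is not confirmed by the equations, which are not reproduced here. For simplicity the digit is selected from exact values, whereas an actual SRT divider selects it from a few higher-order bits; that simplification is deliberate. All names are hypothetical.

```python
from fractions import Fraction


def select_digit(w, D):
    """Exact digit selection, clamped to the redundant digit set -2 .. 2."""
    return max(-2, min(2, round(w / D)))


def srt4_divide(N, D, n):
    """Compute n signed radix-4 digits of N / D, assuming 0 <= N < (8/3)*D.

    Returns (Q, R, digits) with N == Q*D + R and 0 <= R.
    """
    N, D = Fraction(N), Fraction(D)
    assert D > 0 and 0 <= N < Fraction(8, 3) * D
    w = N                    # shifted partial remainder
    Q = Fraction(0)
    digits = []
    for j in range(n):
        q = select_digit(w, D)
        P = w - q * D        # partial remainder, may be negative
        assert abs(P) <= Fraction(2, 3) * D
        Q += Fraction(q, 4 ** j)
        digits.append(q)
        w = 4 * P
    scale = 4 ** (n - 1)
    if P < 0:                # final correction to a non-negative remainder
        Q -= Fraction(1, scale)
        P += D
    return Q, P / scale, digits


Q, R, digits = srt4_divide(5, 3, 4)
print(digits, Q, R)
```

The negative digits in the example show the redundancy at work: a digit that is chosen slightly too large is compensated for by later negative digits instead of being corrected at once.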
Since the high radix SRT division algorithm can afford accurate results at a comparatively high speed, it is utilized in an LSI (large-scale integrated circuit) for floating-point calculations or in a microprocessor having a built-in floating-point arithmetic unit.
The division system of a floating-point coprocessor "R3010" manufactured by MIPS Computer Company is discussed in "IEEE MICRO", June 1988, page 57. In this system, the quaternary SRT algorithm is adopted. After the 9 higher-order bits of a partial remainder in a carry save form have been subjected to a carry propagation addition, the result is input to a quotient digit calculation circuit together with the 9 higher-order bits of a divisor, thereby obtaining the 2 bits of a quotient.
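The idea of adding only the higher-order bits of a carry-save partial remainder can be illustrated as below. This is a sketch in Python with non-negative integers and a hypothetical word width; it is not the R3010 circuit, and the function names are assumptions. It shows that the truncated addition differs from the exact higher-order bits by at most one unit, which the redundancy of the SRT digit set is able to absorb.

```python
def truncated_estimate(sum_word, carry_word, width, top_bits=9):
    """Carry-propagate addition of only the top_bits higher-order bits."""
    drop = width - top_bits
    return (sum_word >> drop) + (carry_word >> drop)


def exact_top_bits(sum_word, carry_word, width, top_bits=9):
    """Higher-order bits of the fully added partial remainder, for comparison."""
    drop = width - top_bits
    return (sum_word + carry_word) >> drop


width = 56                                   # hypothetical remainder width
s = 0x5A5A5A5A5A5A5A & ((1 << width) - 1)    # arbitrary sum word
c = 0x0F0F0F0F0F0FFF & ((1 << width) - 1)    # arbitrary carry word
error = exact_top_bits(s, c, width) - truncated_estimate(s, c, width)
print(error)   # always 0 or 1 for non-negative words
```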
A similar operating system is adopted in a floating-point coprocessor which is discussed in "IEEE DIGEST OF TECHNICAL PAPERS", 1989, page 52.