1. Field of the Invention
The present invention generally relates to multipliers for carrying out multiplication of binary numbers, and more particularly, to a multiplier configured of electronic circuitry including a semiconductor device.
2. Description of the Background Art
When multi-bit binary numbers X and Y are multiplied, a Booth algorithm is often used to reduce the number of partial products to be produced, and to carry out multiplication efficiently. The Booth algorithm is a method for multiplying negative numbers in a complement notation without correction.
In the Booth algorithm, data bits of a multiplier Y are divided into groups. FIG. 30 shows an example of group division of multiplier Y. In FIG. 30, group division with respect to the second order Booth algorithm is shown. Each group includes three data bits. One bit is shared between adjacent groups (shown by hatching in FIG. 30). One partial product is generated by one group. In the case of the second order Booth algorithm, the number of partial products is approximately one half of the number of multiplier bits. In general, when one group includes m bits, it is called the (m-1)-th order Booth algorithm, and the number of partial products to be generated is approximately 1/(m-1) of the number of multiplier bits. Description will be given of the Booth algorithm with reference to equations.
Multiplier Y is represented by the following equation (1) in a two's complement format. ##EQU1##
In the equation (1), yn is a sign bit, which indicates whether multiplier Y is positive or negative. A data bit yi is a binary value (1 or 0). 2.sup.j attached to each bit is binary weighting of each data bit.
In the equation (1), when n is an even number and y0=0, multiplier Y is developed as shown in the following equation (2). ##EQU2## wherein y0=0, and n is an even number.
A product X.multidot.Y of multiplier Y and a multiplicand X is given by the sum of partial products: EQU (y.sub.2i +y.sub.2i+1 -2.multidot.y.sub.2i+2).multidot.2.sup.2i .multidot.X.
Therefore, when values of three bits y.sub.2i, y.sub.2i+1, and y.sub.2i+2 are given, the operation required for generation of the partial products is determined. The relationship between the bit values of the three bits y.sub.2i, y.sub.2i+1 and y.sub.2i+2, and the operation to be carried out is shown in FIG. 31.
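Equations (1) and (2) appear only as placeholders in this text. One reconstruction that is consistent with the partial product term given for the product X.multidot.Y is sketched below. It assumes that the bits of multiplier Y are numbered y1 (least significant) to yn (sign bit), so that bit yj carries the weight 2.sup.j-1; this indexing is inferred from the surrounding description and is not confirmed by the source.

```latex
% Assumed indexing: y_1 is the least significant bit, y_n is the sign bit.
Y = -y_n \cdot 2^{n-1} + \sum_{j=1}^{n-1} y_j \cdot 2^{j-1} \tag{1}

% Regrouping in pairs, with an appended y_0 = 0 and n even:
Y = \sum_{i=0}^{n/2-1} \left( y_{2i} + y_{2i+1} - 2\,y_{2i+2} \right) \cdot 2^{2i} \tag{2}
```

Under this reading, each bit y.sub.2i+2 contributes -2.multidot.2.sup.2i in one group and +2.sup.2i+2 in the next, which nets to its weight 2.sup.2i+1, while the sign bit yn appears only once, with the weight -2.sup.n-1.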
As is clearly seen from FIG. 31, the operations to be carried out in the second order Booth algorithm are 0, .+-.X and .+-.2X. Two times multiplicand X, that is, 2.multidot.X, is generated by a shift circuit shifting the multiplicand by one bit in the more significant bit direction. Since multiplicand X is in a two's complement format, the "-" operation is implemented by inverting each bit value and adding one to the least significant bit. Therefore, if the operation to be carried out is determined by the values of three bits of multiplier Y, it is possible to carry out the product operation at a high speed.
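The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of the cited reference: the function names `booth_digits` and `booth_multiply` are hypothetical, and the bit numbering (y0 = 0 appended, y1 as least significant bit) is an assumption taken from the surrounding description.

```python
def booth_digits(y: int, n: int = 8) -> list[int]:
    """Second order Booth digits of an n-bit two's complement multiplier (n even).

    Each digit is y(2i) + y(2i+1) - 2*y(2i+2) and lies in {-2, -1, 0, 1, 2}.
    """
    assert n % 2 == 0
    # bits[0] is the appended y0 = 0; bits[j] (j = 1..n) has weight 2**(j-1).
    # Python's >> on negative ints is arithmetic, so two's complement bits come out directly.
    bits = [0] + [(y >> (j - 1)) & 1 for j in range(1, n + 1)]
    return [bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2] for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Sum the partial products digit * 2**(2i) * X, one per three-bit group."""
    return sum(d * (4 ** i) * x for i, d in enumerate(booth_digits(y, n)))


print(booth_digits(-3))        # [1, -1, 0, 0]
print(booth_multiply(7, -3))   # -21
```

Only four partial products are formed for an 8-bit multiplier, and each is one of 0, .+-.X and .+-.2X, so no general multiplication is needed inside a group.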
FIG. 32 is a diagram showing the entire configuration of a conventional multiplier disclosed in, for example, Japanese Patent Laying-Open No. 3-177922. The multiplier carries out multiplication of 8-bit multiplier Y represented in a two's complement by 8-bit multiplicand X represented in a two's complement according to the second order Booth algorithm.
Referring to FIG. 32, the multiplier includes an encoding circuit 500 generating a control signal designating the operation to be carried out for each group according to bits of each bit group of multiplier Y, and a partial product generating circuit 502 responsive to the control signal from encoding circuit 500 to carry out the designated operation for multiplicand X to generate a partial product group.
Encoding circuit 500 includes Booth encoders 1, 2, 3 and 4 provided for every group of three bits of multiplier Y. The ground potential is applied to the input of Booth encoder 1, since the least significant bit is set to 0 in the second order Booth algorithm.
Partial product generating circuit 502 includes shifter/inverter circuits 9, 10, 11 and 12 including shifters and inverters responsive to control signals applied onto signal lines (bus configuration including a plurality of signal lines) from encoding circuit 500 for respectively carrying out operations for multiplicand X to generate partial products.
The multiplier further includes Wallace tree adding circuit 17 having a Wallace tree type adder arrangement adding four partial products (applied onto buses 13, 14, 15 and 16) generated from partial product generating circuit 502 to generate an intermediate addition result, and a final adder 19 adding a pair of 16-bit data (applied onto a bus 18) showing the intermediate addition result from Wallace tree adding circuit 17. A 16-bit (z0-z15) binary number Z represented in a two's complement showing the product X.multidot.Y of multiplier Y and multiplicand X is provided from final adder 19. The operation will now be described.
Multiplier Y is applied to encoding circuit 500, and multiplicand X is applied to partial product generating circuit 502. Each of Booth encoders 1 to 4 included in encoding circuit 500 generates a control signal for designating the operation to be carried out according to the relationship shown in FIG. 31 from given adjacent three bits y.sub.j+1, y.sub.j, and y.sub.j-1 (where j=1 to 6).
Each of shifter/inverter circuits 9 to 12 provided corresponding to Booth encoders 1 to 4 has a configuration as shown in FIG. 33, and generates a partial product from multiplicand X based on control signals 5 to 8 given.
FIG. 33A shows shifter/inverter circuit 9. Other shifter/inverter circuits 10 to 12 have the same configuration. Shifter/inverter circuit 9 receives control signal 5 applied from corresponding Booth encoder 1. Control signal 5 includes control signals .phi.0, .phi.X, .phi.2X, and .phi.IV. A bus and a signal thereon are denoted by the same reference characters.
The control signal .phi.0 designates the "0 operation". The control signal .phi.X designates the "X operation". The control signal .phi.2X designates the "2X operation". The control signal .phi.IV designates the "-operation". The Booth encoder selectively brings the control signals .phi.X, .phi.2X, and .phi.IV into an active state according to the result of the encoding.
When the control signal .phi.0 is applied, shifter/inverter circuit 9 sets all bits of multiplicand X to "0". When the control signal .phi.X is applied, shifter/inverter circuit 9 does not carry out the operation for multiplicand X. When the control signal .phi.2X is applied, shifter/inverter circuit 9 shifts multiplicand X by one bit to the more significant bit side to carry out the operation of 2.multidot.X. When the control signal .phi.IV is brought into an active state, shifter/inverter circuit 9 inverts each bit value of the intermediate data generated based on the control signal .phi.X or .phi.2X, and adds 1 to the least significant bit. As a result, the result subjected to any operation of 0, .+-.X, and .+-.2X in response to the control signals .phi.0, .phi.X, .phi.2X, and .phi.IV is provided from shifter/inverter circuit 9 as a partial product.
FIG. 33B is a diagram showing the functional configuration of shifter/inverter circuit 9. In FIG. 33B, the shifter/inverter circuit includes a 0 generator 610 setting all bit values of multiplicand X to 0, an X generator 612 passing multiplicand X without carrying out any operation, a 2X generator 614 shifting each bit of multiplicand X by one bit in the more significant bit direction to generate 2.multidot.X, a logic gate 618 for inverting or non-inverting each bit value of the output of X generator 612 and a logic gate 620 for inverting or non-inverting each bit value of the output of 2X generator 614 in response to the control signal .phi.IV.
Logic gate 618 includes an EXOR gate provided for each bit of the output of X generator 612. Logic gate 620 includes an EXOR circuit provided corresponding to each bit of the output of 2X generator 614. The EXOR circuit serves as an inverter circuit when the control signal .phi.IV is brought into an active state of an "H" level, and serves as a buffer circuit when the control signal .phi.IV is at an "L" level.
Shifter/inverter circuit 9 further includes a 1 generator 616 generating a bit value "1" in response to the control signal .phi.IV, and a selection gate 622 passing the outputs of 0 generator 610, logic gate 618, and logic gate 620 therethrough in response to the control signals .phi.0, .phi.X, and .phi.2X, respectively. Selection gate 622 includes a transfer gate 621 passing the output of 0 generator 610 therethrough in response to the control signal .phi.0, a transfer gate 623 passing the output of logic gate 618 therethrough in response to the control signal .phi.X, and a transfer gate 625 passing the output of logic gate 620 therethrough in response to the control signal .phi.2X.
1 generator 616 generates the bit value "1" when the control signal .phi.IV is brought into an active state. More specifically, when the "-" operation is carried out, the bit value "1" is generated, which becomes a correction bit at the time of the sign inverting operation. Since the correction bit attains "1" when the "-" operation, that is, the sign inverting operation is carried out, the correction bit also serves as an indication bit indicating coincidence/non-coincidence of signs between multiplicand X and partial products generated therefrom.
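The behavior of the shifter/inverter circuit and its correction bit can be modeled as follows. This is a hedged sketch: FIG. 31 is not reproduced in this text, so the encoding in `booth_encode` is the commonly used table and is an assumption, as are the 9-bit partial product width and all function names.

```python
WIDTH = 9  # assumed partial product width: 8-bit X plus one bit of headroom for 2X


def booth_encode(y_lo: int, y_mid: int, y_hi: int):
    """Return (phi0, phiX, phi2X, phiIV) for bits y(2i), y(2i+1), y(2i+2).

    Assumed table: digit d = y_lo + y_mid - 2*y_hi; phiIV is active only when d < 0.
    """
    d = y_lo + y_mid - 2 * y_hi
    return d == 0, abs(d) == 1, abs(d) == 2, d < 0


def shifter_inverter(x: int, phi0: bool, phiX: bool, phi2X: bool, phiIV: bool):
    """Return (partial product bits, correction bit) as described for circuit 9."""
    mask = (1 << WIDTH) - 1
    if phi0:
        data = 0                                    # 0 generator, passed without inversion
    else:
        data = (x if phiX else x << 1) & mask       # X generator or 2X generator
        if phiIV:
            data ^= mask                            # EXOR gates invert every bit
    return data, int(phiIV)                         # 1 generator supplies the correction bit


def to_signed(bits: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's complement value."""
    return bits - (1 << WIDTH) if bits >> (WIDTH - 1) else bits


data, corr = shifter_inverter(5, *booth_encode(0, 0, 1))    # group selects -2X
print(to_signed(data) + corr)                                # -10
```

The inversion alone yields -(value) - 1; the correction bit supplies the missing +1 when it is added in the later stages, which is why the two are kept separate here.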
Four partial products 13 to 16 generated in partial product generating circuit 502 (cf. FIG. 32) are added in Wallace tree adding circuit 17 having the configuration shown in FIG. 34.
Referring to FIG. 34, Wallace tree adding circuit 17 includes two-stage full adding circuits 17a and 17b. Although full adding circuits 17a and 17b respectively include 3-input/2-output full adders provided corresponding to each digit of a partial product, full adders 21, 22, 23, 24, 25 and 26 provided corresponding to the (i-1)-th digit, the i-th digit, and the (i+1)-th digit of the partial product are shown. Each of full adders 21 to 26 adds three inputs to generate a carry output CO and a sum output S showing the addition result. Full adder circuit 17a of the first stage adds three partial products 13 to 15, and full adder circuit 17b of the second stage adds the output of full adder circuit 17a of the first stage and partial product 16. In FIG. 34, respective bit values of partial products 13 to 15 are shown by aj, bj and cj, and the bit value of fourth partial product 16 is shown by dj.
In the Wallace tree configuration, the carry output of the full adder of the first stage is applied to a first input of the full adder at a one bit more significant digit of the full adding circuit of the second stage, and the sum output is applied to the input of the full adder at the same digit of the full adding circuit of the second stage. For example, full adder 23 provided in the i-th digit adds three bits ai, bi, and ci, provides the carry output CO to the input of full adder 26 of the next stage, and its sum output to the input of full adder 24.
Full adder 24 included in full adding circuit 17b of the second stage receives the sum output S of full adder 23 provided in the i-th digit, the bit di of partial product 16, and the carry output CO of full adder 21 in the (i-1)-th digit of full adding circuit 17a of the first stage, to generate the carry output CO.sub.i+1 and the sum output Si. The operation is carried out with respect to each digit of the partial product to reduce data of four bits ai, bi, ci, and di to two bits of COi and Si. In other words, four partial products 13 to 16 are reduced to two 16-bit data (S and CO).
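The two-stage reduction of four operands to a sum word S and a carry word CO can be sketched at word level as follows. This is an illustration only: each call to the hypothetical helper `csa` models one row of 3-input/2-output full adders operating on all 16 digits at once, and the carry word is shifted by one digit to reflect that each carry feeds the next more significant digit.

```python
MASK = 0xFFFF  # 16-bit data path


def csa(a: int, b: int, c: int):
    """One row of full adders: three 16-bit words in, (sum word, carry word) out."""
    s = a ^ b ^ c                                   # per-digit sum outputs
    co = ((a & b) | (b & c) | (c & a)) << 1         # per-digit carries, moved up one digit
    return s & MASK, co & MASK


def reduce_four(a: int, b: int, c: int, d: int):
    """First stage adds a, b, c; second stage adds its outputs and d."""
    s1, co1 = csa(a, b, c)
    return csa(s1, d, co1)


s, co = reduce_four(0x0123, 0x0F0F, 0x00FF, 0x8001)
print(hex((s + co) & MASK), hex((0x0123 + 0x0F0F + 0x00FF + 0x8001) & MASK))
```

No carry propagates along a row in either stage; the only carry-propagating addition left is that of S and CO in the final adder.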
Two 16-bit data S and CO of the intermediate stage generated as described above are added by a final adder 19 having the configuration shown in FIG. 35, to generate a 16-bit (z15-z0) data Z indicating the product X.multidot.Y.
Referring to FIG. 35, final adder 19 includes a carry lookahead unit 19a predetermining whether a carry is produced, and an adding portion 19b adding the carry output of carry lookahead unit 19a and two 16-bit data S and CO.
Carry lookahead unit 19a includes a first carry lookahead circuit (CLA) 37 provided for least significant four bits CO0 to CO3 and S0 to S3 of two 16-bit data CO and S, a second carry lookahead circuit 38 provided for intermediate four bits CO4 to CO7 and S4 to S7 of two data CO and S, and a third carry lookahead circuit 39 provided for most significant four bits CO8 to CO11 and S8 to S11 of two data CO and S. A carry generation signal G0 is generated from first carry lookahead circuit 37. A carry generation signal G and a carry propagation signal P are generated from second and third carry lookahead circuits 38 and 39.
Carry lookahead unit 19a further includes a fourth carry lookahead circuit 40 receiving outputs of first and second carry lookahead circuits 37 and 38 to generate a carry CC1, and a fifth carry lookahead circuit 41 receiving outputs of first, second and third carry lookahead circuits 37, 38 and 39 to generate a carry CC.sub.2.
The specific configuration and operations of a carry lookahead circuit are described in, for example, "PRINCIPLES OF CMOS VLSI DESIGN", N. H. E. Weste, et al., published by Addison-Wesley, Inc., 1985, pp 320 and 321. However, the operational principle will now be described briefly.
In general, the i-th bit carry C.sub.i is represented by the following equations (3) to (5) using two inputs S.sub.i and CO.sub.i, and a carry C.sub.i-1 at a less significant bit (that is, the (i-1)-th digit). ##EQU3## where EQU g.sub.i =S.sub.i .multidot.CO.sub.i (4) EQU p.sub.i =S.sub.i .sym.CO.sub.i (5)
In the configuration shown in FIG. 35, the 16-bit data is divided by four bits into four sets. A control signal for finding a carry is generated in each set. By using the generated control signal, carries of up to the fourth bit, the eighth bit, and the twelfth bit are found in a lookahead manner. First carry lookahead circuit 37 generates carries regarding least significant four bits. The carry CC0 generated by carry lookahead circuit 37 is given by the following equation (6) based on the equation (3). ##EQU4## where EQU G0=g3+p3.multidot.g2+p3.multidot.p2.multidot.g1+p3.multidot.p2.multidot.p1.multidot.g0 (7) EQU P0=p3.multidot.p2.multidot.p1.multidot.p0 (8)
In the equations, C.sub.-1, a carry input for the least significant bit, is 0 here. Therefore,
CC0=G0 (9)
More specifically, first carry lookahead circuit 37 finds the carry output CC0 of the fourth bit according to the equation (7). Second and third carry lookahead circuits 38 and 39 have the same configuration. According to the above-described equations (4), (5), (7) and (8), second carry lookahead circuit 38 generates for inputs S4 to S7 and CO4 to CO7 EQU G1=g7+p7.multidot.g6+p7.multidot.p6.multidot.g5+p7.multidot.p6.multidot.p5.multidot.g4 (10) EQU P1=p7.multidot.p6.multidot.p5.multidot.p4 (11)
Similarly, third carry lookahead circuit 39 generates, for inputs S8 to S11 and CO8 to CO11, EQU G2=g11+p11.multidot.g10+p11.multidot.p10.multidot.g9 +p11.multidot.p10.multidot.p9.multidot.g8 (12) EQU P2=p11.multidot.p10.multidot.p9.multidot.p8 (13)
Fourth carry lookahead circuit 40 generates the carry output CC1 of the eighth bit from the outputs CC0, G1 and P1 of first and second carry lookahead circuits 37 and 38 according to the following equation (14). EQU CC1=G1+P1.multidot.CC0 (14)
Fifth carry lookahead circuit 41 generates the carry output CC2 of the twelfth bit from the outputs CC0, G1, P1, G2 and P2 of first to third carry lookahead circuits 37 to 39 according to the following equation (15). EQU CC2=G2+P2.multidot.G1+P2.multidot.P1.multidot.CC0 (15)
According to such configuration, it is possible to find the carry outputs CC0, CC1 and CC2 for every four bits in parallel.
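The group generate and propagate signals and the three lookahead carries can be checked with a short Python model. The sketch follows equations (4), (5), (7), (8), (14) and (15) as written; the function names are hypothetical, and the carry into the least significant bit is taken as 0.

```python
def group_gp(s: int, co: int, lo: int):
    """Group generate G and propagate P for the four bits lo..lo+3."""
    g = [((s >> (lo + k)) & (co >> (lo + k))) & 1 for k in range(4)]   # g_i = S_i AND CO_i
    p = [((s >> (lo + k)) ^ (co >> (lo + k))) & 1 for k in range(4)]   # p_i = S_i XOR CO_i
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lookahead_carries(s: int, co: int):
    """Carries out of bits 3, 7 and 11 (CC0, CC1, CC2), found without rippling."""
    G0, _ = group_gp(s, co, 0)
    G1, P1 = group_gp(s, co, 4)
    G2, P2 = group_gp(s, co, 8)
    CC0 = G0                                    # carry input to bit 0 is 0
    CC1 = G1 | (P1 & CC0)                       # equation (14)
    CC2 = G2 | (P2 & G1) | (P2 & P1 & CC0)      # equation (15)
    return CC0, CC1, CC2


print(lookahead_carries(0x0FFF, 0x0001))   # (1, 1, 1): the carry runs through all three groups
```

Each of the three carries is a two-level function of the group signals, so none of them waits for another to settle.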
Adding portion 19b includes four ripple adders (RA) 42 to 45 provided for respective groups of four bits of two inputs S and CO. Ripple adder 42 adds bits S0 to S3 and CO0 to CO3 to generate 4-bit data z0 to z3. Ripple adder 43 receives the carry CC0, and adds bits S4 to S7 and CO4 to CO7 to generate 4-bit data z4 to z7. Ripple adder 44 receives the carry CC1, and adds bits S8 to S11 and CO8 to CO11 to generate 4-bit data z8 to z11. Ripple adder 45 receives the carry CC2, and adds bits S12 to S15 and CO12 to CO15 to generate 4-bit data z12 to z15.
Each of ripple adders 42 to 45 has the same configuration. As is shown in FIG. 36, each of ripple adders 42 to 45 includes four full adders 650-0 to 650-3. In the ripple adder, a carry output C.sub.out of the full adder on the side of less significant bits is applied to a carry input C on the side of more significant bit. In other words, in the ripple adder, the carry output Cout of the full adder provided at the least significant bit is sequentially transmitted as a carry input to more significant bits. The carry output of full adder 650-3 provided at the most significant bit position has already been found by carry lookahead unit 19a. Carry propagation is therefore limited to at most three stages of full adders, whereby delay with carry propagation is reduced and the operation speed is increased.
By ripple adders 42 to 45 as shown in FIG. 35, respective data bits z15 to z0 of the product X.multidot.Y are provided in parallel.
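The adding portion can be modeled as four independent 4-bit ripple adders, each seeded with a carry that has already been determined. In the sketch below the carries CC0, CC1 and CC2 are passed in as parameters; the function names are hypothetical, and the usage line derives the carries arithmetically only to keep the example self-contained.

```python
def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def ripple4(a: int, b: int, cin: int) -> int:
    """Add two 4-bit values with a carry input; the carry out of the top bit is not used."""
    z = 0
    for k in range(4):
        s, cin = full_adder((a >> k) & 1, (b >> k) & 1, cin)
        z |= s << k
    return z


def final_add(s: int, co: int, cc0: int, cc1: int, cc2: int) -> int:
    """Form z0..z15 from S and CO using one ripple adder per 4-bit group."""
    z = 0
    for group, cin in enumerate((0, cc0, cc1, cc2)):
        lo = 4 * group
        z |= ripple4((s >> lo) & 0xF, (co >> lo) & 0xF, cin) << lo
    return z


s, co = 0x1234, 0x0FCD
carries = [((s & m) + (co & m)) >> n for m, n in ((0xF, 4), (0xFF, 8), (0xFFF, 12))]
print(hex(final_add(s, co, *carries)))   # 0x2201
```

Because every group receives its carry input directly, the four groups can be evaluated in parallel.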
One example of the configuration of the full adder included in Wallace tree adding circuit 17 and final adder 19 is shown in FIG. 37. Referring to FIG. 37, the full adder includes an AND circuit 27 receiving inputs A and B, an AND circuit 28 receiving inputs B and C, an AND circuit 29 receiving inputs A and C, a 3-input OR circuit 30 receiving outputs of AND circuits 27 to 29, an XOR circuit 31 receiving inputs A and B, and an XOR circuit 32 receiving the output of XOR circuit 31 and the input C. OR circuit 30 provides the carry Cout, and XOR circuit 32 generates the sum output Sout. The full adder generates the carry and the sum according to the following equation. EQU Cout=A.multidot.B+B.multidot.C+C.multidot.A EQU Sout=A.sym.B.sym.C
The XOR circuit carries out the operation of V.multidot./W+/V.multidot.W for two inputs V and W. More specifically, the XOR circuit carries out the AND operation and the subsequent OR operation equivalently, and the delay time thereof is larger than those of the AND circuit and the OR circuit. Therefore, the delay time of the full adder of the configuration shown in FIG. 37 is determined by the delay time of two stages of XOR circuits. The Wallace tree adding circuit has each bit configured of two-staged full adders. Therefore, the delay time in the Wallace tree adding circuit is the delay time of four stages of XOR circuits.
FIG. 38 is a diagram showing a general configuration of the AND circuit. In FIG. 38, the AND circuit includes p channel MOS transistors Tr1 and Tr2 provided in parallel between a power supply potential Vcc supply node and an internal node ND and receiving inputs B and A at their gates, n channel MOS transistors Tr3 and Tr4 connected in series between the node ND and the ground potential and receiving inputs A and B at their gates, and an inverter circuit configured of complementary-connected MOS transistors Tr5 and Tr6 for inverting the potential on the node ND. More specifically, the AND circuit includes six transistors. The 3-input OR circuit requires six transistors in total as output charging transistors and output discharging transistors. The XOR circuit requires at least two AND circuits and one OR circuit. More specifically, the full adder requires 30 or more transistors when it is configured of MOS transistors.
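The gate structure of FIG. 37, with each XOR expanded into its AND-OR form, can be written out as follows. This is a behavioral sketch only; the variable names mirror the reference numerals in the description, and the function names are hypothetical.

```python
def xor_gate(v: int, w: int) -> int:
    """V AND (NOT W) OR (NOT V) AND W: an AND level followed by an OR level."""
    return (v & (1 - w)) | ((1 - v) & w)


def full_adder(a: int, b: int, c: int):
    """Return (Cout, Sout) using the gates named in the description."""
    and27 = a & b
    and28 = b & c
    and29 = a & c
    cout = and27 | and28 | and29      # 3-input OR circuit 30
    xor31 = xor_gate(a, b)
    sout = xor_gate(xor31, c)         # XOR circuit 32
    return cout, sout


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```

The sum output passes through two XOR circuits in series, while the carry output passes through one AND level and one OR level, so the sum path sets the delay of the full adder.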
60 or more transistors in total are required for one bit in the Wallace tree adding circuit because of the two-staged full adders. Therefore, the circuit scale became large, and the delay in the circuit became large, which hampered high speed multiplication.
In particular, multiplication is an important operation element in a computer system. The speed of multiplication determines the operation speed of the entire system in scientific and technological computation, image processing or the like. Therefore, a high speed operation is required for the multiplier. The requirement for the high speed operation is further increased. However, since the delay in the full adder in the structure of the conventional Wallace tree adding circuit and the final adding circuit was large, it was not possible to implement intended high speed multiplication.
In the case of the Wallace tree adding circuit, a 3-input/2-output full adder is used. In this case, as shown in FIG. 34, a signal line transmitting each bit value of a partial product extends over the full adder. When the numbers of bits of a multiplier and a multiplicand to be processed are increased, the number of partial products increases accordingly. In this case, the number of stages of the full adding circuits increases, and little regularity is observed in interconnection when the output of the full adder is connected to the full adder of the next stage (in the case of the 3-input 2-output full adder, the direction in which the output extends is not determined). Since an extremely complicated arrangement of interconnection is required, the time for layout designing is increased.
FIG. 39 is a diagram showing partial products generated at the time of multiplication of multiplier Y and multiplicand X and a product result thereof. Correction bits sa0, sb2, sc4, and sd6 are generated for partial products 13 to 16, respectively. By adding partial products 13 to 16 and correction bits 33 to 36, the product Z is generated. In actual addition, correction bits may be inserted at corresponding bit positions of partial products on the more significant bit side for addition, and the correction bit sd6 may be separately added finally. Data (sd6, 0, sc4, 0, sb2, 0, sa0) generated from correction bits 33 to 36 may be added as one partial product.
As shown in FIG. 39, the same bit values are repeatedly disposed on the upper bits of the partial products. In a two's complement notation, when the most significant bit of the partial product is 1, which indicates a negative number, a bit value of bits higher than the most significant bit must be 1. This is because, in a two's complement notation, a number keeps its value when extended to a greater bit width only if every added upper bit repeats the sign bit. For example, when the ninth bit value a8 in partial product 13 is 1, the partial product is a negative number. When partial product 13 is represented in 16 bits, the tenth and more significant bits must be all 1.
Therefore, when the partial product is represented in two's complement, it is necessary to add extra bits toward the upper bits. In order to process the added bits, additional full adders are needed in the Wallace tree adding circuit, which increases the scale of the device and the layout area.
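The role of the correction bits can be illustrated with an 8-bit by 8-bit model in which the correction bits are collected into a single word and added like a fifth partial product. This is a sketch under assumptions: the Booth digit formula stands in for FIG. 31, Python's unbounded integers stand in for sign-extended 16-bit rows, and the function name is hypothetical.

```python
MASK16 = 0xFFFF


def multiply_8x8(x: int, y: int) -> int:
    """Return the 16-bit two's complement pattern of x * y for 8-bit signed x and y."""
    bits = [0] + [(y >> j) & 1 for j in range(8)]    # bits[0] = y0 = 0
    total = 0
    correction_word = 0
    for i in range(4):
        d = bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2]
        row = abs(d) * x                    # 0, X or 2X
        if d < 0:
            row = ~row                      # bit inversion only, no +1 yet
            correction_word |= 1 << (2 * i) # sa0, sb2, sc4 or sd6
        total += (row << (2 * i)) & MASK16  # row placed at its digit position
    return (total + correction_word) & MASK16


print(hex(multiply_8x8(7, -3)), hex((7 * -3) & MASK16))   # 0xffeb 0xffeb
```

Deferring the +1 of each negation into one correction word keeps the partial product generators free of carry chains; the price is one extra operand in the adder tree.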
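The sign extension requirement can be shown with a small example. The helper name `sign_extend` and the sample value are illustrative only.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy the top bit of a from_bits-wide pattern into all higher bits up to to_bits."""
    if (value >> (from_bits - 1)) & 1:
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def to_signed(value: int, bits: int) -> int:
    """Interpret a bit pattern of the given width as a two's complement number."""
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value


nine_bit = 0b1_0000_0101                      # a8 = 1, so the partial product is negative
wide = sign_extend(nine_bit, 9, 16)
print(bin(wide))                              # 0b1111111100000101
print(to_signed(nine_bit, 9), to_signed(wide, 16))   # -251 -251
```

Bits 9 to 15 of the extended word carry no new information, yet each of them still has to pass through the adder tree.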