1. Field of the Invention
The present invention relates to a semiconductor device, and more particularly, to a semiconductor device with a parallel multiplier using at least three wiring layers.
2. Description of the Related Art
High-performance and high-speed data processing devices are in increasing demand. To realize high-speed multiplication, LSIs adapted for parallel multiplication, or LSIs which incorporate parallel-multiplication macrocells (functional blocks that have undergone design verification testing), have come into wide use recently. Owing to significant advances in hardware technology, an LSI can now incorporate a large-scale parallel multiplier. For instance, a parallel multiplier for two 32-bit numbers has already been implemented.
Such a parallel multiplier forms the product of a multiplicand and a multiplier in accordance with the following process: (1) partial products are generated, and (2) the resulting partial products are summed to provide the product. Such an arithmetic operation is described in an article entitled "Outlook on circuit systems of parallel multipliers which are now in progress of LSI version", NIKKEI ELECTRONICS (in Japanese), May 29, 1978.
For instance, let us consider parallel multiplication of two 8-bit numbers as shown in FIG. 1. In this Figure, A.sub.1 to A.sub.8 represent the bits of a multiplicand, while B.sub.1 to B.sub.8 represent the bits of a multiplier. a.sub.1 to a.sub.8, b.sub.1 to b.sub.8, c.sub.1 to c.sub.8, d.sub.1 to d.sub.8, e.sub.1 to e.sub.8, f.sub.1 to f.sub.8, g.sub.1 to g.sub.8 and h.sub.1 to h.sub.8 represent sets 10.sub.1, 10.sub.2, 10.sub.3, 10.sub.4, 10.sub.5, 10.sub.6, 10.sub.7 and 10.sub.8 of partial products, and p.sub.1 to p.sub.15 represent the product bits. Each set of partial products results from multiplying the multiplicand by a specific bit of the multiplier.
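The two-step process described above (generation of partial products, then summation) can be sketched in software as follows. This is only an illustrative model, not a circuit from the Figures: the function names `partial_products` and `multiply` are hypothetical, and the bits are indexed from the least significant end, which is an assumption and need not match the numbering of A.sub.1 to A.sub.8 in FIG. 1.

```python
def partial_products(multiplicand: int, multiplier: int, width: int = 8) -> list[int]:
    """Return one partial-product row per multiplier bit (least significant bit first).

    Assumption: rows are stored already shifted to their weight, so that a plain
    sum of the rows yields the product.
    """
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # Each row is the multiplicand ANDed with a single multiplier bit.
        row = multiplicand if bit else 0
        rows.append(row << i)
    return rows


def multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Step (1): generate the partial products. Step (2): sum them."""
    return sum(partial_products(multiplicand, multiplier, width))
```

For two 8-bit operands the sketch produces eight rows, corresponding in role to the sets 10.sub.1 to 10.sub.8. How the rows are summed is the subject of the two methods discussed next.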
The addition of the partial products described above is generally carried out in the two following ways.
(1) Carry save method: FIG. 2 illustrates that portion of an array circuit for performing parallel multiplication of two 4-bit numbers which corresponds to the partial products 10.sub.1 -10.sub.8 of FIG. 1. That is, partial products 12.sub.1, resulting from multiplying a multiplicand by a respective bit of a multiplier, are applied to full adders 14.sub.1 to 14.sub.3 and to a q.sub.8 (product bit) output. The most significant bit of the next partial products 12.sub.2 is applied to a next-stage full adder 14.sub.4, while the remaining bits thereof are applied to full adders 14.sub.1, 14.sub.2 and 14.sub.3 together with the three high-order bits of the preceding partial products 12.sub.1. As a result, the carries (C) of full adders 14.sub.1, 14.sub.2 and 14.sub.3 are supplied to next-stage full adders 14.sub.4, 14.sub.5 and 14.sub.6, respectively, and the sums (S) thereof are applied to full adders 14.sub.5 and 14.sub.6 and to the q.sub.7 (product bit) output. Similarly, in full adders 14.sub.4, 14.sub.5 and 14.sub.6, the carries (C) from full adders 14.sub.1, 14.sub.2 and 14.sub.3 are added to corresponding bits of partial products 12.sub.3 and the results are applied to full adders 14.sub.7, 14.sub.8 and 14.sub.9 and to the q.sub.6 output. Furthermore, full adders 14.sub.7, 14.sub.8 and 14.sub.9 add the carries (C) from full adders 14.sub.4, 14.sub.5 and 14.sub.6 to corresponding bits of partial products 12.sub.4 and then provide the resulting bits to high-speed adder 16 and to the q.sub.5 output. High-speed adder 16 provides high-order bits q.sub.1 -q.sub.4 of the product.
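The principle of the carry save method can be modeled at the word level as in the following sketch. It is not a gate-for-gate transcription of FIG. 2: the name `carry_save_multiply` is hypothetical, bits are indexed from the least significant end by assumption, and a separate sum word and carry word stand in for the S and C outputs of each row of full adders. The point illustrated is that carries are not propagated within a row but are handed to the next row, and that a single carry-propagating addition (the role played by high-speed adder 16) is performed only at the end.

```python
def carry_save_multiply(x: int, y: int, width: int = 4) -> int:
    """Multiply two unsigned `width`-bit numbers by carry-save accumulation."""
    sum_word = 0    # S outputs of the current row of full adders
    carry_word = 0  # C outputs, already shifted to the next higher bit position
    for i in range(width):
        # Partial product for multiplier bit i, shifted to its weight.
        row = (x << i) if (y >> i) & 1 else 0
        # One row of full adders: three input words, no carry ripple inside the row.
        new_sum = sum_word ^ carry_word ^ row
        new_carry = ((sum_word & carry_word) | (sum_word & row) | (carry_word & row)) << 1
        sum_word, carry_word = new_sum, new_carry
    # Final carry-propagating addition of the two remaining words.
    return sum_word + carry_word
```

In this model every partial product passes through its own row in sequence, so the number of rows grows with the number of partial products.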
(2) Tree method: Unit blocks for performing an addition of a predetermined number of inputs are properly combined to perform addition for each digit of a product. FIG. 3 shows an 8-input adder for a set 20 of specific bits, for example, a.sub.1, b.sub.2, c.sub.3, d.sub.4, e.sub.5, f.sub.6, g.sub.7 and h.sub.8 included in partial products 10.sub.1 to 10.sub.8 shown in FIG. 1. In this example, a 3-input, 2-output full adder is used as the unit block. In FIG. 3, among the specific bits of the partial products 10.sub.1 -10.sub.8 of FIG. 1, a.sub.1, b.sub.2 and c.sub.3 are added together in full adder 22.sub.1, d.sub.4, e.sub.5 and f.sub.6 in full adder 22.sub.2, and g.sub.7 and h.sub.8 in full adder 22.sub.3. The carry outputs of full adders 22.sub.1 -22.sub.3 are applied to the higher order bits (not shown), while the partial sums 24.sub.1, 24.sub.2 and 24.sub.3 are directed to full adder 22.sub.4. The partial sum 24.sub.4 from full adder 22.sub.4 is applied to full adder 22.sub.6 together with the partial sum 24.sub.5 from full adder 22.sub.5, which receives carries from the lower order bits (not shown). On the other hand, the carry from full adder 22.sub.4 is output to a higher order bit together with the carry from full adder 22.sub.5. Full adder 22.sub.6 adds together the partial sums 24.sub.4 and 24.sub.5 from full adders 22.sub.4 and 22.sub.5 and a carry from a lower order bit to provide the partial sum 24.sub.6 to full adder 22.sub.7. Full adder 22.sub.7 adds together the partial sum 24.sub.6 from full adder 22.sub.6 and a carry from the lower order bit to provide the carry and sum to a high-speed adder (not shown). Such an adder circuit as described above is provided for each digit of the partial product sets 10.sub.1 -10.sub.8 to obtain the product (answer).
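The tree method can likewise be sketched in software as a column-by-column reduction using 3-input, 2-output full adders as the unit block. The sketch below is a simplified model under stated assumptions and does not reproduce the exact wiring of FIG. 3: the names `full_adder` and `tree_multiply` are hypothetical, bits are indexed from the least significant end, and a leftover group of fewer than three bits in a column is simply passed on to the next level rather than being fed to a full adder as g.sub.7 and h.sub.8 are in full adder 22.sub.3.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input, 2-output unit block: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def tree_multiply(x: int, y: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit numbers by tree reduction of each digit."""
    # columns[k] holds every partial-product bit whose weight is 2**k.
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((x >> j) & 1) & ((y >> i) & 1))

    # Reduce until no column holds more than two bits.
    while any(len(col) > 2 for col in columns):
        # One extra column per level leaves room for carries out of the top column.
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            groups = len(col) // 3
            for g in range(groups):
                s, c = full_adder(*col[3 * g:3 * g + 3])
                nxt[k].append(s)      # partial sum stays in the same digit
                nxt[k + 1].append(c)  # carry goes to the next higher digit
            nxt[k].extend(col[3 * groups:])  # fewer than three bits: pass through
        columns = nxt

    # At most two bits per digit remain; this last addition plays the role of the
    # high-speed adder.
    return sum(bit << k for k, col in enumerate(columns) for bit in col)
```

Because all groups of three bits in a level are reduced at the same time, the number of levels grows far more slowly than the number of partial products, at the cost of carries crossing between digits at every level.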
FIG. 4 shows the basic circuit arrangement of the full adder of 3-input, 2-output AND-OR form in the prior art described above.
In this Figure, an A input is applied to AND circuits 26.sub.1, 26.sub.4, 26.sub.5 and 26.sub.6, a B input to AND circuits 26.sub.2, 26.sub.4, 26.sub.5 and 26.sub.7, and a C input to AND circuits 26.sub.3, 26.sub.4, 26.sub.6 and 26.sub.7. The A input is further applied to AND circuits 26.sub.2 and 26.sub.3 via an inverter 28.sub.1. Likewise, the B input is applied to AND circuits 26.sub.1 and 26.sub.3 via an inverter 28.sub.2, and the C input to AND circuits 26.sub.1 and 26.sub.2 via an inverter 28.sub.3. AND circuits 26.sub.1 to 26.sub.4 feed their outputs to an OR circuit 30.sub.1, while AND circuits 26.sub.5 to 26.sub.7 feed their outputs to an OR circuit 30.sub.2. The output of OR circuit 30.sub.1 is applied as the sum (S), and the output of OR circuit 30.sub.2 as the carry (C), to the next stage (higher order bit).
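The AND-OR connections described above can be written out as Boolean expressions, as in the following sketch. The function name `full_adder_and_or` and the local variable names are hypothetical; the grouping of terms follows the connections stated for AND circuits 26.sub.1 to 26.sub.7, inverters 28.sub.1 to 28.sub.3 and OR circuits 30.sub.1 and 30.sub.2.

```python
def full_adder_and_or(a: int, b: int, c: int) -> tuple[int, int]:
    """3-input, 2-output full adder in AND-OR form. Inputs are 0 or 1."""
    # Inverters 28_1, 28_2 and 28_3.
    na, nb, nc = 1 - a, 1 - b, 1 - c

    # AND circuits 26_1 to 26_4 feed OR circuit 30_1 (sum).
    and1 = a & nb & nc
    and2 = na & b & nc
    and3 = na & nb & c
    and4 = a & b & c
    s = and1 | and2 | and3 | and4

    # AND circuits 26_5 to 26_7 feed OR circuit 30_2 (carry).
    and5 = a & b
    and6 = a & c
    and7 = b & c
    carry = and5 | and6 | and7

    return s, carry
```

The sum output is 1 when an odd number of inputs are 1, and the carry output is 1 when at least two inputs are 1, so that the sum plus twice the carry equals the arithmetic sum of the three input bits.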
According to the carry save method described above, LSIs can implement parallel multiplication with a highly regular layout of circuit components. However, an array of (n-1) full adders is needed to add together n partial products. Hence, the problem with this approach is that it requires long signal transmission paths, which slow the computing speed.
According to the tree method, on the other hand, the number of full adders that a signal must pass through can be kept small, and in this respect the computation can be sped up. However, since the blocks in the prior art are wired using first and second metal wiring layers formed on a semiconductor substrate, an increase in the number of wires between the blocks complicates the wiring of the blocks and requires a large number of steps in pattern layout. In addition, the increase in the number of wires increases the area of the LSI chip as well, with the result that high-speed processing is impeded.
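The difference in the number of adder stages between the two methods can be estimated with the following sketch. It is a rough counting model under stated assumptions, not a timing analysis: the names `array_stages` and `tree_levels` are hypothetical, the array count uses the (n-1) figure given above for the carry save method, and the tree count assumes that each level replaces every group of three rows by two rows using 3-input, 2-output full adders until two rows remain for the final high-speed adder. Wiring delay, which is the concern raised for the tree method, is not modeled.

```python
def array_stages(n: int) -> int:
    """Adder stages traversed when n partial products are added row after row."""
    return n - 1


def tree_levels(n: int) -> int:
    """Levels of 3-input, 2-output full adders needed to reduce n rows to two rows."""
    levels = 0
    while n > 2:
        # Each complete group of three rows becomes two rows; leftovers pass through.
        n = 2 * (n // 3) + n % 3
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, array_stages(n), tree_levels(n))
```

Under these assumptions the stage count of the array grows linearly with the number of partial products, whereas the level count of the tree grows roughly logarithmically.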