The present invention relates to a high-speed multiplier which can be easily constituted in the form of an integrated circuit.
A single-chip digital signal processor incorporates a high-speed parallel multiplier, performs multiply-and-accumulate operations at high speed, and is thus able to carry out real-time processing in the field of speech signal processing.
In the field of image processing where the amount of data is considerably greater than that in the field of speech signal processing, it is necessary to perform the multiply and accumulate operations at higher speeds so that the processing is carried out in real time. Therefore, it has been desired to develop a high-speed multiplier which can be used for such applications.
In a parallel multiplier capable of high-speed arithmetic operation, full adders are arranged in an array structure to add the partial products in parallel.
The algorithm of parallel multiplication can be divided into the following two steps:
(1) The partial products are formed simultaneously by ANDing a multiplicand bit with a multiplier bit.
(2) The partial products are added concurrently to find a product.
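The two steps can be illustrated in Python for unsigned operands. This is a minimal sketch, assuming an unsigned multiplier of `width` bits. The function names `partial_products` and `multiply_unsigned` are hypothetical, and step (2) is performed here with ordinary integer addition rather than with a full-adder array.

```python
def partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """Step (1): one partial product per multiplier bit y_j, formed by ANDing
    every multiplicand bit x_i with y_j and placing the result at weight i + j.
    Assumes unsigned operands of `width` bits (illustrative only)."""
    rows = []
    for j in range(width):
        y_j = (multiplier >> j) & 1
        row = 0
        for i in range(width):
            x_i = (multiplicand >> i) & 1
            row |= (x_i & y_j) << (i + j)  # the AND gate for bit pair (i, j)
        rows.append(row)
    return rows


def multiply_unsigned(a: int, b: int, width: int = 8) -> int:
    # Step (2): add all partial products (plain integer addition stands in
    # for the full-adder array of a real parallel multiplier)
    return sum(partial_products(a, b, width))


print(multiply_unsigned(13, 11))  # 143
```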
The modified Booth algorithm is known as a way to carry out the above step (1) at high speed.
It halves the number of partial products while adding little delay in forming them.
Therefore, the operation speed of the above step (2) can be doubled.
To carry out the above step (2) at high speed, carry save addition has hitherto been widely employed.
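A common form of the modified Booth algorithm is radix-4 recoding. The sketch below assumes a two's-complement multiplier of even bit width. Each overlapping group of three multiplier bits (y[2k+1], y[2k], y[2k-1], with y[-1] = 0) gives a digit in {-2, -1, 0, 1, 2}. An n-bit multiplier therefore yields only n/2 partial products, each equal to 0, ±X or ±2X. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Radix-4 (modified Booth) recoding of a two's-complement multiplier.
    Returns width // 2 digits d_k in {-2, ..., 2} with sum(d_k * 4**k) == multiplier."""
    assert width % 2 == 0
    assert -(1 << (width - 1)) <= multiplier < (1 << (width - 1))

    def bit(i: int) -> int:
        # Python's >> on negative ints is arithmetic, giving two's-complement bits
        return 0 if i < 0 else (multiplier >> i) & 1

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
            for k in range(width // 2)]


def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    digits = booth_radix4_digits(multiplier, width)
    # Each partial product is 0, +-X or +-2X (in hardware: a shift and an
    # optional negation), weighted by 4**k
    partials = [(d * multiplicand) << (2 * k) for k, d in enumerate(digits)]
    return sum(partials)


print(booth_radix4_digits(-3, 8))  # [1, -1, 0, 0]
print(booth_multiply(7, -3))       # -21
```

For an 8-bit multiplier, four partial products are produced instead of eight.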
FIG. 1 shows a system for carry save addition (S. Waser, "High Speed Monolithic Multipliers for Real-Time Digital Signal Processing", Computer, pp. 19-28, Oct. 1978). This system is based on the principle that, when the sum of three or more partial products (Q1, Q4, Q7), (Q2, Q5, Q8), (Q3, Q6, Q9), (Q10, Q11, Q12), ..., each consisting of three bits, is to be found, carry propagation is postponed until the final stage.
That is, in FIG. 1, the carry Co obtained by adding the three bits Q7, Q8, Q9 in full adder 102 is not fed to full adder 101 of the next higher bit in the same stage. Instead, it is added, together with the sum So of full adder 101, to the bit Q11 of the fourth partial product in full adder 104 of the next stage down. In FIG. 1, black circles represent bits of partial products. This procedure is repeated until no bits remain to be added, finally leaving two bits.
One of these bits is the sum from the last stage, and the other is the carry from the last stage. If these two bits are added by a carry-lookahead adder, which computes the carries in parallel rather than rippling them bit by bit, carry propagation in the horizontal direction through the array is avoided and the operation speed is increased correspondingly.
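The carry save principle of FIG. 1 can be modeled with integers by treating each partial product as a row of bits. The sketch assumes non-negative rows already shifted to their bit weights. The names `full_adder_row` and `carry_save_array` are hypothetical. The final `s + c` merely stands in for the carry-lookahead adder and is not one itself.

```python
def full_adder_row(a: int, b: int, c: int) -> tuple[int, int]:
    """One row of full adders applied bitwise (a 3:2 carry-save adder).
    The carry row is shifted one bit left because each carry Co belongs to
    the next higher bit position. Invariant: a + b + c == sum_row + carry_row."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def carry_save_array(rows: list[int]) -> tuple[int, int, int]:
    """Linear carry save addition in the manner of FIG. 1: each stage adds the
    previous stage's sum and carry rows to one new partial product, so no carry
    travels horizontally. Returns (sum_row, carry_row, number_of_stages)."""
    assert len(rows) >= 3
    s, c = full_adder_row(rows[0], rows[1], rows[2])
    stages = 1
    for row in rows[3:]:
        s, c = full_adder_row(s, c, row)
        stages += 1
    return s, c, stages


# Partial products of 13 x 11 (unsigned, 4-bit multiplier)
rows = [13 * ((11 >> j) & 1) << j for j in range(4)]
s, c, stages = carry_save_array(rows)
print(s + c, stages)  # 143 2  (final add stands in for the carry-lookahead adder)
```

The stage count equals n - 2 for n partial products, because after the first stage each stage absorbs exactly one new row.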
For the above step (2), the Wallace tree is known to maximize the operation speed by minimizing the number of full-adder stages through which the signals pass.
The principle of this method is to add the partial products in parallel. It is described below with reference to FIG. 2, in which black circles represent bits of partial products as in FIG. 1. FIG. 2 shows the case where eight partial products, each consisting of three bits, are added together, i.e., (Q22, Q25, Q28), (Q23, Q26, Q29), (Q24, Q27, Q30), (Q31, Q34, Q37), (Q32, Q35, Q38), (Q33, Q36, Q39), (Q40, Q42, Q44), (Q41, Q43, Q45). One full adder can add three bits at a time. Therefore, the three partial products Q22 to Q30 are added by full adders 200 to 202 of the first stage of FIG. 2, and another three partial products Q31 to Q39 are added by full adders 203 to 205 of the second stage, so that six partial products are added in parallel.
Then the bits formed by the sums So of full adders 200 to 202 of the first stage are added, together with the remaining two partial products, through full adders 206 to 208 of the third stage. At the same time, a total of three bits, namely the carry Co of full adders 200 to 202 of the first stage and the carry Co and sum So of full adders 203 to 205 of the second stage, are added through full adders of the fourth stage.
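The grouping of FIG. 2 can be traced in a bitwise row model. The sketch models each of the eight partial products as a pre-shifted, non-negative integer row, and its function names are hypothetical. The "first" and "second" stages operate side by side, and so do the "third" and "fourth". Eight rows therefore shrink to four after only two full-adder delays, while their sum is preserved.

```python
def full_adder_row(a: int, b: int, c: int) -> tuple[int, int]:
    # Bitwise 3:2 carry-save adder: a + b + c == sum_row + carry_row
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def fig2_first_two_levels(p: list[int]) -> list[int]:
    """Follows the FIG. 2 grouping for eight partial products p[0]..p[7]."""
    assert len(p) == 8
    # First delay: two groups of three partial products, added in parallel
    s1, c1 = full_adder_row(p[0], p[1], p[2])  # "first stage" (FAs 200-202)
    s2, c2 = full_adder_row(p[3], p[4], p[5])  # "second stage" (FAs 203-205)
    # Second delay, again in parallel
    s3, c3 = full_adder_row(s1, p[6], p[7])    # "third stage" (FAs 206-208)
    s4, c4 = full_adder_row(c1, c2, s2)        # "fourth stage"
    return [s3, c3, s4, c4]


rows = fig2_first_two_levels([3, 5, 7, 11, 13, 17, 19, 23])
print(len(rows), sum(rows))  # 4 98
```

By comparison, two stages of the linear array of FIG. 1 would still leave six rows of eight.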
In the Wallace tree described above, n partial products are divided into n/3 groups of three partial products each. These groups are added in parallel to obtain a total of 2n/3 bits at once, namely n/3 bits of carries and n/3 bits of sums. That is, n partial products are reduced to 2n/3 with a delay of only one full-adder stage.
The 2n/3 bits thus obtained, together with any remaining partial products, are again divided into groups of three bits, and the operation is repeated. Each repetition reduces the number of partial products to two-thirds with a delay of one full-adder stage.
According to the above-mentioned system, therefore, the number of stages of full adders through which n partial products pass before they are reduced to two, is proportional to log n.
In the carry save addition of FIG. 1, by contrast, n partial products must pass through n-2 stages of full adders before they are reduced to two. The Wallace tree therefore performs the addition much faster than that method. At present, the multiplication scheme that minimizes the number of addition stages is accordingly obtained by combining the modified Booth algorithm with the Wallace tree.
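The difference in depth can be estimated by counting rows. This sketch assumes an idealized Wallace reduction. Every complete group of three rows becomes two rows per full-adder stage, and leftover rows pass through unchanged. Column-edge effects are ignored, and the function names are hypothetical. The last column prints log base 3/2 of n/2 for comparison with the stage counts.

```python
import math


def wallace_stages(n: int) -> int:
    """Full-adder stages needed to reduce n rows to two, assuming each complete
    group of three rows becomes two rows per stage (idealized model)."""
    stages = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3
        stages += 1
    return stages


def linear_csa_stages(n: int) -> int:
    # FIG. 1 style carry save array: n - 2 stages for n partial products
    return max(n - 2, 0)


for n in (4, 8, 16, 32, 64):
    print(n, linear_csa_stages(n), wallace_stages(n), round(math.log(n / 2, 1.5), 2))
```

The linear array grows by one stage per added partial product. The Wallace count grows only by about one stage each time n is multiplied by 3/2.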
When the multiplier is implemented on an LSI, however, the Wallace tree increases both the number and the length of interconnections, making the wiring very complex. Additional delay then arises from the parasitic capacitance of the interconnections, so it is no longer reasonable to evaluate the operation speed simply by the number of addition stages. For these reasons, the circuit area increases, as do the man-hours required for logic design and layout design.
Attempts have therefore been made to improve the carry save addition, as shown in FIG. 3, by connecting the full adders of the even-numbered stages separately from those of the odd-numbered stages (Digest of Tech. Papers, 1984 IEEE ISSCC, "A CMOS/SOS Multiplier", pp. 92-93). In this system, the n partial products are divided into two groups, the even-numbered rows and the odd-numbered rows, and the two groups of n/2 partial products undergo carry save addition in parallel, so that the number of addition stages is halved compared with the conventional arrangement. In FIG. 3, full adders 300, 301, 302, 306, 307, 308, 312, 313, 314 constitute the carry save adder circuits of the odd-numbered rows, and full adders 303, 304, 305, 309, 310, 311 constitute the carry save adder circuits of the even-numbered rows.
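The odd/even split can be sketched in the same bitwise row model. The description does not say how the results of the two groups are combined. This sketch assumes that two further carry-save stages merge the two (sum, carry) pairs. Under that assumption the total depth is n/2 stages for even n of at least 4. All names are hypothetical.

```python
def full_adder_row(a: int, b: int, c: int) -> tuple[int, int]:
    # Bitwise 3:2 carry-save adder: a + b + c == sum_row + carry_row
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def linear_chain(rows: list[int]) -> tuple[int, int, int]:
    """FIG. 1 style carry save chain; returns (sum_row, carry_row, stages)."""
    if len(rows) == 2:
        return rows[0], rows[1], 0
    s, c = full_adder_row(rows[0], rows[1], rows[2])
    stages = 1
    for r in rows[3:]:
        s, c = full_adder_row(s, c, r)
        stages += 1
    return s, c, stages


def odd_even_csa(rows: list[int]) -> tuple[int, int, int]:
    assert len(rows) >= 4
    odd_rows = rows[0::2]   # rows 1, 3, 5, ... (counting from 1)
    even_rows = rows[1::2]  # rows 2, 4, 6, ...
    s_o, c_o, st_o = linear_chain(odd_rows)
    s_e, c_e, st_e = linear_chain(even_rows)  # runs concurrently with the odd chain
    # Assumed merge: two more carry-save stages reduce the four rows to two
    s, c = full_adder_row(s_o, c_o, s_e)
    s, c = full_adder_row(s, c, c_e)
    return s, c, max(st_o, st_e) + 2


a, b = 200, 173
rows = [a * ((b >> j) & 1) << j for j in range(8)]
s, c, stages = odd_even_csa(rows)
print(s + c, stages)  # 34600 4
```

For eight partial products this gives four stages, against six for the single linear array and four for the idealized Wallace tree. The gap to the Wallace tree widens as n grows.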
Black circles represent bits of partial products like in the aforementioned cases.
This system preserves the regularity of conventional carry save adder circuits and has the advantage that the interconnections neither increase in number nor become complex. With regard to the number of addition stages, however, this system requires n/2 stages, compared with a number proportional to log n for the Wallace tree. This system is therefore slightly inferior to the Wallace tree in operation speed.