The present invention relates to high speed parallel multiplier circuits.
In binary multiplication, an (N+M)-bit product [P=(p.sub.m+n, p.sub.m+n-1, . . . p.sub.1)] is formed by multiplying an N-bit multiplicand [A=(a.sub.n, a.sub.n-1, . . . a.sub.1)] by an M-bit multiplier [B=(b.sub.m, b.sub.m-1, . . . b.sub.1)]. The multiplication is illustrated in FIG. 1, which shows the product P as the sum of corresponding elements of a summand matrix 1. The summand matrix 1 has M.times.N original entries of partial product or summand matrix bits, each of which is the logical AND of a different pair of multiplier and multiplicand bits.
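The formation of the summand matrix can be illustrated with a minimal Python sketch. The function names `summand_columns` and `weighted_sum` are hypothetical and do not appear in the figures. The sketch assumes the bit numbering used above (bit 1 is the least significant) and assumes that the summand a.sub.i AND b.sub.j is placed in column i+j-1, which is the usual arrangement for a matrix such as that of FIG. 1.

```python
def summand_columns(a, b, n, m):
    """Group the M x N summand bits (a_i AND b_j) by column.

    Bits are numbered from 1 (least significant). The summand a_i AND b_j
    is assumed to sit in column i + j - 1, stored here at index i + j - 2.
    """
    columns = [[] for _ in range(n + m - 1)]
    for j in range(1, m + 1):            # multiplier bit b_j
        b_j = (b >> (j - 1)) & 1
        for i in range(1, n + 1):        # multiplicand bit a_i
            a_i = (a >> (i - 1)) & 1
            columns[i + j - 2].append(a_i & b_j)
    return columns


def weighted_sum(columns):
    """Add every bit with the weight of its column (column index k has weight 2**k)."""
    return sum(sum(col) << k for k, col in enumerate(columns))


a, b = 0b10110, 0b01101                  # 22 and 13, both 5-bit operands
cols = summand_columns(a, b, 5, 5)
print([len(c) for c in cols])            # [1, 2, 3, 4, 5, 4, 3, 2, 1]
print(weighted_sum(cols) == a * b)       # True
```

The column heights printed for the 5.times.5 case show why the middle columns need the most reduction.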
If the addition of the summand matrix bits were to be performed in a single logical level in order to obtain the product P, such a sum could be obtained by using a parallel adder circuit for each column. The inputs to the ith adder (for the ith column) would include the original summand matrix bits in the ith column and also the carry outputs from the lower order adders. This method for addition of the summand matrix has serious drawbacks. First, parallel adders for a large number of inputs are difficult to implement. Second, they have a large amount of delay associated with them because carries propagate along the chain of parallel adders, so that the addition of each bit has to occur in sequence from the least significant bit position to the most significant bit position. Therefore, the total time required to perform binary multiplication using this method for addition of the summand matrix becomes prohibitive.
There have been a number of attempts to increase the speed at which a digital computer can perform binary multiplication. These attempts involve accelerating the addition of the summand matrix bits. In general, such attempts focus on repetitive operations, called reductions, which reduce the number of summand matrix bits until there are two rows of bits (i.e., addends) whose sum equals the product. The reductions generally utilize several logical "levels" of adder circuits each corresponding to a different column of the summand matrix. Such adder circuits, for example, include full adders which produce a sum and carry bit from three inputs and half adders which produce a sum and carry bit from two inputs. Within each logical level of reduction no carry propagation is allowed, thus enabling many additions to occur simultaneously, instead of successively. When the summand matrix is reduced to two rows of bits, these two rows can be input into a full carry-propagating adder to obtain the product. Therefore, carry propagation is confined to the last step where it can be accomplished by high speed circuits.
The use of full and half adders as opposed to multiple input parallel adders significantly decreases the time required to perform binary multiplication. A full adder 5 is shown in FIG. 2A for operands A, B, and C and can be defined by the following two equations:
SUM = A XOR B XOR C
CARRY = (A AND B) OR (B AND C) OR (A AND C)
A half adder 12 is shown in FIG. 2B for operands A and B and can be defined by the following two equations:
SUM = A XOR B
CARRY = A AND B
As shown in FIG. 2A, full adder circuit 5 is equivalent to a 3-input exclusive OR gate 6 connected to receive bits A, B, and C and outputting a SUM bit, three 2-input AND gates 7 to 9, and a 3-input OR gate 10 which outputs a CARRY bit. AND gate 7 is connected to receive bits B and C, AND gate 8 is connected to receive bits A and B, and AND gate 9 is connected to receive bits A and C. The outputs from AND gates 7, 8, and 9 become inputs for OR gate 10.
As illustrated in FIG. 2B, half adder circuit 12 is equivalent to a 2-input exclusive OR gate 14 connected to receive bits A and B and outputting a SUM bit and a 2-input AND gate connected to receive bits A and B and outputting a CARRY bit. The logical definition of parallel adders having more than three inputs is quite complex. Parallel adders are, therefore, generally difficult to fabricate.
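The full adder and half adder equations given above can be checked with a short Python sketch. The function names are hypothetical. The check relies only on the fact that an adder must preserve the arithmetic value of its inputs, that is, the number of input ones must equal SUM plus twice CARRY.

```python
from itertools import product


def full_adder(a, b, c):
    """SUM = A XOR B XOR C; CARRY = (A AND B) OR (B AND C) OR (A AND C)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """SUM = A XOR B; CARRY = A AND B."""
    return a ^ b, a & b


# Exhaustive truth-table check: inputs and outputs carry the same value.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == s + 2 * carry

for a, b in product((0, 1), repeat=2):
    s, carry = half_adder(a, b)
    assert a + b == s + 2 * carry

print(full_adder(1, 1, 0))   # (0, 1)
print(half_adder(1, 1))      # (0, 1)
```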
One prior summand matrix reduction scheme utilizing the above principles, described in "A Suggestion for a Fast Multiplier," C. S. Wallace, Vol. 13, No. 14, IEEE Transactions on Electronic Computers (Feb. 1964), proposes grouping the summand matrix bits in each column of the matrix into groups of three bits and using full adders to add the groups of three bits, or half adders if only two bits remain for a given column. The adders produce a sum bit for the same column and a carry bit for the column with the next most significant bits. The groups of bits are added in each column (including in later groupings the sum and carry bits from previous groupings) until the summand matrix is reduced to two rows of bits, one row representing a row of sum bits and one row representing a row of carry bits. The two rows are then input into a traditional carry-propagating adder which can perform a fast addition operation based on a carry-lookahead design.
The rule for the Wallace method of reduction is to reduce the columns as much as possible as soon as possible, and to utilize as many full adders as possible. The Wallace method of summand reduction is illustrated in FIG. 3 for a 5.times.5 bit multiplier circuit. The original entries in the summand matrix are represented as a.sub.1 b.sub.1, a.sub.2 b.sub.1, etc., as shown in FIG. 1. The reduction of the 5.times.5 matrix to two rows of bits requires three logical levels of reduction.
In the first logical level of reduction, level I, each column of the summand matrix with at least three bits is divided into groups of three bits and each group of three bits is then input into a full adder. For example, a.sub.1 b.sub.3, a.sub.2 b.sub.2, and a.sub.3 b.sub.1 are input to full adder 15 which produces a sum bit and a carry bit that are inputs to gates at the second logical level of reduction, level II. In this manner, three input bits are reduced to two output bits at level I. Similarly, a.sub.2 b.sub.3, a.sub.3 b.sub.2, and a.sub.4 b.sub.1 are input to full adder 17 which outputs a sum bit and a carry bit; a.sub.3 b.sub.3, a.sub.4 b.sub.2, and a.sub.5 b.sub.1 are input to full adder 19 which outputs a sum bit and a carry bit; a.sub.1 b.sub.5 and a.sub.2 b.sub.4 remain in column 5 and are input to half adder 25 which outputs a sum bit and a carry bit; a.sub.3 b.sub.4, a.sub.4 b.sub.3, and a.sub.5 b.sub.2 are input to full adder 21 which outputs a sum bit and a carry bit; and a.sub.3 b.sub.5, a.sub.4 b.sub.4, and a.sub.5 b.sub.3 are input to full adder 23 which outputs a sum bit and a carry bit. All sum bits and carry bits output from the level I gates are inputs to gates in level II.
In the level II reduction, full adder 27 receives as inputs an original entry in the summand matrix, a.sub.1 b.sub.4, the sum bit from full adder 17, and the carry bit from full adder 15. From these inputs full adder 27 produces a sum bit and a carry bit which are routed to gates at the third logical level of reduction, level III. Similarly, full adder 29 receives as inputs the carry bit from full adder 17, the sum bit from full adder 19, and the sum bit from half adder 25 and produces a sum bit and a carry bit which are routed to gates at level III. Full adder 31 receives as inputs the carry bit from half adder 25, the carry bit from full adder 19, and the sum bit from full adder 21 and produces as outputs a sum bit and a carry bit which are routed to gates at level III. Finally in level II, full adder 33 receives as inputs original summand entries a.sub.4 b.sub.5 and a.sub.5 b.sub.4 (they were part of a column with only two entries) and the carry bit from full adder 23 and produces as outputs a sum bit and a carry bit which are routed to gates at level III. After the reduction in level II, most of the columns have only two bits remaining. For the ones that do not, an additional level of reduction is required using full adders or half adders depending upon the number of remaining bits.
In the level III reduction, half adder 35 receives as inputs an original entry in the summand matrix, a.sub.2 b.sub.5, and the sum bit from full adder 31 and produces as outputs a sum bit and a carry bit. Full adder 37 receives as inputs the sum bit from full adder 23, the carry bit from full adder 21 and the carry bit from full adder 31 and produces as outputs a sum bit and a carry bit. After level III, the original summand matrix is reduced to a set of two rows of bits. These remaining bits are then input into a carry-lookahead adder 40 to produce the product.
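The three levels of reduction described above for FIG. 3 can be transcribed into a Python sketch and checked exhaustively. This is an illustration under stated assumptions, not a definitive model of the circuit: the names `summands` and `wallace_5x5` are hypothetical, the signal names simply reuse the adder reference numerals, full adder 15 is assumed to receive the three column 3 summands (a.sub.1 b.sub.3, a.sub.2 b.sub.2, a.sub.3 b.sub.1), and carry-lookahead adder 40 is replaced by an ordinary weighted sum of the two remaining rows.

```python
def full_adder(x, y, z):
    """Three inputs in, (sum, carry) out."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def half_adder(x, y):
    """Two inputs in, (sum, carry) out."""
    return x ^ y, x & y


def summands(a, b, width=5):
    """p[i][j] = a_i AND b_j with 1-based bit indices (index 0 unused)."""
    p = [[0] * (width + 1) for _ in range(width + 1)]
    for i in range(1, width + 1):
        for j in range(1, width + 1):
            p[i][j] = (a >> (i - 1)) & (b >> (j - 1)) & 1
    return p


def wallace_5x5(a, b):
    p = summands(a, b)
    # Level I
    s15, c15 = full_adder(p[1][3], p[2][2], p[3][1])   # column 3 (assumed inputs)
    s17, c17 = full_adder(p[2][3], p[3][2], p[4][1])   # column 4
    s19, c19 = full_adder(p[3][3], p[4][2], p[5][1])   # column 5
    s25, c25 = half_adder(p[1][5], p[2][4])            # column 5
    s21, c21 = full_adder(p[3][4], p[4][3], p[5][2])   # column 6
    s23, c23 = full_adder(p[3][5], p[4][4], p[5][3])   # column 7
    # Level II
    s27, c27 = full_adder(p[1][4], s17, c15)           # column 4
    s29, c29 = full_adder(c17, s19, s25)               # column 5
    s31, c31 = full_adder(c25, c19, s21)               # column 6
    s33, c33 = full_adder(p[4][5], p[5][4], c23)       # column 8
    # Level III
    s35, c35 = half_adder(p[2][5], s31)                # column 6
    s37, c37 = full_adder(s23, c21, c31)               # column 7
    # Remaining bits, columns 1 through 9, at most two per column.
    columns = [
        [p[1][1]], [p[1][2], p[2][1]], [s15], [s27], [s29, c27],
        [s35, c29], [s37, c35], [s33, c37], [p[5][5], c33],
    ]
    assert all(len(col) <= 2 for col in columns)
    # Stand-in for the final carry-propagating addition of the two rows.
    return sum(sum(col) << k for k, col in enumerate(columns))


print(all(wallace_5x5(a, b) == a * b
          for a in range(32) for b in range(32)))      # True
```

Each carry is consumed one column to the left of the adder that produced it, which is why the weighted sum of the remaining bits equals the product for every pair of 5-bit operands.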
As illustrated in FIG. 3, ten full adders and two half adders are used to reduce the original summand matrix to two rows of bits. As the number of bits of the multiplicand and multiplier increases, the number of adders required to reduce the summand matrix also increases. Also, the increased hardware and wires required to implement this reduction scheme create problems due to the difficulty of routing inputs to the adders at the various logical levels of reduction. As a result of the hardware and wire increases, the speed of performance decreases due to the increased delays in the hardware and wires, the uneven distribution in the density of wires, and the complex routing scheme.
Another prior summand matrix reduction scheme, proposed in "Some Schemes For Parallel Multipliers," L. Dadda, Vol. 34, Alta Frequenza (March 1965), postulates that only the minimum number of inputs from the summand matrix should be reduced at each logical level of reduction. The Dadda scheme begins with the goal of reducing a summand matrix to two rows of bits. The Dadda scheme then works backward and calculates that two rows result from the reduction of three rows; three rows result from four rows; four rows result from six rows; six rows result from nine rows; nine rows result from thirteen rows; thirteen rows result from nineteen rows; etc. Thus, this scheme results in the following series:
2; 3; 4; 6; 9; 13; 19; 28; 42; 63 . . .
This scheme postulates that with each logical level of reduction, the columns should only be reduced to the point of the next lower number in the series by using full adders and half adders. This scheme uses fewer gates on the first logical level of reduction. The goal of the scheme is fewer total gates than the first reduction scheme mentioned above with reference to FIG. 3 as a way to increase speed.
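The series of row counts used by this scheme can be generated with a short Python sketch. The sketch assumes the commonly stated recurrence for Dadda's method, in which the series starts at 2 and each later term is the previous term multiplied by 3/2 and rounded down, because a level of full adders turns every three rows into two. The function names are hypothetical.

```python
def dadda_heights(limit):
    """Row counts 2, 3, 4, 6, ... that do not exceed `limit`.

    Assumed recurrence: next = floor(3 * previous / 2).
    """
    heights = [2]
    while True:
        nxt = heights[-1] * 3 // 2
        if nxt > limit:
            return heights
        heights.append(nxt)


def levels_needed(rows):
    """Number of reduction levels: one per series term smaller than `rows`."""
    return sum(1 for h in dadda_heights(rows) if h < rows)


print(dadda_heights(63))   # [2, 3, 4, 6, 9, 13, 19, 28, 42, 63]
print(levels_needed(5))    # 3
```

For a summand matrix with five rows the targets are four, three, and two rows, that is, three logical levels of reduction.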
The second reduction scheme is illustrated in FIG. 4. At the level I reduction, half adder 50 receives as inputs a.sub.5 b.sub.1 and a.sub.4 b.sub.2 and outputs a sum bit and a carry bit which are routed to gates at level II. Half adder 52 receives as inputs a.sub.4 b.sub.3 and a.sub.5 b.sub.2 and outputs a sum bit and a carry bit which are routed to gates at level II.
At the level II reduction, half adder 54 receives as inputs a.sub.3 b.sub.2 and a.sub.4 b.sub.1 and outputs a sum bit and a carry bit which are routed to gates at level III. Full adder 56 receives as inputs a.sub.2 b.sub.4, a.sub.3 b.sub.3, and the sum bit from half adder 50 and outputs a sum bit and a carry bit. Full adder 58 receives as inputs a.sub.3 b.sub.4, the sum bit from half adder 52, and the carry bit from half adder 50 and outputs a sum bit and a carry bit. Full adder 60 receives as inputs a.sub.4 b.sub.4, a.sub.5 b.sub.3, and the carry bit from half adder 52 and outputs a sum bit and a carry bit which are routed to gates at level III.
At the level III reduction, half adder 62 receives as inputs a.sub.2 b.sub.2 and a.sub.3 b.sub.1 and outputs a sum bit and a carry bit. Full adder 64 receives as inputs a.sub.1 b.sub.4, a.sub.2 b.sub.3, and the sum bit from half adder 54 and outputs a sum bit and a carry bit. Full adder 66 receives as inputs a.sub.1 b.sub.5, the sum bit from full adder 56, and the carry bit from half adder 54 and outputs a sum bit and a carry bit. Full adder 68 receives as inputs a.sub.2 b.sub.5, the sum bit from full adder 58, and the carry bit from full adder 56 and outputs a sum bit and a carry bit. Full adder 70 receives as inputs a.sub.3 b.sub.5, the sum bit from full adder 60, and the carry bit from full adder 58 and outputs a sum bit and a carry bit. Full adder 72 receives as inputs a.sub.4 b.sub.5, a.sub.5 b.sub.4, and the carry bit from full adder 60 and outputs a sum bit and a carry bit. The summand matrix is then reduced to a set of two rows of bits which can be input into carry-lookahead adder 74 to produce the product of the original multiplicand and multiplier.
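The adder connections described above for FIG. 4 can likewise be transcribed into a Python sketch and checked against ordinary multiplication. The names `summands` and `dadda_5x5` are hypothetical, the signal names reuse the adder reference numerals, each adder is assumed to sit in the column of its original summand inputs, and carry-lookahead adder 74 is replaced by a plain weighted sum of the two remaining rows.

```python
def full_adder(x, y, z):
    """Three inputs in, (sum, carry) out."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def half_adder(x, y):
    """Two inputs in, (sum, carry) out."""
    return x ^ y, x & y


def summands(a, b, width=5):
    """p[i][j] = a_i AND b_j with 1-based bit indices (index 0 unused)."""
    p = [[0] * (width + 1) for _ in range(width + 1)]
    for i in range(1, width + 1):
        for j in range(1, width + 1):
            p[i][j] = (a >> (i - 1)) & (b >> (j - 1)) & 1
    return p


def dadda_5x5(a, b):
    p = summands(a, b)
    # Level I: five rows reduced to four
    s50, c50 = half_adder(p[5][1], p[4][2])            # column 5
    s52, c52 = half_adder(p[4][3], p[5][2])            # column 6
    # Level II: four rows reduced to three
    s54, c54 = half_adder(p[3][2], p[4][1])            # column 4
    s56, c56 = full_adder(p[2][4], p[3][3], s50)       # column 5
    s58, c58 = full_adder(p[3][4], s52, c50)           # column 6
    s60, c60 = full_adder(p[4][4], p[5][3], c52)       # column 7
    # Level III: three rows reduced to two
    s62, c62 = half_adder(p[2][2], p[3][1])            # column 3
    s64, c64 = full_adder(p[1][4], p[2][3], s54)       # column 4
    s66, c66 = full_adder(p[1][5], s56, c54)           # column 5
    s68, c68 = full_adder(p[2][5], s58, c56)           # column 6
    s70, c70 = full_adder(p[3][5], s60, c58)           # column 7
    s72, c72 = full_adder(p[4][5], p[5][4], c60)       # column 8
    # Remaining bits, columns 1 through 9, at most two per column.
    columns = [
        [p[1][1]], [p[1][2], p[2][1]], [p[1][3], s62], [s64, c62],
        [s66, c64], [s68, c66], [s70, c68], [s72, c70], [p[5][5], c72],
    ]
    assert all(len(col) <= 2 for col in columns)
    # Stand-in for the final carry-propagating addition of the two rows.
    return sum(sum(col) << k for k, col in enumerate(columns))


print(all(dadda_5x5(a, b) == a * b
          for a in range(32) for b in range(32)))      # True
```

The transcription uses eight full adders and four half adders, and five original summand bits pass directly to the final addition.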
In this scheme, as illustrated in FIG. 4, a large number of original summand matrix bits are reserved to be input directly into carry-lookahead adder 74. A smaller number of original summand matrix bits is input into the adders at level I than in the reduction scheme mentioned above with reference to FIG. 3. This scheme employs eight full adders and four half adders to reduce a 5.times.5 summand matrix to two rows of bits. Although the amount of hardware required to implement the design is less than that used in the scheme illustrated in FIG. 3, the routing of inputs to the adders and the wiring of the circuit are still quite complex and result in performance delay.
Both the Wallace and Dadda schemes recognize the need for a fast binary multiplier circuit. However, neither scheme recognizes the advantages of using an increased number of half adders, as opposed to full adders, to increase speed and decrease the space required. Half adders have several advantages over full adders. For example, the use of an increased number of half adder circuits reduces the propagation delay associated with the overall summand reduction because a full adder is a more complex and slower device. Furthermore, use of half adder circuits, as opposed to full adder circuits, reduces the area required on a circuit board for the gate layout since a half adder circuit only requires approximately one-quarter of the area required for a full adder circuit. Also, a half adder circuit requires less power than a full adder circuit. Finally, neither scheme recognizes the problems associated with the crossing of wires and the routing of inputs to the gates for summand matrix reduction.