1. Technical Field
This invention relates to the field of array multipliers. More particularly, this invention relates to fast array multipliers which utilize optimized regular structure, elements and layout to achieve significant processing speed gains over prior art structures. Still more particularly, this invention relates to fast array multipliers which can be easily computer generated with CAD-type software and easily implemented in VLSI and other high density semiconductor chips.
2. Background Art
Array multipliers are used to multiply two binary number values, each having a known number of bits. FIG. 1 depicts a basic unsigned multiplier array for multiplying two 5 bit numbers, X and Y. FIG. 1 is, in essence, derived from FIG. 6.3 of Kai Hwang's book, Computer Arithmetic: Principles, Architecture and Design, John Wiley & Sons, 1979, p. 164. The least significant input bits (LSBs), i.e., 2.sup.0, are denoted X.sub.0 and Y.sub.0. The most significant input bits (MSBs) are denoted X.sub.4, Y.sub.4. Three types of devices are used within this array multiplier to accomplish the multiplication function. These are AND gates (denoted by a 2-input, 1-output device marked with a "G") which are used to form partial products and half-adders (denoted by a 2-input, 2-output device marked with an "H") and 3/2 full-adders or simply "full-adders" (denoted by a 3-input, 2-output device marked with an "F") which are used to combine the partial products into a product or result. A half-adder may be formed from a full-adder with one of its inputs tied to zero, or as a specialized device.
The multiplication of an N-bit binary number by an N-bit binary number yields a 2N-bit binary product. Likewise, the multiplication of an N-bit number by an M-bit number yields an N+M-bit binary product. In the array multiplier of FIG. 1, the product's 5 LSBs are obtained directly from outputs Z.sub.0 -Z.sub.4 at the "bottom" of the array. (The terms "right", "left", "top" and "bottom" and similar positional notations are used herein with reference to idealized diagrams such as FIGS. 1 and 2 and are not meant to be limiting--actual circuit implementations in VLSI hardware will, in many cases, bear no physical relationship to these positional notations due to the use of CAD routing techniques.) The product's MSBs are derived by applying the four pairs of carry and sum outputs along the "right" side of the array to an adder, typically a Carry Look Ahead (CLA) adder, to yield outputs Z.sub.5 -Z.sub.9, Z.sub.9 being the final carry of this addition step. The ability to obtain Z.sub.0 -Z.sub.4 directly, without a large adder structure for processing all of the outputs of the array, is a very advantageous feature of this type of multiplier from both a speed and an area perspective, since customers virtually always desire faster and smaller (or higher density) electronic components.
As discussed above, in the FIG. 1 array multiplier, there are three elements used. These elements are denoted "G0", "H0" and "F0". These are an AND gate and the most basic forms of half-adder and full-adder, respectively. The truth table for G0 (diagrammed in FIG. 3A) is simply that of a simple AND gate, as given in Table 1:
TABLE 1
______________________________________
A   B   S
______________________________________
0   0   0
0   1   0
1   0   0
1   1   1
______________________________________
The truth table for H0 (diagrammed in FIG. 4A) is given in Table 2:
TABLE 2
______________________________________
A   B   Y   S
______________________________________
0   0   0   0
0   1   0   1
1   0   0   1
1   1   1   0
______________________________________
The truth table for F0 (diagrammed in FIG. 5A) is given in Table 3:
TABLE 3
______________________________________
A   B   C   Y   S
______________________________________
0   0   0   0   0
0   0   1   0   1
0   1   0   0   1
0   1   1   1   0
1   0   0   0   1
1   0   1   1   0
1   1   0   1   0
1   1   1   1   1
______________________________________
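The three elements of Tables 1-3 can be modelled in a few lines of Python as a check on the tables. This is a minimal sketch only: the function names `g0`, `h0` and `f0` are chosen here to mirror the element labels, and the majority/parity formulation of the full-adder is one common way to express Table 3, not a statement about the gate-level circuits of FIGS. 3A, 4A and 5A.

```python
from itertools import product

def g0(a, b):
    """AND gate (Table 1): S = A AND B."""
    return a & b

def h0(a, b):
    """Half-adder (Table 2): returns (Y, S) = (carry, sum)."""
    return a & b, a ^ b

def f0(a, b, c):
    """3/2 full-adder (Table 3): returns (Y, S) = (carry, sum)."""
    y = (a & b) | (a & c) | (b & c)  # carry is the majority of the inputs
    s = a ^ b ^ c                    # sum is the parity of the inputs
    return y, s

# Print Table 3 row by row, in the same A B C Y S column order.
for a, b, c in product((0, 1), repeat=3):
    y, s = f0(a, b, c)
    print(a, b, c, y, s)
```

Calling `f0(a, b, 0)` gives the same outputs as `h0(a, b)`, which corresponds to forming a half-adder from a full-adder with one input tied to zero.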
An example of a binary multiplication of two five-bit binary numbers, as it would be carried out with X=10101 and Y=11111, is set forth below:
______________________________________
                   1 0 1 0 1   X
.times.            1 1 1 1 1   Y
P1                 1 0 1 0 1
P2               1 0 1 0 1
P3             1 0 1 0 1
P4           1 0 1 0 1
P5         1 0 1 0 1
         1 0 1 0 0 0 1 0 1 1
______________________________________
The array of data P1-P5 represents the "partial products" of the multiplication of Y.times.X and corresponds to the outputs of the various AND gates G0 in the FIG. 1 array. The row P1 is simply the outputs of the AND gates denoted G0 in the column under the Y0 input of FIG. 1. Row P2 corresponds to the outputs of the AND gates G0 under input Y1, etc.
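The worked example can be reproduced with a short Python sketch. The helper name `partial_products` and its parameter `n` are hypothetical and introduced only for illustration; the code forms each row by AND-ing every X bit with one Y bit, as the AND gates do, and then sums the shifted rows with ordinary integer addition rather than with the adder array of FIG. 1.

```python
def partial_products(x, y, n=5):
    """Return the n partial-product rows of x*y as integers.

    Row j is X AND-ed bitwise with bit Y_j and shifted left by j places,
    which is what the column of AND gates under input Y_j produces.
    """
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        row = sum(((x >> i) & 1 & y_j) << i for i in range(n))
        rows.append(row << j)
    return rows

x, y = 0b10101, 0b11111
rows = partial_products(x, y)
for j, row in enumerate(rows, start=1):
    print(f"P{j}: {row:010b}")
z = sum(rows)
print(f"Z : {z:010b}")  # 1010001011
```

With X=10101 (21 decimal) and Y=11111 (31 decimal) the sum of the rows is 651 decimal, i.e. the ten-bit product 1010001011 shown above.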
The binary weight of the inputs and outputs is important. The binary weight refers to the relative binary magnitude of the input or output (or the partial product or other intermediate signal within the array multiplier). In the full-adder described by the truth table in Table 3, if each of the inputs has a binary weight of 1 (e.g., 2.sup.0), then Y has a binary weight of 2 (e.g., 2.sup.1) since it represents an accumulation of two of the binary inputs of weight 1. In adders, the binary weight of inputs and outputs may also be signed, i.e., it may be positive or negative and may thus represent: +1, -1, +2, -2, +4, -4, +8, -8, +16, -16, etc. depending upon the construction of the adder. It should also be noted that in full-adders, the outputs must be constructed to account for the full range of the possible inputs. Thus a simple unsigned input/output full-adder for adding three one-bit numbers of the same binary weight must provide for a sum output of the same binary weight as the inputs and a carry output of the next higher binary weight. For example, if the inputs each represent 2.sup.0 (or 1 decimal) and all three are asserted, then their sum is 3 decimal or "11" binary which resolves to 1.times.2.sup.0 +1.times.2.sup.1. This constraint applies to all adders, whether signed or not, and whether their inputs deal with one binary weight or a multitude of binary weights.
The term "bitslice" describes the diagonal part of the multiplier circuit located between the X.sub.N and X.sub.N+1 bit input lines. Thus, bitslice B0 is the diagonal portion of array 10 of FIG. 1 between the X.sub.0 and X.sub.1 bit input lines as shown by dashed lines. This portion of the array has a binary magnitude or weight of 0 since its output, if "1" has a value of 2.sup.0. Bitslice B1, adjacent to B0 as shown, has a weight of 1, etc., up to bitslice B8, the diagonal at the upper right hand side of the FIG. 1 array.
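As a rough illustration of the bitslice numbering, the following Python sketch groups the partial-product terms of an N.times.N array by weight. It assumes that bitslice Bk collects every term X.sub.i Y.sub.j with i+j=k; that assumption is consistent with B0 having weight 0 and with B8 being the last bitslice of the 5.times.5 array, but the exact placement of devices within each diagonal is defined by FIG. 1 and is not modelled here. The function name `bitslice_terms` is hypothetical.

```python
def bitslice_terms(n):
    """Group the partial-product terms X_i*Y_j of an n x n array by weight.

    Assumption: bitslice Bk holds every term with i + j == k, so each
    term in Bk has a value of 2**k when asserted.
    """
    slices = {}
    for i in range(n):
        for j in range(n):
            slices.setdefault(i + j, []).append((i, j))
    return slices

for k, terms in bitslice_terms(5).items():
    print(f"B{k} (weight {k}): {len(terms)} partial-product terms")
```

Under this assumption a 5.times.5 array has nine bitslices, B0 through B8, with the middle bitslice B4 holding the most terms.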
While the array multiplier of FIG. 1 is a very straightforward way to produce an array multiplier, it does not represent the fastest approach to obtaining the product number. Furthermore, it lacks a mechanism for dealing with negative (e.g., two's complement) number multiplication.
The basic two's complement array multiplier of FIG. 2, derived from FIG. 6.10 of Computer Arithmetic: Principles, Architecture and Design, supra, p. 177, addresses the problem of signed numbers by using an array of AND gates and different types of half-adders and full-adders as shown. The MSBs of the product are obtained from a carry look ahead subtractor coupled to the carry and sum outputs along the right hand side of the array and to carry forward signal Z.sub.4 as shown.
The essential difference between the "unsigned" multiplier array of FIG. 1 and the "two's complement" or "signed" multiplier array of FIG. 2 is that in FIG. 1 the bits denoted X.sub.4, Y.sub.4, i.e., the most significant bits in the X and Y inputs, have a value or magnitude of 2.sup.4 or 16. In FIG. 2, the same bits have a value of -2.sup.4 or -16--the same binary magnitude, but a different sign. In this way, the MSB in FIG. 2 determines the sign of the number. Processing this negatively signed value requires that negatively signed values be accounted for in the array of FIG. 2. Accordingly, in the array of FIG. 2, bitslices B4-B8 provide for the possibility that X.sub.4 and/or Y.sub.4 may be asserted, denoting that X and/or Y is a negatively valued number. This is accomplished by providing versions of half-adders and full-adders which are adapted to receive negatively weighted inputs and provide negatively weighted outputs, where required.
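The difference in the weight of the MSB can be illustrated with a small Python sketch. The function names `unsigned_value` and `twos_complement_value` are hypothetical; bits are listed LSB first, so that `bits[i]` is the bit of weight 2.sup.i.

```python
def unsigned_value(bits):
    """Value when every bit, including the MSB, is positively weighted."""
    return sum(b << i for i, b in enumerate(bits))

def twos_complement_value(bits):
    """Same as unsigned, except that the MSB carries weight -2**(n-1)."""
    n = len(bits)
    return unsigned_value(bits[:-1]) - (bits[-1] << (n - 1))

x = [1, 0, 1, 0, 1]              # X_0..X_4, i.e. 10101 written MSB first
print(unsigned_value(x))         # 21
print(twos_complement_value(x))  # -11
```

The same five-bit pattern 10101 is 21 when X.sub.4 is weighted +16 (as in FIG. 1) and -11 when X.sub.4 is weighted -16 (as in FIG. 2).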
In the FIG. 2 array multiplier, devices denoted "G1", "G2", "G3", "H2", "F1", "F2" and "F3" are also used to obtain the desired output. These devices are diagrammed, respectively, at FIGS. 3B, 3C, 3D, 4B, 5B, 5C and 5D. The black dots at the inputs and outputs of the devices in these figures (and the bubbled dots in, for example, FIG. 2) indicate negatively weighted input/output signals (i.e., those signals where a "1" represents a negatively weighted binary value). The G0, G1, G2 and G3 devices all have the same truth table and all represent the same circuit--the dot or bubble notation is merely used to indicate the existence of a negatively signed binary signal on a respective input or output which is a significant aid in comprehending the figures that follow. Thus, for example, each type of full-adder (FIGS. 5A, 5B, 5C and 5D) is named for the number of its negatively weighted inputs, thus F0 has no negatively weighted inputs, F2 has two negative inputs and one positive input, etc.
Noting that the F0 and F3 full adders and the F1 and F2 full adders share the same truth tables with different column headings, TABLE 4 shows the truth table for these various adders:
TABLE 4
______________________________________
FULL ADDER   WEIGHTED INPUTS   WEIGHTED OUTPUTS
______________________________________
F0            A    B    C        Y    S
F3           -A   -B   -C       -Y   -S
              0    0    0        0    0
              0    0    1        0    1
              0    1    0        0    1
              0    1    1        1    0
              1    0    0        0    1
              1    0    1        1    0
              1    1    0        1    0
              1    1    1        1    1
F1            A    B   -C        Y   -S
F2           -A   -B    C       -Y    S
              0    0    0        0    0
              0    0    1        0    1
              0    1    0        1    1
              0    1    1        0    0
              1    0    0        1    1
              1    0    1        0    0
              1    1    0        1    0
              1    1    1        1    1
______________________________________
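Table 4 can be checked numerically by treating each adder as a device whose weighted outputs must equal the sum of its weighted inputs. The Python sketch below is a behavioural model only: the function name `full_adder_signed` and its `negative_inputs` parameter are hypothetical, and the sign assignments are read off the column headings of Table 4. It does not represent the circuits of FIGS. 5A-5D.

```python
from itertools import product

# Signs of (A, B, C) and of (Y, S) for F0..F3, taken from Table 4.
IN_SIGNS = {0: (1, 1, 1), 1: (1, 1, -1), 2: (-1, -1, 1), 3: (-1, -1, -1)}
OUT_SIGNS = {0: (1, 1), 1: (1, -1), 2: (-1, 1), 3: (-1, -1)}

def full_adder_signed(a, b, c, negative_inputs):
    """Return output bits (Y, S) for adder F<negative_inputs>."""
    total = sum(sgn * bit for sgn, bit in zip(IN_SIGNS[negative_inputs], (a, b, c)))
    y_sign, s_sign = OUT_SIGNS[negative_inputs]
    # Y has twice the weight of S; find the output pair with the right value.
    for y, s in product((0, 1), repeat=2):
        if 2 * y_sign * y + s_sign * s == total:
            return y, s
    raise ValueError("inputs outside the range the outputs can represent")

for kind in (0, 1):
    print(f"F{kind}:")
    for a, b, c in product((0, 1), repeat=3):
        print(" ", a, b, c, *full_adder_signed(a, b, c, kind))
```

Running the same loop for F3 and F2 reproduces the bit patterns of F0 and F1 respectively, which is the sharing of truth tables noted above.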
Signed array multipliers constructed according to FIG. 2 and similar models have regular structure, which lends itself to datapath implementation on VLSI chips, particularly deep sub-micron chips. Most interconnects are very short in these implementations which means that these interconnects will not have significant delays. Nonetheless, these multipliers are relatively slow. The speed of the circuit is limited by the propagation of data through the half-adders and full-adders from "left" to "right", with the result that it takes approximately one full-adder delay ("FAdelay"--the amount of time needed to propagate a signal from the inputs to the outputs of a conventional 3/2 full-adder--also referred to as two "gate-delays") for the signals to pass each "column" in the array (A "column" simply being a vertical array of full-adders one full-adder wide). For an N.times.N multiplier, the propagation delay through the array is approximately (N-1)* FAdelay. "Column 2" of FIG. 2 is the circuitry shown in the box labelled "Col. 2" of FIG. 2. Signals propagate from Column 0 at the "left" of the array through the array toward Column 4 at the right of the array and from there into the CLA subtractor for final processing of the MSBs of the product.
Three basic methods are known for increasing the speed of an array multiplier. First, recoding techniques such as Booth's algorithm (A. D. Booth, "A Signed Binary Multiplication Technique", Quart. J. Mech. Appl. Math., Vol. 4, Part 2, 1951) provide for reducing the number of partial products, and hence the number of full-adder columns within the array. Implementations of Booth-type multipliers are illustrated, for example, in U.S. Pat. Nos. 4,168,530 to Gajski et al. and 4,575,812 to Kloker et al. A major drawback to the use of Booth-type recoded multiplier structures in a VLSI chip is the fact that they tend to be large and require pre-processor blocks which consume additional area and add some delay.
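The effect of recoding on the number of partial products can be sketched in Python. The example uses the radix-4 ("modified Booth") variant, in which each pair of multiplier bits is recoded into one digit from the set -2, -1, 0, +1, +2; this variant is chosen here as an assumption for illustration and is not asserted to be the scheme used in Booth's paper or in the cited patents. The function names are hypothetical, and `y` is passed as an ordinary signed Python integer that fits in `n` two's complement bits.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier into radix-4 digits.

    Digit k has weight 4**k and lies in {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        n += 1  # sign-extend by one bit so that the bits pair up

    def bit(i):
        # Python's >> on a negative int is arithmetic, so this sign-extends.
        return (y >> i) & 1 if i >= 0 else 0

    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(n // 2)]

def booth_multiply(x, y, n):
    """Sum one partial product per recoded digit (the 'pre-processed' rows)."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))

print(booth_radix4_digits(-11, 5))  # [1, 1, -1]
print(booth_multiply(7, -11, 5))    # -77
```

For a five-bit multiplier the recoding yields three partial products instead of five. Producing the digits, and the multiples -2X, -X, X and 2X, is the job of the pre-processor blocks mentioned above.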
Second, variations in gate level circuit design can be used to speed portions of the circuit. Unfortunately, because the elements of the multiplier, e.g., AND gates, half-adders and full-adders, tend to be highly specialized and predetermined, optimization within these elements tends to work against the desire for easy replication and scaling.
Finally, the interconnection scheme of the elements of the multiplier can be varied to affect performance. For example, a Wallace Tree or Binary Tree adder arrangement can be used to obtain a significant speed performance enhancement over the simple multipliers of FIGS. 1 and 2. A basic Wallace Tree is described by C. S. Wallace in "A Suggestion for a Fast Multiplier", I.E.E.E. Transactions on Electronic Computers, Vol. EC-13, February 1964, pp. 14-17. Wallace Tree multipliers utilize a tree-like arrangement of carry-save adders (CSAs). While fast, these arrangements unfortunately lead to complex interconnection arrangements, often with one or more signal leads being much longer than the others, resulting in signal propagation delay problems. They also do not lend themselves to easy replication and scaling.
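The speed advantage of a tree arrangement can be estimated by counting adder levels. The Python sketch below compares the approximately (N-1) full-adder delays of the simple array with the number of carry-save adder levels needed to reduce N partial-product rows to two. It assumes the simple rule that each level turns every complete group of three rows into two rows; it ignores the final carry-propagate adder and, importantly, the wiring delays that are the drawback of tree structures. The function names are hypothetical.

```python
def array_levels(n):
    """Approximate full-adder delays through an n x n array multiplier."""
    return n - 1

def wallace_levels(n):
    """Carry-save adder levels needed to reduce n rows to two rows."""
    levels = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3  # each group of three rows becomes two
        levels += 1
    return levels

print(" N  array  tree")
for n in (5, 8, 16, 32):
    print(f"{n:2d}  {array_levels(n):5d}  {wallace_levels(n):4d}")
```

Under these assumptions a 32.times.32 multiplier needs about 31 full-adder delays in the array form but only 8 carry-save levels in the tree form, which illustrates why the tree is attractive despite its irregular layout.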
U.S. Pat. No. 4,752,905 to Nakagawa et al. teaches the use of an array of positive input/positive output 3/2 full-adders (full-adders with three positively weighted binary inputs, A, B and C, and two positively weighted binary outputs, Y (carry) and S (sum)) designed so that the "C" input (one of the three inputs) can arrive later than the "A" and "B" inputs and still yield the desired Y and S outputs at the appointed time. These full-adders are part of the class of full-adders referred to herein as "Fc" adders (see, e.g., FIG. 9). In Nakagawa's embodiment, these Fc adders are used in an array structure to reduce inter-adder gate propagation delays by approximately 50% relative to the basic designs shown in FIGS. 1 and 2. This is achieved, in effect, by interleaving the Fc adders to compress the column widths of the adder arrays: "slow" output signals are directed to skip over certain columns, and "fast" inputs are combined with the skipped slow signals, thus achieving a net speed gain in the processing of signals within the array itself. This approach, unfortunately, results in all of Nakagawa et al.'s output signals from the array being in the carry/sum format, which requires that they be applied to an adder structure of some type in order to form the Z.sub.0 . . . Z.sub.N bit outputs which are required of a multiplier array. While Nakagawa et al.'s multiplier is fit for its intended purpose, additional speed performance gains are desired in array multipliers.
What is needed is an array-type multiplier circuit, suitable for easy replication and scaling within computer generated VLSI chip designs which provides the regularity of circuit layout desirable in such computer generated designs with the speed advantages of Wallace tree multipliers.