The simplest form of multiplier is the "AND" gate. The output of such a gate may be viewed as the one-bit product of two one-bit inputs. However, the utility of using a single AND gate for multiplying is limited to multiplicands of only one binary bit, producing a one-bit product. Binary numbers (multiplicands) of any bit length may be multiplied by a one-bit binary number (multiplier) by using a number of "AND" gates equal to the number of bits in the multiplicand. Assuming a one-bit multiplier and an `n`-bit multiplicand, one input of each of the `n` gates is assigned to one bit of the multiplicand while the other input of all of the gates is connected to the multiplier. If the outputs of this array of "AND" gates are taken as an `n`-bit binary number, that number represents the product of the two numbers. A simple 1-bit by 8-bit instance of this type of multiplier is shown in FIG. 1.
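The AND-gate array described above can be modelled in a few lines of Python. This is only an illustrative sketch: the function names `and_array_multiply` and `bits_to_int`, and the choice to list bits least significant first, are assumptions made for the example and are not part of the circuit of FIG. 1.

```python
def and_array_multiply(multiplicand_bits, multiplier_bit):
    """Model n AND gates: each gate ANDs one multiplicand bit
    with the single multiplier bit shared by all gates."""
    return [bit & multiplier_bit for bit in multiplicand_bits]


def bits_to_int(bits):
    """Read a list of bits (least significant first) as an unsigned integer."""
    return sum(bit << position for position, bit in enumerate(bits))


# Hypothetical 8-bit multiplicand 0b10110101 (decimal 181), least significant bit first.
multiplicand = [1, 0, 1, 0, 1, 1, 0, 1]

product_if_one = and_array_multiply(multiplicand, 1)   # multiplicand passes through
product_if_zero = and_array_multiply(multiplicand, 0)  # every output is forced to 0
```

With a multiplier bit of one the gate outputs reproduce the multiplicand, and with a multiplier bit of zero all outputs are zero, which is exactly the product in each case.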
If two binary numbers, each having a length of `n` bits, are to be multiplied together, the aforementioned scheme may be replicated for each bit of the multiplier. Each of the resulting bits represents a partial product. These partial products must be summed to arrive at the resulting fully resolved product of the multiplier and multiplicand. In the process of summing, it must be recognized that each one-bit partial product has a binary weight (i.e., significance) associated with it: the bit value will be one or zero, but it represents a binary value which is determined by the product of the binary weights of the two bits whose product it represents.
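The weighting of the partial products can be made concrete with a short Python sketch. It assumes unsigned operands, and the names `partial_products` and `sum_partial_products` are hypothetical. The partial product of multiplicand bit `i` (weight 2^i) and multiplier bit `j` (weight 2^j) is placed in the column of weight 2^(i+j), and the columns are then summed according to their weights.

```python
def partial_products(a, b, n):
    """Return a dict mapping a weight exponent to the list of one-bit
    partial products having that weight, for two unsigned n-bit operands."""
    columns = {}
    for i in range(n):          # bit i of the multiplicand, weight 2**i
        for j in range(n):      # bit j of the multiplier, weight 2**j
            bit = ((a >> i) & 1) & ((b >> j) & 1)   # one AND gate
            columns.setdefault(i + j, []).append(bit)  # weight 2**(i + j)
    return columns


def sum_partial_products(columns):
    """Sum every column's bits, scaled by that column's binary weight."""
    return sum(sum(bits) << exponent for exponent, bits in columns.items())


columns = partial_products(13, 11, 4)   # 4-bit by 4-bit example
product = sum_partial_products(columns)  # equals 13 * 11
```

For `n`-bit operands there are `n * n` partial-product bits spread over `2n - 1` columns, with the middle column holding the most bits; it is this uneven column height that the summing network must deal with.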
The complexities encountered in the design of parallel adders suitable for the sum-of-products application in parallel multipliers led to the development of the "Wallace Tree". Simply stated, the Wallace Tree is a logical connection of full adders which can simply be replicated (in an array) to perform the sum-of-products calculation. An example of a Wallace Tree is the Texas Instruments type SN54LS275 4-bit-by-4-bit binary multiplier with 3-state outputs, 7-bit slice Wallace Tree with 3-state outputs.
As is well known, the full adder is a logical device that has three one-bit inputs, a one-bit sum output and a one-bit carry output. It is also well established that the full adder interposes a delay, hereinafter referred to as one "unit" (of time) of delay, between its inputs and outputs. Hence, in any interconnection of full adders, such as in a typical Wallace Tree, the various columns being added experience different delays due to their different logic path lengths. For instance, in a typical Wallace Tree, such as the aforementioned SN54LS275, the least significant digit (column) passes straight through, experiencing a delay of only one unit, while intermediate significant digits experience a greater delay of three units. Evidently, as the Wallace Tree is replicated, forming an array to handle larger and larger multiplicands and multipliers, greater and greater delays are interposed, resulting in slower response times for larger arrays. The effective system delay for any such adder, regardless of configuration, is the delay incurred in the longest path from input to output, since the result may not be considered valid until its last component bit has settled to a stable state.
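The unit-delay model of the full adder can be sketched in Python by carrying an arrival time alongside each bit. This is an assumed, simplified model for illustration (the names `timed_full_adder` and `longest_path` are hypothetical, and sum and carry are treated as settling at the same time); it does not reproduce the internal wiring of any particular device.

```python
def timed_full_adder(a, b, c):
    """Full adder over (bit, arrival_time) pairs.
    Both outputs settle one unit after the latest-arriving input."""
    (x, tx), (y, ty), (z, tz) = a, b, c
    ready = max(tx, ty, tz) + 1
    total = x + y + z
    sum_out = (total & 1, ready)     # same weight as the inputs
    carry_out = (total >> 1, ready)  # twice the weight of the inputs
    return sum_out, carry_out


def longest_path(signals):
    """Effective delay: the arrival time of the last signal to settle."""
    return max(time for _, time in signals)


# Five bits of equal weight, all valid at time 0, reduced by two chained adders.
bits = [(1, 0), (1, 0), (0, 0), (1, 0), (1, 0)]
s1, c1 = timed_full_adder(bits[0], bits[1], bits[2])
s2, c2 = timed_full_adder(s1, bits[3], bits[4])   # must wait for s1

delay = longest_path([s2, c1, c2])
```

Here the first carry is available after one unit, but the final sum and second carry need two units because the second adder cannot settle until the first has; the result as a whole is valid only after the longest path, two units.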
A significant contributor to the delay in current parallel adder configurations is the "carry" logic. Typically, all of the bits of a given weight are summed to arrive at intermediate multi-bit sums. For example, if seven input bits are to be summed, an array of adders is typically configured which will produce a 3-bit binary result whose value may range between 0 and 7 (000 and 111 binary). The least significant bit of this sum has the same binary weight as the component (input) bits, but the other output bits are carried over to neighboring (more significant) columns according to their binary weight, where they must be summed with other component bits having the same weight. The bit having the same weight as its component bits may yet have to be added to other component bits at the same weight or combined with other partial multi-bit sums. This may, in turn, produce further multi-bit sums which must further be added to other component bits. As a result of this process of adding intermediate sums together to arrive at other intermediate sums, the longest path in the circuit can become quite long. Further, the carry bits between columns are typically inserted in the adder chain at a point which is logically convenient, but which may not necessarily be the most efficient in terms of delay.
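The seven-input example can be illustrated with one common arrangement of four full adders that counts the ones among seven bits of equal weight. The sketch below is an assumption chosen for illustration (the names `full_adder` and `sum_seven_bits` are hypothetical) and is not asserted to match the internal structure of the SN54LS275 or of any specific prior-art circuit.

```python
from itertools import product


def full_adder(x, y, z):
    """Return (sum, carry) for three one-bit inputs."""
    total = x + y + z
    return total & 1, total >> 1


def sum_seven_bits(b):
    """Reduce seven equal-weight bits to a 3-bit result (w1, w2, w4)."""
    s1, c1 = full_adder(b[0], b[1], b[2])   # first level
    s2, c2 = full_adder(b[3], b[4], b[5])   # first level
    w1, c3 = full_adder(s1, s2, b[6])       # second level: same weight as inputs
    w2, w4 = full_adder(c1, c2, c3)         # third level: the carries, weight 2
    return w1, w2, w4


w1, w2, w4 = sum_seven_bits([1, 1, 1, 0, 1, 1, 0])  # five ones
count = w1 + 2 * w2 + 4 * w4

# Exhaustive check over all 128 input combinations.
all_correct = all(
    sum(bits) == sum(w << k for k, w in enumerate(sum_seven_bits(bits)))
    for bits in product((0, 1), repeat=7)
)
```

In this arrangement the weight-2 and weight-4 outputs pass through three adder levels before they can be handed to the neighboring columns, which illustrates how carry handling lengthens the longest path.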
While the Wallace Tree offers a straightforward, systematic, standardized approach to multiplication and multiplier design, it is very gate-intensive and is usually not optimized for performance.
Non-standardized (hand) design of multipliers, adders and multiplier-adders is very time-consuming and error-prone. Various compilers which have been employed for this task are limited in their flexibility and optimization approaches.