1. Field of the Invention
The present invention relates to multiplier circuits for use in computer systems.
2. Related Art
Modern microprocessors and other integrated circuits typically include dedicated multiplier circuits for multiplying binary numbers. It is desirable that multiplier circuits be easy to design, capable of fitting in a small area, capable of performing multiplication quickly, and inexpensive to manufacture. Circuit designers continue to be challenged by these design goals as the need continues to increase for processors that are ever smaller, faster, and less expensive. In particular, the movement toward processors with 64-bit architectures poses challenges for designers of multiplier circuits, since the rote application of old methodologies to such architectures may produce multiplier circuit designs that are large, complex, costly, slow, and time-consuming to lay out.
Multiplier circuits typically multiply binary numbers using the same general algorithm taught in elementary school. For example, referring to FIG. 1, a diagram is shown illustrating techniques that may be used to multiply a first binary operand 102 (referred to as the “multiplicand”) by a second binary operand 104 (referred to as the “multiplier”) using the “elementary school” algorithm. The multiplier 104 is written directly beneath the multiplicand 102, with the bits of the multiplicand 102 and multiplier 104 aligned in columns 110a–d. Each of the bits in the multiplier 104 is multiplied by all of the bits in the multiplicand 102 to produce partial products 106a–d, respectively. For example, the bit of multiplier 104 in column 110a is multiplied by the multiplicand 102 to produce partial product 106a, the bit of multiplier 104 in column 110b is multiplied by the multiplicand 102 to produce partial product 106b, and so on.
Each successive one of the partial products 106a–d is shifted to the left by one column with respect to the previous one of the partial products 106a–d to facilitate the addition of the partial products 106a–d. In this arrangement, the partial products 106a–d span columns 110a–h. Thus arranged, the partial products 106a–d are added, thereby producing final product 108, which is the product of the multiplicand 102 and the multiplier 104.
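By way of illustration only, the “elementary school” algorithm described above may be sketched in software as follows. The function names and the operand values used below are hypothetical and are not taken from FIG. 1; the sketch assumes unsigned operands of a fixed bit width.

```python
def partial_products(multiplicand, multiplier, width):
    """Return one partial product per multiplier bit, least significant bit first.

    Each partial product is the multiplicand times one multiplier bit,
    shifted left by that bit's column position.
    """
    products = []
    for column in range(width):
        bit = (multiplier >> column) & 1
        products.append((multiplicand * bit) << column)
    return products


def multiply(multiplicand, multiplier, width):
    """Add the shifted partial products to form the final product."""
    return sum(partial_products(multiplicand, multiplier, width))
```

For example, with the hypothetical 4-bit operands 1011 (eleven) and 0110 (six), the sketch generates four partial products, two of which are zero, and their sum equals the product of the operands.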
Referring to FIG. 2, a functional block diagram is shown of a prior art system 200 for using a multiplier circuit 206 to multiply a binary multiplicand 204 by a binary multiplier 202. The term “multiplier circuit” is used herein to refer specifically to circuitry that performs multiplication, and thereby to avoid any ambiguities that may be raised by the use of the term “multiplier” to refer to one of the operands in a multiplication operation.
The multiplier circuit 206 includes a partial product generator 208 that receives the multiplier 202 and multiplicand 204 as inputs, and that produces as output a plurality of partial products 210. Various partial product generators are well-known to those of ordinary skill in the art and generally produce partial products in a manner similar to that described with respect to the “elementary school algorithm” of FIG. 1. The partial products 210 are provided to a partial product adder 212, which adds the partial products 210 to produce a final product 214 that is equal to the product of the multiplier 202 and the multiplicand 204.
In practice, the process of adding the partial products 210 is broken down into multiple stages. As also taught in elementary school, the process of adding two or more numbers may generate digits (or bits in the case of binary numbers) that need to be “carried” from one column to the next. The partial product adder 212 typically is implemented to perform carry operations in a manner that is more computationally efficient than the method typically taught in elementary school.
For example, referring to FIG. 3, a diagram is shown illustrating techniques that may be used by the partial product adder 212 to add the partial products 106a–d. The bits in each of the columns 110a–h are added to each other, beginning with column 110a and moving to the left. For example, the bits in column 110a are summed, producing a zero sum and a zero carry. This is reflected by the zero (representing the zero sum) in column 110a of a sum word 302 and the zero (representing the zero carry) in column 110b of a carry word 304. Note that the carry word 304 is shifted to the left by one bit position with respect to the sum word 302. The bits in columns 110b and 110c may be added in a similar manner, thereby producing sums without any carry. When the bits in column 110d are added, however, the result is zero with a carry of one. This is reflected by the value of zero in column 110d in the sum word 302 and the value of one in column 110e of the carry word 304. Bits in the remaining columns 110e–g are added in the same way to produce the remaining bits in the sum word 302 and carry word 304. Once the sum word 302 and carry word 304 are generated, they may be summed to produce a final product 306.
Referring to FIG. 4, a functional block diagram is shown of the high-level structure that typically is used to implement the partial product adder 212 to perform partial product addition in the manner illustrated in FIG. 3. The partial product adder 212 includes an adder array 402 that receives the partial products 210 and sums them, thereby generating a sum word 406 and a carry word 404. The sum word 406 and carry word 404 are provided to a full adder 408, which sums the sum word 406 and carry word 404, thereby producing the final product 214. One advantage of using the techniques illustrated in FIG. 4 is that the value of each bit in the carry word 404 depends only on the sum of the bits in a single one of the columns 110a–h (FIG. 3), thereby simplifying the implementation of the adder array 402 in comparison to techniques in which carry bits are propagated through the sum as it is generated. In the system illustrated in FIG. 4, only the final full adder 408 need perform an addition that requires carry propagation.
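The separation between carry-free column addition and a single carry-propagating addition may be sketched as follows. This is a minimal software model under stated assumptions, not a description of any particular circuit: the function names are hypothetical, each word is modeled as a non-negative integer, and exactly three words are reduced at a time.

```python
def carry_save_add(a, b, c):
    """Reduce three words to a sum word and a carry word.

    Each sum bit and each carry bit depends only on the three input bits
    in its own column, so no carry propagates between columns here.
    The carry word is shifted left by one bit position.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def add_three(a, b, c):
    """Carry-save reduction followed by one carry-propagating addition."""
    sum_word, carry_word = carry_save_add(a, b, c)
    return sum_word + carry_word  # the only step that propagates carries
```

In this model the sum word and carry word together always represent the same value as the three inputs, which is why only the final addition need propagate carries.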
The time required to sum the partial products 210 may be reduced, for example, by: (1) reducing the number of partial products 210; and/or (2) increasing the speed with which the adder array 402 adds the partial products 210. One well-known technique that has been used to reduce the number of partial products 210 is referred to as Booth encoding, according to which the multiplier 202 is encoded in a manner that reduces the number of partial products 210 by half or more. In other words, if n is the number of bits in the multiplier 202, then radix-4 Booth encoding enables the number of partial products 210 to be reduced to approximately n/2 and radix-8 Booth encoding enables the number of partial products 210 to be reduced to approximately n/3. Booth encoding is described in more detail in “A Signed Binary Multiplication Technique,” Andrew D. Booth, Quart. Journal Mech. and Applied Math., Vol. IV, part 2, 1951.
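As a hedged illustration of how radix-4 Booth encoding halves the number of partial products, the following sketch recodes the multiplier into n/2 digits, each in the range -2 to 2. The sketch assumes an n-bit two's-complement multiplier with n even; the function names are hypothetical, and the recoding shown is the commonly used radix-4 (modified Booth) form rather than a transcription of the cited paper.

```python
def booth_radix4_digits(multiplier, n):
    """Recode an n-bit two's-complement multiplier into n/2 signed digits.

    Digit i equals b[2i] + b[2i-1] - 2*b[2i+1], with b[-1] taken as 0,
    so every digit lies in {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        raise ValueError("this sketch assumes an even bit width")
    bits = multiplier & ((1 << n) - 1)  # two's-complement bit pattern
    digits = []
    previous = 0  # bit to the right of the current pair
    for i in range(0, n, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(low + previous - 2 * high)
        previous = high
    return digits


def booth_multiply(multiplicand, multiplier, n):
    """Form one partial product per digit and add them."""
    digits = booth_radix4_digits(multiplier, n)
    return sum((d * multiplicand) << (2 * i) for i, d in enumerate(digits))
```

Each digit selects zero, the multiplicand, or twice the multiplicand, possibly negated, so each partial product remains obtainable by shifting and negation.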
The adder array 402 may be implemented in various ways that are well-known to those of ordinary skill in the art. For example, referring to FIG. 5A, a functional block diagram is shown of a linear adder array 502 that may be used to implement the adder array 402. The linear adder array 502 adds nine partial products 504a–i (which are examples of the partial products 210) to produce the final product 214. The linear adder array 502 includes full adders 506a–g, each of which receives three binary numbers as inputs and produces a corresponding sum and carry word as outputs. In particular, full adder 506a receives partial products 504g–i as inputs, sums the partial products 504g–i, and produces a sum word 512a and a carry word 510a as outputs. Carry word 510a, sum word 512a, and the next partial product 504f are provided as inputs to the next full adder 506b, which produces a corresponding sum word 512b and carry word 510b as outputs.
The remaining partial products 504a–e are added to the running sum in the same manner, namely by providing them as inputs to the full adders 506c–g, which produce corresponding sum words 512c–g and carry words 510c–g. The final sum word 512g and carry word 510g are provided to the final full adder 408, which sums the sum word 512g and the carry word 510g to produce the final product 214. Although the layout and operation of the linear adder array 502 are straightforward, the size and calculation time of the linear adder array 502 increase linearly with the number of partial products, thereby making it poorly suited to use in conjunction with the large number of partial products typically generated when multiplying large binary numbers.
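The linear growth in calculation time may be seen in the following sketch, which models the chain of FIG. 5A by folding each partial product into a running sum word and carry word. The function names and the input values are hypothetical, and each adder is modeled as a bitwise three-input carry-save reduction on non-negative integers.

```python
def carry_save_add(a, b, c):
    """Three-input, two-output reduction with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def linear_array_sum(partial_products):
    """Sum partial products through a chain of carry-save adders.

    Returns the final product and the number of adder stages that lie
    in series, which equals the number of partial products minus two.
    """
    if len(partial_products) < 3:
        raise ValueError("this sketch assumes at least three partial products")
    sum_word, carry_word = carry_save_add(*partial_products[:3])
    stages = 1
    for product in partial_products[3:]:
        # Each stage must wait for the previous stage's outputs.
        sum_word, carry_word = carry_save_add(sum_word, carry_word, product)
        stages += 1
    return sum_word + carry_word, stages
```

For nine inputs the sketch passes through seven stages in series, corresponding to the seven full adders 506a–g.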
Referring to FIG. 5B, a functional block diagram is shown of a parallel adder array 520 that may be used to implement the adder array 402. The parallel adder array 520 adds the nine partial products 504a–i to produce the final product 214. The parallel adder array 520 includes full adders 526a–g, each of which receives three binary numbers as inputs and produces a corresponding sum and carry word as outputs. In particular, full adder 526a receives partial products 504g–i as inputs, sums the partial products 504g–i, and produces a sum word 532a and a carry word 530a as outputs. Full adder 526d receives partial products 504a–c as inputs, sums the partial products 504a–c, and produces a sum word 532d and a carry word 530d as outputs. Because neither of the full adders 526a and 526d relies on the other's results for input, the full adders 526a and 526d may operate in parallel with each other.
Referring again to the left-hand side of FIG. 5B, partial product 504f and the outputs of full adder 526a are provided as input to full adder 526b, which produces a sum word 532b and a carry word 530b. Partial product 504e and the outputs of full adder 526b are provided as input to full adder 526c, which produces a sum word 532c and a carry word 530c. Referring back to the right-hand side of FIG. 5B, the partial product 504d and the outputs of full adder 526d are provided as input to full adder 526e, which produces a sum word 532e and a carry word 530e.
Carry word 530c, sum word 532c, and carry word 530e are provided as inputs to a full adder 526f, which produces a sum word 532f and a carry word 530f. Sum word 532f, carry word 530f, and sum word 532e are provided as inputs to a full adder 526g, which produces a sum word 532g and a carry word 530g. The final full adder 408 sums the sum word 532g and the carry word 530g to produce the final product 214. Although the parallel adder array 520 has a more complex layout than the linear adder array 502 shown in FIG. 5A, the parallel adder array 520 incurs fewer gate delays than the linear adder array 502 due to the ability of the parallel adder array 520 to perform certain additions in parallel with each other.
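The connections described for FIG. 5B may be modeled as follows to count the adder stages on the longest path. This is a sketch under stated assumptions: the function names and input values are hypothetical, each word is carried as a (value, depth) pair, and the depth counts adder stages only, ignoring wiring delay.

```python
def csa(a, b, c):
    """Carry-save add three (value, depth) pairs.

    Both outputs are one stage deeper than the deepest input.
    """
    (x, dx), (y, dy), (z, dz) = a, b, c
    depth = max(dx, dy, dz) + 1
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return (sum_word, depth), (carry_word, depth)


def parallel_array_sum(partial_products):
    """Model adders 526a-g for nine inputs standing in for 504a-i."""
    pa, pb, pc, pd, pe, pf, pg, ph, pi = [(v, 0) for v in partial_products]
    s_a, c_a = csa(pg, ph, pi)     # 526a, left-hand side
    s_d, c_d = csa(pa, pb, pc)     # 526d, right-hand side, independent of 526a
    s_b, c_b = csa(pf, s_a, c_a)   # 526b
    s_c, c_c = csa(pe, s_b, c_b)   # 526c
    s_e, c_e = csa(pd, s_d, c_d)   # 526e
    s_f, c_f = csa(c_c, s_c, c_e)  # 526f
    s_g, c_g = csa(s_f, c_f, s_e)  # 526g
    return s_g[0] + c_g[0], s_g[1]
```

Under this model the longest path passes through five adder stages, compared with seven for a chain of the same seven adders.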
Referring to FIG. 5C, a functional block diagram is shown of another well-known adder array 540, referred to as a “Wallace tree” adder array, that may be used to implement the adder array 402. The Wallace tree adder array 540 sums the nine partial products 504a–i to produce the final product 214. The Wallace tree structure was first described in C.S. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14–17, February 1964.
The Wallace tree adder array 540 includes full adders 546a–g, each of which receives three binary numbers as inputs and produces a corresponding sum and carry word as outputs. In particular, partial products 504g–i, 504d–f, and 504a–c are provided in parallel as inputs to full adders 546a, 546b, and 546c, respectively. Full adders 546a–c sum their inputs to produce sum words 552a–c and carry words 550a–c, respectively.
Full adder 546d sums the carry word 550a, the sum word 552a, and the carry word 550b to produce sum word 552d and carry word 550d. Full adder 546e sums the sum word 552b, the carry word 550c, and the sum word 552c to produce sum word 552e and carry word 550e. Full adder 546f sums the carry word 550d, the sum word 552d, and the carry word 550e to produce sum word 552f and carry word 550f. Finally, full adder 546g sums carry word 550f, sum word 552f, and sum word 552e to produce sum word 552g and carry word 550g.
The final full adder 408 sums the sum word 552g and the carry word 550g to produce the final product 214. The Wallace tree adder array 540 adds the partial products 504a–i with fewer gate delays (four) than either the linear adder array 502 (FIG. 5A) or the parallel adder array 520 (FIG. 5B). In general, the calculation time of the array 540 is in theory proportional to the logarithm of the number of bits in the multiplier 202. The structure of the Wallace tree adder array 540 is complex, however, making it difficult to lay out schematics for circuitry implementing a conventional Wallace tree quickly, easily, and efficiently. As a result of the Wallace tree's complex structure, circuits implementing Wallace trees tend to include long wiring paths. The signal propagation delays incurred on such paths can be large enough to substantially reduce or even negate the theoretical speed benefits of Wallace trees.
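The level-by-level reduction performed by a Wallace tree may be sketched as follows. The function names and input values are hypothetical, and the grouping of words within each level is one simple choice that need not match the exact connections of FIG. 5C; the sketch counts adder levels only and does not model the wiring delays discussed above.

```python
def carry_save_add(a, b, c):
    """Three-input, two-output reduction with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(words):
    """Reduce words three at a time, level by level, until two remain.

    Returns the final sum, the number of adder levels, and the number
    of carry-save adders used.
    """
    words = list(words)
    levels = 0
    adders = 0
    while len(words) > 2:
        grouped = len(words) - len(words) % 3
        next_words = []
        for k in range(0, grouped, 3):
            # All adders in one level operate in parallel.
            next_words.extend(carry_save_add(*words[k:k + 3]))
            adders += 1
        next_words.extend(words[grouped:])  # leftovers pass to the next level
        words = next_words
        levels += 1
    return sum(words), levels, adders
```

For nine inputs the sketch uses seven adders arranged in four levels, consistent with the four gate delays noted above, because each level reduces the number of words by roughly one third.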