1. Field of the Invention
The present invention relates to the field of math processors in computers, and more particularly, to Booth multipliers used in math processors to perform high speed multiplication of numbers.
2. Description of Related Art
One of the primary functions of most computer systems is to perform a large number of mathematical operations far faster than a human being could perform them. Since a computer devotes a considerable amount of its processing time to performing mathematical operations, an improvement in the speed at which the computer's math processor performs a particular type of operation will increase the overall speed of the computer.
A known method of performing multiplication in a math processor is by array multiplication using a parallel multiplier. The parallel multiplication process is based on the fact that partial products in multiplication can be independently computed in parallel. An example of multiplication by partial products is shown below in Table 1 for two 4-bit numbers.
TABLE 1
4-bit Multiplier Partial Products
____________________________________________________________
                        X3    X2    X1    X0     Multiplicand
                        Y3    Y2    Y1    Y0     Multiplier
____________________________________________________________
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
P7    P6    P5    P4    P3    P2    P1    P0     Product
____________________________________________________________
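The partial-product scheme of Table 1 can be sketched in a few lines of Python. This is only an illustrative behavioral model, assuming unsigned operands that fit in `width` bits; the function names `partial_products` and `multiply` are hypothetical and do not correspond to any circuit element described here.

```python
def partial_products(x, y, width=4):
    """Return one row per multiplier bit, as in Table 1.

    Row j is the multiplicand x ANDed with multiplier bit Yj,
    shifted left by j positions. Operands are assumed unsigned.
    """
    rows = []
    for j in range(width):
        y_j = (y >> j) & 1              # multiplier bit Yj
        rows.append((x if y_j else 0) << j)
    return rows


def multiply(x, y, width=4):
    """Sum the independently computed partial products."""
    return sum(partial_products(x, y, width))


if __name__ == "__main__":
    for row in partial_products(0b1011, 0b0110):
        print(format(row, "08b"))
    print(multiply(0b1011, 0b0110))
```

Because each row depends only on the multiplicand and a single multiplier bit, the rows can be formed in parallel; only the final summation introduces a dependency between them.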
A parallel multiplier is normally implemented as a square array of adders. In what is known as a Radix-2 scheme, the partial products are computed by observing one bit of the multiplier at a time. A higher-radix multiplier, such as a Radix-4 or "Booth recoding" multiplier, reduces the number of adders (and therefore the delay required to produce the partial sums) by examining a plurality of bits at a time. In conventional Booth recoding, the multiplier bits are divided into two-bit pairs, and a total of three bits are scanned at a time. These three bits are the two bits of the present pair and a third bit taken from the high-order bit of the adjacent lower-order pair. After examining each triplet of bits, the Booth recoding logic converts the triplet into one of five signed digits: 0, +1, +2, -1, or -2. Each recoded digit requires only a simple operation on the multiplicand, such as an add, a subtract, or a shift.
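The triplet scan described above can be illustrated with a short Python sketch. It assumes a two's-complement multiplier of even width and an implicit 0 to the right of the least significant bit; the names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the model is behavioral only, not a description of any particular recoding circuit.

```python
def booth_radix4_digits(y, width=16):
    """Recode a two's-complement multiplier into width/2 signed digits.

    Each digit is derived from a triplet: the two bits of the current
    pair plus the high-order bit of the adjacent lower-order pair.
    Digits are returned least significant first, each in {-2,...,+2}.
    """
    assert width % 2 == 0, "width is assumed to be even"
    y &= (1 << width) - 1               # two's-complement bit pattern
    digits = []
    prev = 0                            # implicit bit right of bit 0
    for i in range(0, width, 2):
        low = (y >> i) & 1              # low bit of the present pair
        high = (y >> (i + 1)) & 1       # high bit of the present pair
        digits.append(low + prev - 2 * high)
        prev = high                     # carried into the next triplet
    return digits


def booth_multiply(x, y, width=16):
    """Sum one partial product per recoded digit (weight 4**k)."""
    digits = booth_radix4_digits(y, width)
    return sum(d * x * 4 ** k for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(6, width=4))
    print(booth_multiply(7, -3))
```

In this sketch a 16-bit multiplier yields eight recoded digits, so only eight partial products need to be summed rather than sixteen.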
The speed of the Booth multiplier is limited by the number of rows of adders in the array. For example, a conventional 16×16 Booth multiplier such as shown in FIG. 1 will have an array containing eight rows of adders. There will therefore be a total delay of at least 8 adder delays (8Tadd, where Tadd is the delay of one adder) before the addition results of all of the adders are generated. This total delay of the array does not take into account, however, the further delays involved in adding together the final addition results (sum and carry) from the adders to generate the final product of the multiplication.
The addition results are normally added on the right-hand side of each row of the Booth multiplier by a plurality of two-bit adder circuits, and from the bottom row of adders by a carry select adder forming the output adder. A known Booth multiplier uses two 2-bit adders for each row (or "stage") of the array as shown in FIG. 1. These 2-bit adders are connected to the two right-most adders in a row and receive four bits which are added, with two bits of the final product and a carry out being produced by the adders. The two 2-bit adders for the eighth (and bottom) row of the array receive the addition results from the array after 8 adder delays (8Tadd). The carry-out from this final pair of 2-bit adders is provided after another adder delay to a carry-in input of the carry select adder. Thus, the carry select adder does not receive the carry-in until 9 adder delays (9Tadd) after the multiplier and the multiplicand entered the array.
A known 15-bit carry select adder is composed of a 3-bit carry lookahead adder for the 3 least significant bits, followed by three 4-bit carry select adders. The delay of the 15-bit carry select adder is equal to the delay from any input to the carry-out of the 3-bit carry lookahead adder (Tadd3) plus 3(Tmux), where Tmux is the delay from a select input to a multiplexer output of the three 4-bit carry select adders. If Tadd3 is equal to one of the adder delays of the array adders (Tadd3=Tadd), and Tmux=0.5Tadd, then the delay for the 15-bit carry select adder is 2.5 Tadd. Since the 15-bit carry select adder only receives the carry-in after 9 adder delays in the conventional design, the additional 2.5 adder delay introduced by the 15-bit carry select adder causes the total delay from the inputs of the multiplier and the multiplicand to the final product to be 11.5 adder delays (11.5 Tadd).
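The delay budget worked through above can be checked with a small calculation. The sketch below simply restates the stated assumptions (Tadd3 = Tadd, Tmux = 0.5 Tadd, eight array rows, one further adder delay for the last pair of 2-bit adders) in units of Tadd; the function name `conventional_delay` is hypothetical.

```python
def conventional_delay(rows=8, t_add=1.0):
    """Return (carry-in arrival, carry select delay, total) in Tadd units.

    Assumes, as in the text, Tadd3 = Tadd and Tmux = 0.5 * Tadd.
    """
    t_add3 = t_add                      # 3-bit carry lookahead adder
    t_mux = 0.5 * t_add                 # select-to-output mux delay
    array_delay = rows * t_add          # eight rows of array adders
    carry_in_ready = array_delay + t_add        # final 2-bit adder pair
    carry_select_delay = t_add3 + 3 * t_mux     # three 4-bit stages
    return carry_in_ready, carry_select_delay, carry_in_ready + carry_select_delay


if __name__ == "__main__":
    print(conventional_delay())
```

With the default arguments the three values are 9, 2.5, and 11.5 adder delays, matching the figures given for the conventional design.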
A saving of even one adder delay would provide a significant improvement in the speed of a multiplier. However, since a conventional array for multiplying 16-bit numbers will necessarily have eight rows of adders producing 8 adder delays, there is little room for improvement in the overall speed (11.5 adder delays) of the multiplier without increasing the speed of the individual adders of the array, and with it the power consumption of the multiplier.