The multiplication of two N-bit operands is a fundamental operation in general-purpose computer processors. To perform a longhand multiplication, the first operand A is successively multiplied by each bit of the second operand B to create a partial product for each bit. Each partial product is then shifted to assign the appropriate weight based on the weight of the corresponding digit of the second operand B. Finally, the shifted partial products are added together, i.e., accumulated, to form the final product D.
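The longhand procedure described above can be sketched in a few lines. This is a minimal illustration for unsigned operands only; the function name `longhand_multiply` and the `n_bits` parameter are assumptions introduced for the example and do not appear in the source.

```python
def longhand_multiply(a: int, b: int, n_bits: int = 8) -> int:
    """Multiply two unsigned n_bits-wide integers by shift-and-add."""
    product = 0
    for i in range(n_bits):
        bit = (b >> i) & 1        # i-th bit of operand B
        partial = a * bit         # partial product: either A or 0
        product += partial << i   # weight by 2**i, then accumulate
    return product
```

One partial product is formed per bit of B, so an N-bit multiplier accumulates N partial products in this scheme.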
Several techniques have been devised to speed up and/or simplify the longhand multiplication described above. Usually, as explained by A. Booth, in "A Signed Binary Multiplication Technique", Quarterly Journal of Mechanics and Applied Mathematics, Vol. IV, pt. 2, pp. 236-240 (1951), an encoding scheme is performed on the bits of the B operand by means of a signed-digit-carry set (SDC) of, for example, +2, +1, 0, -1, and -2, to reduce the number of partial products to be accumulated by one-half. The accumulation is then performed in a ripple fashion through the use of full adders arranged in a carry save format. Each partial product bit still requires one full adder.
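As a rough illustration of accumulation in carry save format, the sketch below models one row of full adders as bitwise operations on Python integers: each row reduces three input words to a sum word and a carry word, and the carries are resolved only once, in a final carry propagate addition. The names `carry_save_add` and `accumulate_carry_save` are assumptions made for this example, and the model covers unsigned, already-shifted partial products only.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One row of full adders: three words in, sum and carry words out."""
    sum_word = x ^ y ^ z                              # per-bit sum output (S)
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carry output (C), weighted
    return sum_word, carry_word


def accumulate_carry_save(partials: list[int]) -> int:
    """Ripple the partial products through successive adder rows."""
    sum_word, carry_word = 0, 0
    for partial in partials:
        # Each partial product passes through one more row of full adders.
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    return sum_word + carry_word   # final carry propagate addition
```

Because every row depends on the sum and carry words of the row before it, the rows are evaluated strictly one after another, which is the ripple behavior described above.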
A complete 8×8 encoded binary multiplier is shown in FIG. 1 with the related Boolean equations in Table 1. FIG. 1 consists of four sections: encoder logic 10, ripple accumulator 20, negative B operand correction 30, and a carry propagate adder 40.
TABLE 1
______________________________________
L1 LOGIC
    L1(K,J) = [XP2(J) * A(K-1)] + [XP1(J) * A(K)] + ##STR1##
L2 LOGIC
    L2(2J) ##STR2##
L3 LOGIC
    NEGA = TCA * A(7)
    P ##STR3##
    Q ##STR4##
    BIT(2J+1) ##STR5##
    BIT(2J) = MINUS(J) XOR Q
    MINUS(J+1) = MINUS(J) + P + Q
L4,L5,L6,L7 LOGIC
    NEGB = TCB * B(7)
    L4(K) = [NEGB XOR SDC(B)] * [NEGB XOR A(K)]
    L5 ##STR6##
    L5(J) = XM1(J) + XM2(J)
    L7(J) = A(-1) = 0
______________________________________
The operands are A(0-7) and B(0-7) and the product is D(0-15). HADD and FADD are conventional half adders and full adders, respectively, with carry (C) and sum (S) outputs.
The inputs TCA and TCB indicate whether the A and B operands are two's complement (=1) or unsigned (=0). The symbols XP2, XP1, X0, XM1, and XM2 are used for the X(+2), X(+1), X(0), X(-1), and X(-2) encoded digits for clarity. "*" is the Boolean "AND", "+" is the Boolean "OR", and "XOR" is the Boolean "Exclusive-OR".
In Table 1, logic block L1 consists of logic to select the X(+2), X(+1), X(0), X(-1), and X(-2) multiples of the A operand. The multiples of A are generated in the array by simple shifting, complementing, or masking operations. The L2 block generates the lost bit that results from the single shift used to generate an X(+2) or X(-2) multiple.
TABLE 2
______________________________________
B(2J+1)  B(2J)  SDC(2J)  MULTIPLIER  SDC(2J+2)
______________________________________
0        0      0        X0          0
0        0      1        XP1         0
0        1      0        XP1         0
0        1      1        XP2         0
1        0      0        XM2         1
1        0      1        XM1         1
1        1      0        XM1         1
1        1      1        X0          1
______________________________________
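The encoding of Table 2 can be checked with a short behavioral sketch. The code below walks the B operand one bit pair at a time, looks up the multiplier digit and the outgoing signed digit carry, and then forms the product from the encoded digits. It treats both operands as unsigned and adds the carry out of the last bit pair as an extra term; the names `TABLE_2`, `DIGIT_VALUE`, `encode_operand`, and `encoded_multiply` are assumptions for illustration and are not taken from FIG. 1.

```python
# Numeric value of each encoded digit named in the text.
DIGIT_VALUE = {"X0": 0, "XP1": 1, "XP2": 2, "XM1": -1, "XM2": -2}

# (B(2J+1), B(2J), SDC(2J)) -> (MULTIPLIER, SDC(2J+2)), copied from Table 2.
TABLE_2 = {
    (0, 0, 0): ("X0", 0),  (0, 0, 1): ("XP1", 0),
    (0, 1, 0): ("XP1", 0), (0, 1, 1): ("XP2", 0),
    (1, 0, 0): ("XM2", 1), (1, 0, 1): ("XM1", 1),
    (1, 1, 0): ("XM1", 1), (1, 1, 1): ("X0", 1),
}


def encode_operand(b: int, n_bits: int = 8) -> tuple[list[str], int]:
    """Return the encoded digits of B (least significant first) and the final carry."""
    digits = []
    sdc = 0                                  # SDC(0): no carry into the first pair
    for j in range(n_bits // 2):
        b_low = (b >> (2 * j)) & 1           # B(2J)
        b_high = (b >> (2 * j + 1)) & 1      # B(2J+1)
        digit, sdc = TABLE_2[(b_high, b_low, sdc)]
        digits.append(digit)
    return digits, sdc


def encoded_multiply(a: int, b: int, n_bits: int = 8) -> int:
    """Unsigned product formed from one partial product per bit pair of B."""
    digits, carry_out = encode_operand(b, n_bits)
    product = sum(DIGIT_VALUE[d] * a * 4**j for j, d in enumerate(digits))
    # Assumption: for an unsigned B, a carry out of the last pair adds A * 4**(n_bits/2).
    return product + carry_out * a * 4 ** (n_bits // 2)
```

For an 8-bit B operand this yields four encoded digits, so only four partial products are accumulated instead of eight.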
Logic block L3 incorporates the signed digit encoding set of Table 2 as proposed by Booth. Block L3 also performs an effective sign extension of each partial product to 16 bits. The signal MINUS(J) indicates that at least one previous bit pair generated a negative partial product. This signal is combined with the signed digit multiplier of a bit pair to generate BIT(2J) and BIT(2J+1). These two signals are the effective sign extension for each partial product, fully merged with the sign extensions of all previously generated partial products. Logic blocks L4 and L5 perform a correction for a negative B operand and for a signed digit carry out of the last multiplier bit pair. Blocks L6 and L7 perform a two's complement operation on the A operand for the X(-1) and X(-2) signed digits.
The advantage of this ripple adder technique is that it is usually convenient to lay out the required circuitry in monolithic single chip form. However, this technique is generally quite slow because the partial product accumulation takes place in ripple fashion, and the worst case delay path must pass through all of the rows of full adders. For example, in the case of the 8×8 multiplier with encoding as shown in FIG. 1, which has four 8-bit partial products, the worst case delay is equivalent to four full adder delays. Without encoding, the worst case delay would be eight full adder delays. In the case of a 64×64 multiplier, the worst case delay would be thirty-two full adder delays with encoding and sixty-four full adder delays without encoding.
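The delay figures quoted above follow directly from the number of partial products. The sketch below reproduces that arithmetic under the simplifying assumption stated in the text, namely one full adder delay per partial product row; it ignores the encoder logic and the final carry propagate adder. The function name `worst_case_adder_delays` is hypothetical.

```python
def worst_case_adder_delays(n_bits: int, encoded: bool) -> int:
    """Full adder delays on the worst case path of a ripple accumulator.

    Assumes one adder row per partial product: N rows without encoding,
    N/2 rows when the B operand is encoded one bit pair at a time.
    """
    return n_bits // 2 if encoded else n_bits


for width in (8, 64):
    print(width,
          worst_case_adder_delays(width, encoded=True),
          worst_case_adder_delays(width, encoded=False))
```

The delay grows linearly with operand width in both cases; encoding halves the row count but does not change that linear growth.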