1. Technical Field
The present application relates generally to an improved multiplier engine apparatus and method. More specifically, the present application is directed to a multiplier engine that reduces the size of the circuitry used to provide the multiplier engine as well as increases the speed at which the multiplication algorithm is performed.
2. Description of Related Art
In 1951, Andrew D. Booth, while doing research on crystallography at Birkbeck College in Bloomsbury, London, invented an algorithm for performing multiplication of two signed numbers in two's complement notation. Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed.
Booth's multiplication algorithm may be described as follows. If x is the count of bits of the multiplicand, i.e. a quantity that is multiplied by another quantity (the multiplier), and y is the count of bits of the multiplier:
(1) Draw a grid of three lines, each with squares for x+y+1 bits. Label the lines respectively A (add), S (subtract), and P (product);
(2) In two's complement notation, fill the first x bits of each line with:
    A: the multiplicand
    S: the negative of the multiplicand
    P: zeros
(3) Fill the next y bits of each line with:
    A: zeros
    S: zeros
    P: the multiplier
(4) Fill the last bit of each line with a zero.
(5) Do the following two steps |y| (absolute value of y) times:
    a) If the last two bits in the product are:
        00 or 11: do nothing
        01: P=P+A. Ignore any overflow.
        10: P=P+S. Ignore any overflow.
    b) Arithmetically shift the product right one position.
(6) Drop the last bit from the product for the final result.
The following is an example of the implementation of Booth's multiplication algorithm. Assume that one wants to find the result of 3×−4, where 3 is the multiplicand and −4 is the multiplier. Performing steps 1-4 of the Booth multiplication algorithm, the result achieved is as follows:
    A=0011 0000 0
    S=1101 0000 0
    P=0000 1100 0
Performing the fifth step of Booth's algorithm requires four iterations through the loop as follows:
    P=0000 1100 0. The last two bits are 00.
    P=0000 0110 0. A right shift.
    ***end of first iteration***
    P=0000 0110 0. The last two bits are 00.
    P=0000 0011 0. A right shift.
    ***end of second iteration***
    P=0000 0011 0. The last two bits are 10.
    P=1101 0011 0. P=P+S.
    P=1110 1001 1. A right shift.
    ***end of third iteration***
    P=1110 1001 1. The last two bits are 11.
    P=1111 0100 1. A right shift.
Thus, the product of 3×−4 is 1111 0100, which is equal to −12.
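The A/S/P procedure and the 3×−4 trace can be sketched in Python as follows. This is a minimal illustration only, not a circuit description: the function name `booth_multiply` and its parameters are hypothetical, and the sketch assumes the multiplicand is not the most negative x-bit value, since the negative of that value does not fit in x bits.

```python
def booth_multiply(multiplicand: int, multiplier: int, x: int, y: int) -> int:
    """Multiply two signed integers with the A/S/P grid procedure.

    x and y are the bit counts of the multiplicand and the multiplier.
    Assumption: multiplicand != -(2 ** (x - 1)).
    """
    width = x + y + 1
    mask = (1 << width) - 1
    x_mask = (1 << x) - 1

    # Steps 2-4: the first x bits hold the multiplicand (A) or its negative (S);
    # P holds the multiplier followed by one extra zero bit.
    a = (multiplicand & x_mask) << (y + 1)
    s = (-multiplicand & x_mask) << (y + 1)
    p = (multiplier & ((1 << y) - 1)) << 1

    # Step 5: inspect the last two bits, add A or S, then shift right.
    for _ in range(y):
        last_two = p & 0b11
        if last_two == 0b01:
            p = (p + a) & mask  # overflow is ignored
        elif last_two == 0b10:
            p = (p + s) & mask  # overflow is ignored
        sign = p & (1 << (width - 1))
        p = (p >> 1) | sign  # arithmetic shift keeps the sign bit

    # Step 6: drop the last bit, then read the x+y bits as two's complement.
    p >>= 1
    if p & (1 << (x + y - 1)):
        p -= 1 << (x + y)
    return p


print(booth_multiply(3, -4, 4, 4))  # the worked example above
```

Calling `booth_multiply(3, -4, 4, 4)` walks through the same four iterations listed in the trace and ends with the bit pattern 1111 0100.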
In order to understand why Booth's multiplication algorithm works, consider a positive multiplier consisting of a block of 1s surrounded by 0s, e.g., 00111110. The product is given by:
    M × “00111110” = M × (2^5 + 2^4 + 2^3 + 2^2 + 2^1) = M × 62
where M is the multiplicand. The number of operations can be reduced to two by rewriting the same product as:
    M × “0 1 0 0 0 0 −1 0” = M × (2^6 − 2^1) = M × 62
The product can then be generated by one addition and one subtraction of the multiplicand. This scheme can be extended to any number of blocks of 1s in a multiplier, including the case of a single 1 in a block.
Thus, Booth's multiplication algorithm follows this scheme by performing an addition when it encounters the first digit of a block of ones (01) and a subtraction when it encounters the end of the block of ones (10). This works for a negative multiplier as well. When the ones in a multiplier are grouped into long blocks, Booth's algorithm performs fewer additions and subtractions than a normal multiplication algorithm.
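The block-of-ones rewriting can be illustrated with a short Python sketch that recodes a multiplier into signed digits −1, 0 and +1. The helper name `booth_recode` is hypothetical; each digit is the bit to the right minus the current bit, with an implicit 0 to the right of the least significant bit.

```python
def booth_recode(multiplier: int, bits: int) -> list[int]:
    """Return signed digits, least significant first.

    Digit i is (bit i-1) - (bit i), with an implicit 0 below bit 0, so that
    sum(d * 2**i) equals the two's complement value of the multiplier.
    """
    digits = []
    previous = 0  # implicit bit to the right of the least significant bit
    for i in range(bits):
        current = (multiplier >> i) & 1
        digits.append(previous - current)
        previous = current
    return digits


digits = booth_recode(0b00111110, 8)
print(digits[::-1])                    # most significant digit first
print(sum(1 for d in digits if d))     # number of additions/subtractions
```

For the multiplier 00111110 only two digits are non-zero, which corresponds to the one addition and one subtraction described above.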
With regard to computer architecture, the Booth multiplication algorithm is a technique that allows for smaller, faster multiplication circuits in computing devices, by recoding the numbers that are multiplied. It is the standard technique used in chip design and provides significant improvements over the “long multiplication” technique.
The standard “long multiplication” technique involves performing, for each column in the multiplier, a shift of the multiplicand by an appropriate number of columns and multiplying it by a value of the digit in that column of the multiplier to obtain a partial product. The partial products may then be added to obtain the final result. With such a system, the number of partial products is exactly the number of columns in the multiplier.
The number of partial products may be reduced by one half by using a technique known as radix 4 Booth recoding. The basic idea is that instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, every second column is taken and multiplied by ±1, ±2, or 0 to obtain the same results. Thus, to multiply by 7, one can multiply the partial product aligned against the least significant bit by −1, and multiply the partial product aligned with the third column by 2:
Partial Product 0=Multiplicand*−1, shifted left 0 bits (× −1).
Partial Product 1=Multiplicand*2, shifted left 2 bits (× 8).
This is the same result as the equivalent shift and add method as shown below:
Partial Product 0=Multiplicand*1, shifted left 0 bits (× 1).
Partial Product 1=Multiplicand*1, shifted left 1 bit (× 2).
Partial Product 2=Multiplicand*1, shifted left 2 bits (× 4).
Partial Product 3=Multiplicand*0, shifted left 3 bits (× 0).
The halving of the number of partial products is important in circuit design as it relates to the propagation delay in the running of the circuit as well as the complexity and power consumption of the circuits.
Moreover, it is also important to note that there is comparatively little complexity penalty in multiplying by 0, 1 or 2. All that is needed is a multiplexer, or the equivalent, which has a delay time that is independent of the size of the inputs. Negating two's complement numbers has the added complication of needing to add a “1” to the least significant bit, but this can be overcome by adding a single correction term with the necessary “1”s in the correct positions.
To Booth recode the multiplier term, the bits of the multiplier term are considered in blocks of three such that each block overlaps the previous block by one bit, as shown in FIG. 1A. Grouping of bits starts from the least significant bit with the first block 105 only using two bits of the multiplier, since there is no previous block to overlap. The overlap of the blocks 105-145 is necessary so that it can be known what happened in the last block, as the most significant bit of the block acts like a sign bit. Since the least significant bit of each block is used to know what the sign bit was in the previous block, and there are never any negative products before the least significant block, the least significant bit of the first block 105 is always assumed to be 0.
After the bits have been grouped into three-bit blocks, the Booth decoder truth table shown in FIG. 1B is consulted to determine the encoding for each block. In the Booth decoder truth table of FIG. 1B, the multiplicand is B and the multiplier is A (thus, the truth table is for multiplication of B*A). For each iteration of Booth recoding, one three-bit block of the multiplier is used to generate a partial product. For example, when the three-bit block is “010”, the partial product is +1B, i.e. +1*Multiplicand, as shown in the second column of FIG. 1B. The partial products generated from all of the three-bit blocks are then added to obtain the resulting value of the multiplication operation.
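The grouping and table lookup can be sketched as follows. FIG. 1B is not reproduced here, so the table below is the commonly used radix 4 Booth table and is only assumed to match the figure; it does agree with the “010” → +1B entry cited above. The names `RADIX4_TABLE`, `radix4_digits` and `radix4_multiply` are hypothetical.

```python
# Assumed radix 4 Booth table: three-bit block -> multiple of the multiplicand.
RADIX4_TABLE = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a two's complement multiplier (even bit count) into radix 4 digits.

    Blocks overlap by one bit; a 0 is assumed below the least significant bit.
    Digits are returned least significant first; digit i has weight 4**i.
    """
    padded = (multiplier & ((1 << bits) - 1)) << 1
    return [RADIX4_TABLE[(padded >> (2 * i)) & 0b111] for i in range(bits // 2)]


def radix4_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the partial products selected by the recoded digits."""
    digits = radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))


print(radix4_digits(7, 4))       # digits for the multiply-by-7 example
print(radix4_multiply(5, 7, 4))
```

For a 4-bit multiplier of 7 the digits are −1 (shifted left 0 bits) and +2 (shifted left 2 bits), which matches the two partial products listed earlier for multiplication by 7.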
FIG. 2 illustrates a known multiplier circuit arrangement for realizing Booth's multiplication algorithm and which utilizes the three-bit blocks and truth table described above. This multiplier circuit is described in U.S. Pat. No. 5,748,517, which is hereby incorporated by reference.
As shown in FIG. 2, Booth decoders BD1-BD3 each receive three overlapping bits of a 6-bit multiplier Y (Y0-Y5). That is to say, the Booth decoder BD1 receives “0”, Y0, Y1, the Booth decoder BD2 receives Y1, Y2, Y3, and the Booth decoder BD3 receives Y3, Y4, and Y5. On the basis of the received three bits of the multiplier Y, each of the Booth decoders BD1-BD3 outputs partial product information groups S1-S5 to the corresponding one of the partial product generating circuits PP1-PP3.
The partial product generating circuits PP1-PP3 receive the partial product information groups S1-S5 from the Booth decoders BD1-BD3, respectively, and an 8-bit multiplicand X (I0-I7). The partial product generating circuits PP1-PP3 output partial products SM1-SM3 to a partial product adder circuit ADD1. The partial product adder circuit ADD1 adds SM1-SM3 to output a multiplication result XY of the multiplier Y and the multiplicand X.
The partial product adder circuit ADD1 must be large enough to add the outputs SM1-SM3 of the partial product generating circuits PP1-PP3. Because the partial product adder circuit ADD1 must account for the possibility that the partial products may be negative, negate bits must be included in the addition performed by the partial product adder circuit ADD1. As a result, the partial product adder circuit ADD1 has an increased size to accommodate the negate bits. This increase in size further causes the partial product adder circuit ADD1 to be relatively slow.
To illustrate this problem in known adder circuits, consider an M*N bit Booth integer multiplier, where in this case the value for N is 8 bits. As discussed above, in order to perform the Booth multiplication, one must generate the 0, +/−1B, and +/−2B terms, where B is the multiplicand. In two's complement binary representation, the −1B and −2B terms are generated by bitwise inversion plus 1 at the least significant bit. For example:
    5 = 0101
    −5 = 1010 + 1 = 1011
In an M*8 Booth multiplication, the 4 partial products have the format as shown in FIG. 3. These 4 partial products are referred to as SM1, SM2, SM3, and SM4 and are generated by corresponding partial product generating circuits PP1-PP4. The 4 partial products have B+1 bits, where B is the bit size of the multiplicand. The 4 negate bits N1, N2, N3 and N4 are associated with these 4 partial products SM1-SM4, respectively. N1 is placed at the bit 0 position. Its value is N1*2^0. Similarly, N2, N3 and N4 are placed at the bit 2, 4 and 6 positions. Their values are N2*2^2, N3*2^4, and N4*2^6, respectively. If a partial product is a positive term, such as 0, +1B, or +2B, the negate bit N will be 0. If a partial product is a negative term, such as −1B or −2B, the negate bit will be 1.
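The role of the negate bits can be illustrated with a behavioural Python sketch. It is not a model of FIG. 3. It assumes the commonly used radix 4 Booth table, places each negate bit at the least significant position of its partial product (bit positions 0, 2, 4 and 6), and uses hypothetical names throughout.

```python
# Assumed radix 4 Booth table (FIG. 1B is not reproduced here).
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def negate_by_inversion(value: int, bits: int) -> int:
    """Two's complement negation: invert every bit, then add 1 at the LSB."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


def multiply_m_by_8(b: int, a: int, m_bits: int = 8) -> int:
    """Multiply an m_bits multiplicand b by an 8-bit multiplier a.

    Negative partial products are formed by bitwise inversion only; the
    missing +1 is supplied by a separate negate bit in the same column.
    """
    width = m_bits + 8
    padded = (a & 0xFF) << 1  # implicit 0 below the least significant bit
    total = 0
    for i in range(4):
        digit = BOOTH_DIGIT[(padded >> (2 * i)) & 0b111]
        negate = 1 if digit < 0 else 0      # negate bit N(i+1)
        term = abs(digit) * b               # 0, 1B or 2B
        row = ~term if negate else term     # inversion without the +1
        total += (row << (2 * i)) + (negate << (2 * i))
    total &= (1 << width) - 1
    return total - (1 << width) if total & (1 << (width - 1)) else total


print(format(negate_by_inversion(0b0101, 4), "04b"))  # the -5 example above
print(multiply_m_by_8(3, -4))
```

The sketch shows why the adder must sum one extra term in some columns: every negative partial product contributes both an inverted row and a negate bit.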
It should be noted that the greatest number of terms to be summed are the 5 terms at the bit 6 position in FIG. 3. These 5 terms need to be summed by an adder circuit when generating the multiplication result, e.g., by partial product adder circuit ADD1. In order to perform such summing of the 5 terms, a 5:2 compressor is used to generate carry and sum terms. The 5:2 compressor circuit has a configuration as shown in FIG. 4.
As shown in FIG. 4, the 5:2 compressor circuit requires 3 full adders 410-430 to handle all 5 inputs. The first full adder 410 adds the first partial product SM1, the second partial product SM2, and the third partial product SM3 (which is the carry in value for the full adder). The first full adder 410 generates a sum value that is output to the second full adder 420 and a carry-out value side_cout1 which is input to the third full adder 430 as side_cin1. The second full adder 420 adds the sum from the first full adder 410 with the fourth partial product SM4 and the fourth negate value N4. The second full adder 420 generates a sum value that is output to the third full adder 430 and a carry-out value side_cout2 which is input to the third full adder 430 as side_cin2. The third full adder 430 adds the sum from the second full adder 420 with the first carry-out value and the second carry-out value and generates a multiplication result sum value and result carry-out value.
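The chain of three full adders can be modelled bit by bit in Python. This is a sketch under assumptions: the side carry-ins are treated as independent inputs to the third full adder, as they would be if supplied by a neighbouring bit position, and the function names are hypothetical rather than taken from FIG. 4.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three one-bit inputs."""
    total = a + b + cin
    return total & 1, total >> 1


def compressor_5_2(sm1, sm2, sm3, sm4, n4, side_cin1, side_cin2):
    """Reduce five same-weight bits plus two side carry-ins.

    Returns (result_sum, result_cout, side_cout1, side_cout2).
    """
    s1, side_cout1 = full_adder(sm1, sm2, sm3)           # full adder 410
    s2, side_cout2 = full_adder(s1, sm4, n4)             # full adder 420
    result_sum, result_cout = full_adder(s2, side_cin1, side_cin2)  # 430
    return result_sum, result_cout, side_cout1, side_cout2


# Check that no weight is lost: inputs == sum + 2 * (all carry outputs).
for bits in product((0, 1), repeat=7):
    rs, rc, c1, c2 = compressor_5_2(*bits)
    assert sum(bits) == rs + 2 * (rc + c1 + c2)
print("all 128 input combinations are consistent")
```

The result sum passes through all three full adders in series, which is the delay path that makes this arrangement comparatively slow.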
The circuit shown in FIG. 4 is relatively large and slow. It would be beneficial to reduce the size of the adder circuitry and to increase the speed at which the partial products are added to generate the multiplication result.