1. Technical Field
The present application relates generally to an improved Booth decoder apparatus and method. More specifically, the present application is directed to a Booth decoder circuit that minimizes its delay elements, thereby increasing its operating speed.
2. Description of Related Art
In 1951, Andrew D. Booth, while doing research on crystallography at Birkbeck College in Bloomsbury, London, invented an algorithm for multiplying two signed numbers in two's complement notation. Booth used desk calculators that were faster at shifting than adding and created the algorithm to increase their speed.
Booth's multiplication algorithm may be described as follows. If x is the count of bits of the multiplicand, i.e. a quantity that is multiplied by another quantity (the multiplier), and y is the count of bits of the multiplier:
(1) Draw a grid of three lines, each with squares for x+y+1 bits. Label the lines respectively A (add), S (subtract), and P (product);
(2) In two's complement notation, fill the first x bits of each line with:
    A: the multiplicand
    S: the negative of the multiplicand
    P: zeros
(3) Fill the next y bits of each line with:
    A: zeros
    S: zeros
    P: the multiplier
(4) Fill the last bit of each line with a zero.
(5) Do the following two steps |y| (absolute value of y) times:
    a) If the last two bits in the product are:
        00 or 11: do nothing
        01: P=P+A. Ignore any overflow.
        10: P=P+S. Ignore any overflow.
    b) Arithmetically shift the product right one position.
(6) Drop the last bit from the product for the final result.
The following is an example of the implementation of Booth's multiplication algorithm. Assume that one wants to find the result of 3×−4, where 3 is the multiplicand and −4 is the multiplier. Performing steps 1-4 of the Booth multiplication algorithm, the result achieved is as follows:
A = 0011 0000 0
S = 1101 0000 0
P = 0000 1100 0
Performing the fifth step of Booth's algorithm requires four iterations through the loop as follows:
P = 0000 1100 0. The last two bits are 00.
P = 0000 0110 0. A right shift.
***end of first iteration***
P = 0000 0110 0. The last two bits are 00.
P = 0000 0011 0. A right shift.
***end of second iteration***
P = 0000 0011 0. The last two bits are 10.
P = 1101 0011 0. P = P + S.
P = 1110 1001 1. A right shift.
***end of third iteration***
P = 1110 1001 1. The last two bits are 11.
P = 1111 0100 1. A right shift.
Thus, the product of 3×−4 is 1111 0100, which is equal to −12.
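The six steps and the worked example above can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function name booth_multiply and its parameters are hypothetical, the registers A, S and P are held in ordinary integers masked to x+y+1 bits, and the multiplicand is assumed not to be the most negative value representable in x bits (its negative would not fit in the S register).

```python
def booth_multiply(multiplicand: int, multiplier: int, x: int, y: int) -> int:
    """Multiply two signed integers with the A/S/P register procedure.

    x and y are the bit widths of the multiplicand and the multiplier.
    Assumes the multiplicand is not the most negative x-bit value.
    """
    width = x + y + 1
    mask = (1 << width) - 1

    # Steps 1-4: multiplicand (or its negative) in the first x bits,
    # multiplier in the next y bits of P, and a trailing zero bit.
    a = (multiplicand & ((1 << x) - 1)) << (y + 1)
    s = (-multiplicand & ((1 << x) - 1)) << (y + 1)
    p = (multiplier & ((1 << y) - 1)) << 1

    # Step 5: inspect the last two bits, add A or S, then shift right.
    for _ in range(y):
        last_two = p & 0b11
        if last_two == 0b01:
            p = (p + a) & mask  # overflow is ignored
        elif last_two == 0b10:
            p = (p + s) & mask  # overflow is ignored
        sign = p & (1 << (width - 1))
        p = (p >> 1) | sign  # arithmetic shift keeps the sign bit

    # Step 6: drop the last bit, then read x+y bits as two's complement.
    p >>= 1
    if p & (1 << (x + y - 1)):
        p -= 1 << (x + y)
    return p


print(booth_multiply(3, -4, 4, 4))
```

With x = y = 4, the intermediate values of p follow the same bit patterns as the four iterations listed above.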
In order to understand why Booth's multiplication algorithm works, consider a positive multiplier consisting of a block of 1s surrounded by 0s, e.g., 00111110. The product is given by:
M × “00111110” = M × (2^5 + 2^4 + 2^3 + 2^2 + 2^1) = M × 62
where M is the multiplicand. The number of operations can be reduced to two by rewriting the same product as:
M × “0 1 0 0 0 0 −1 0” = M × (2^6 − 2^1) = M × 62
The product can then be generated by one addition and one subtraction of the multiplicand. This scheme can be extended to any number of blocks of 1s in a multiplier, including the case of a single 1 in a block.
Thus, Booth's multiplication algorithm follows this scheme by performing an addition when it encounters the first digit of a block of ones (01) and a subtraction when it encounters the end of the block of ones (10). This works for a negative multiplier as well. When the ones in a multiplier are grouped into long blocks, Booth's algorithm performs fewer additions and subtractions than a normal multiplication algorithm.
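The recoding idea can be illustrated with a short Python sketch that turns a multiplier into signed digits, one per bit position. The function name booth_recode is hypothetical, and the digit list is ordered least significant first; each digit is derived from the same pairs of bits (01, 10, 00, 11) that the algorithm inspects.

```python
def booth_recode(multiplier: int, bits: int) -> list[int]:
    """Return one signed digit (-1, 0 or +1) per bit, least significant first."""
    digits = []
    previous = 0  # implicit 0 to the right of the least significant bit
    for i in range(bits):
        current = (multiplier >> i) & 1
        # Pair "current previous": 01 -> +1 (add), 10 -> -1 (subtract),
        # 00 or 11 -> 0 (do nothing).
        digits.append(previous - current)
        previous = current
    return digits


digits = booth_recode(0b00111110, 8)
print(digits)
print(sum(d << i for i, d in enumerate(digits)))
```

For 00111110 only two digits are nonzero, a −1 at position 1 and a +1 at position 6, which corresponds to the one subtraction and one addition described above.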
With regard to computer architecture, the Booth multiplication algorithm is a technique that allows for smaller, faster multiplication circuits in computing devices, by recoding the numbers that are multiplied. It is the standard technique used in chip design, and provides significant improvements over the “long multiplication” technique.
The standard “long multiplication” technique involves performing, for each column in the multiplier, a shift of the multiplicand by an appropriate number of columns and multiplying it by the value of the digit in that column of the multiplier to obtain a partial product. The partial products may then be added to obtain the final result. With such a system, the number of partial products is exactly the number of columns in the multiplier.
The number of partial products may be reduced by half by using a technique known as radix 4 Booth recoding. The basic idea is that instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, every second column is taken and multiplied by ±1, ±2, or 0 to obtain the same results. Thus, to multiply by 7, one can multiply the partial product aligned against the least significant bit by −1, and multiply the partial product aligned with the third column by 2:
Partial Product 0 = Multiplicand * −1, shifted left 0 bits (×−1).
Partial Product 1 = Multiplicand * 2, shifted left 2 bits (×8).
This is the same result as the equivalent shift and add method as shown below:
Partial Product 0 = Multiplicand * 1, shifted left 0 bits (×1).
Partial Product 1 = Multiplicand * 1, shifted left 1 bit (×2).
Partial Product 2 = Multiplicand * 1, shifted left 2 bits (×4).
Partial Product 3 = Multiplicand * 0, shifted left 3 bits (×0).
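As a quick check of the two lists of partial products for a multiplier of 7, the following Python sketch builds both and compares their sums. The helper names and the sample multiplicand of 13 are illustrative assumptions; the radix-4 multipliers −1 and 2 are taken directly from the example above rather than computed.

```python
def shift_and_add_terms(multiplicand: int, multiplier: int, bits: int) -> list[int]:
    """One partial product per multiplier column (long multiplication)."""
    return [(multiplicand * ((multiplier >> i) & 1)) << i for i in range(bits)]


def radix4_terms(multiplicand: int, digits: list[int]) -> list[int]:
    """One partial product per pair of columns; digits are in {-2, ..., +2}."""
    return [(multiplicand * d) << (2 * k) for k, d in enumerate(digits)]


multiplicand = 13  # arbitrary sample value
long_terms = shift_and_add_terms(multiplicand, 7, 4)
booth_terms = radix4_terms(multiplicand, [-1, 2])

print(long_terms, sum(long_terms))
print(booth_terms, sum(booth_terms))
```

Both sums equal 7 times the multiplicand, but the radix-4 form needs two partial products where the shift and add form needs four.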
The halving of the number of partial products is important in circuit design as it relates to the propagation delay in the running of the circuit as well as the complexity and power consumption of the circuits.
Moreover, it is also important to note that there is comparatively little complexity penalty in multiplying by 0, 1 or 2. All that is needed is a multiplexer, or the equivalent, which has a delay time that is independent of the size of the inputs. Negating 2's complement numbers has the added complication of needing to add a “1” to the least significant bit, but this can be overcome by adding a single correction term with the necessary “1”s in the correct positions.
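One way to picture the correction term is sketched below in Python. This is an assumption-laden illustration, not a description of any particular circuit: the function name sum_with_correction is hypothetical, negative partial products are formed by bitwise inversion only, and the deferred “1”s are collected into a single integer that is added once at the end.

```python
def sum_with_correction(multiplicand: int, digits: list[int]) -> int:
    """Add radix-4 partial products, deferring the '+1' of each negation.

    digits are radix-4 multipliers in {-2, -1, 0, 1, 2}, least significant first.
    """
    total = 0
    correction = 0
    for k, d in enumerate(digits):
        shift = 2 * k
        selected = multiplicand * abs(d)  # 0, 1x or 2x: a multiplexer choice
        if d < 0:
            total += (~selected) << shift  # inversion only, no increment yet
            correction += 1 << shift       # remember the missing "1" here
        else:
            total += selected << shift
    return total + correction  # single correction term added once


print(sum_with_correction(13, [-1, 2]))
```

The sketch relies on the two's complement identity −v = ~v + 1, so each inverted term is short by exactly one unit at its own bit position.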
To Booth recode the multiplier term, the bits of the multiplier term are considered in blocks of three such that each block overlaps the previous block by one bit, as shown in FIG. 1A. Grouping of bits starts from the least significant bit with the first block 105 only using two bits of the multiplier, since there is no previous block to overlap. The overlap of the blocks 105-145 is necessary so that it can be known what happened in the last block, as the most significant bit of the block acts like a sign bit. Since the least significant bit of each block is used to know what the sign bit was in the previous block, and there are never any negative products before the least significant block, the least significant bit of the first block 105 is always assumed to be 0.
After having grouped the bits into three-bit blocks, the Booth decoder truth table shown in FIG. 1B is then consulted to determine what the encoding will be for each block. In the Booth decoder truth table of FIG. 1B, the multiplicand is B and the multiplier is A (thus, the truth table is for multiplication of B*A). For each iteration of Booth recoding, the three-bit blocks of the multiplier are used to generate a partial product. For example, when the three-bit block is “010”, the partial product is +1B, i.e. +1*Multiplicand, as shown in the second column of FIG. 1B. Each of the three-bit blocks of the multiplier is used to generate a partial product, and the partial products are then added to obtain the resulting value of the multiplication operation.
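The grouping and table lookup can be sketched in Python as follows. FIG. 1B is not reproduced here, so the table below is the commonly used radix-4 Booth table and is assumed to match the figure (it agrees with the “010” gives +1B entry cited above). The names BOOTH_RADIX4 and radix4_recode are hypothetical, and the multiplier width is assumed to be even.

```python
# Commonly used radix-4 Booth table (assumed to match FIG. 1B).
# Keys are (most significant, middle, least significant) bits of a block.
BOOTH_RADIX4 = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def radix4_recode(multiplier: int, bits: int) -> list[int]:
    """Recode a two's complement multiplier into radix-4 digits, LSB block first."""
    if bits % 2:
        raise ValueError("this sketch assumes an even number of multiplier bits")
    digits = []
    overlap = 0  # the first block has no previous block, so its low bit is 0
    for i in range(0, bits, 2):
        middle = (multiplier >> i) & 1
        top = (multiplier >> (i + 1)) & 1
        digits.append(BOOTH_RADIX4[(top, middle, overlap)])
        overlap = top  # the top bit is shared with the next block
    return digits


print(radix4_recode(7, 4))
```

Summing digit × 4^k over the blocks gives back the signed value of the multiplier, which is why the corresponding partial products add up to the full product.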
FIG. 2 illustrates a known multiplier circuit arrangement for realizing Booth's multiplication algorithm and which utilizes the three-bit blocks and truth table described above. This multiplier circuit is described in U.S. Pat. No. 5,748,517, which is hereby incorporated by reference.
As shown in FIG. 2, Booth decoders BD1-BD3 receive overlapping three bits of a 6-bit multiplier Y (Y0-Y5), respectively. That is to say, the Booth decoder BD1 receives “0”, Y0, Y1, the Booth decoder BD2 receives Y1, Y2, Y3, and the Booth decoder BD3 receives Y3, Y4, Y5. The Booth decoders BD1-BD3 output partial product information groups S1-S5 to partial product generating circuits PP1-PP3 on the basis of the received three bits of the multiplier Y, respectively.
The partial product generating circuits PP1-PP3 receive the partial product information groups S1-S5 from the Booth decoders BD1-BD3, respectively, and an 8-bit multiplicand X (X0-X7). The partial product generating circuits PP1-PP3 output partial products SM1-SM3 to a partial product adder circuit ADD1. The partial product adder circuit ADD1 adds SM1-SM3 to output a multiplication result XY of the multiplier Y and the multiplicand X.
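The data flow of FIG. 2 can be modeled behaviorally in Python as sketched below. This is a simplification under stated assumptions: each decoder is reduced to returning a single signed multiplier in place of the select signals S1-S5, X and Y are treated as signed two's complement values, and the function names are hypothetical.

```python
def booth_decoder(y_low: int, y_mid: int, y_high: int) -> int:
    """Behavioral stand-in for BD1-BD3: returns a multiplier in {-2, ..., +2}."""
    return y_low + y_mid - 2 * y_high


def partial_product(x: int, digit: int, position: int) -> int:
    """Behavioral stand-in for PP1-PP3: scale X and align it to its block."""
    return (x * digit) << (2 * position)


def multiply_8x6(x: int, y: int) -> int:
    """8-bit multiplicand X times 6-bit multiplier Y via three partial products."""
    b = [(y >> i) & 1 for i in range(6)]  # Y0..Y5
    blocks = [
        (0, b[0], b[1]),     # BD1 receives "0", Y0, Y1
        (b[1], b[2], b[3]),  # BD2 receives Y1, Y2, Y3
        (b[3], b[4], b[5]),  # BD3 receives Y3, Y4, Y5
    ]
    sm = [partial_product(x, booth_decoder(*blk), k) for k, blk in enumerate(blocks)]
    return sum(sm)  # ADD1 adds SM1-SM3


print(multiply_8x6(100, -27))
```

The arithmetic form used in booth_decoder is a compact way of expressing the usual radix-4 Booth table; a hardware decoder would emit select signals instead of a number.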
FIG. 3 is a circuit diagram showing the internal configuration of the Booth decoders BD1-BD3 of the multiplication circuit shown in FIG. 2. In the following, 2i−1, 2i and 2i+1 are bit indices of the multiplier Y. As shown in FIG. 3, the least significant multiplier Y2i−1 corresponds to 0, Y1, Y3 of the Booth decoders BD1-BD3, the intermediate multiplier Y2i corresponds to Y0, Y2, Y4 of the Booth decoders BD1-BD3, and the most significant multiplier Y2i+1 corresponds to Y1, Y3, Y5 of the Booth decoders BD1-BD3.
An AND gate 50 receives the least significant multiplier Y2i−1, the intermediate multiplier Y2i and the most significant multiplier Y2i+1 as inputs and provides a result of the logic operation as an output to an OR gate 51. A NOR gate 52 receives the least significant multiplier Y2i−1, the intermediate multiplier Y2i and the most significant multiplier Y2i+1 as inputs and outputs a result of the logic operation to the OR gate 51. The output of the OR gate 51 becomes the partial product information S1.
A NOR gate 53 receives the least significant multiplier Y2i−1 and the intermediate multiplier Y2i and outputs a result of the logic operation to a NAND gate 54. A NAND gate 56 receives the least significant multiplier Y2i−1 and the intermediate multiplier Y2i and outputs a result of the logic operation to a NOR gate 57. The NAND gate 54 further receives the most significant multiplier Y2i+1 and outputs a result of the logic operation as an inversion partial product information S2, and also outputs the partial product information S2 through an inverter 55. The NOR gate 57 further receives the most significant multiplier Y2i+1 and outputs a result of the logic operation as the partial product information S3, and also outputs the inversion partial product information S3 through an inverter 58.
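The gates described so far (50 through 58) can be checked with a small Python model that follows the text gate by gate. The function name decode_s1_s3 is hypothetical, and bits are modeled as the integers 0 and 1. Assuming the commonly used radix-4 Booth table, the input patterns that activate S1, S2 and S3 would line up with the 0, −2 and +2 entries respectively.

```python
def decode_s1_s3(y_low: int, y_mid: int, y_high: int) -> tuple[int, int, int]:
    """Model gates 50-58: returns (S1, S2, S3) for one three-bit block."""
    and50 = y_low & y_mid & y_high
    nor52 = 1 - (y_low | y_mid | y_high)
    s1 = and50 | nor52  # OR gate 51

    nor53 = 1 - (y_low | y_mid)
    nand54 = 1 - (nor53 & y_high)  # inversion of S2
    s2 = 1 - nand54                # inverter 55

    nand56 = 1 - (y_low & y_mid)
    s3 = 1 - (nand56 | y_high)     # NOR gate 57
    return s1, s2, s3


# Print the outputs for every block, written most significant bit first.
for value in range(8):
    high, mid, low = (value >> 2) & 1, (value >> 1) & 1, value & 1
    print(f"{high}{mid}{low}", decode_s1_s3(low, mid, high))
```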
An XOR gate 61 receives the least significant multiplier Y2i−1 and the intermediate multiplier Y2i and outputs a result of the logic operation to a NAND gate 59 and a NAND gate 63. The NAND gate 59 further receives the most significant multiplier Y2i+1 and outputs a result of the logic operation as the inversion partial product information S4, and also outputs the partial product information S4 through an inverter 60. The NAND gate 63 further receives the intermediate multiplier Y2i through an inverter 62 and outputs a result of the logic operation as the inversion partial product information S5 and also outputs the partial product information S5 through an inverter 64.
In the common static circuit design shown in FIG. 3, the Booth decoder circuitry is relatively large, requiring a large number of transistors, and operates slowly. It would be beneficial to be able to reduce the size of the Booth decoder circuitry and increase the speed at which it operates.