The present invention relates generally to binary multipliers and more specifically to a binary multiplier of improved speed.
All modern fast binary multipliers utilize some variation of the basic partial-product generation technique first applied by Seymour Cray and commonly referred to as "combinatorial", "paper and pencil", or "flow-through". In its most common form, the technique simply involves consecutively multiplying a K-digit-long operand A (the multiplicand) by each digit B(i) of an M-digit-long operand B (the multiplier) and then shifting the resulting partial product P(i) to the left by a number of places equal to the position of the digit B(i) in the multiplier. In this particular case it is assumed that the number of places the partial product is to be shifted is directly equal to i. The shifting operation is, in fact, equivalent to multiplying the partial product by the weight of the decimal (or binary) digit B(i), that is, by 10^i (or 2^i).
After all M partial products have been generated, they are summed to yield the final (M+K)-digit product of A and B. This technique, used for decimal multiplication, is also directly applicable to the binary multiplication of two numbers A and B whose respective binary widths are K and M. An example of the multiplication of two 4-bit operands, A=1010 and B=0110, is given in Table 1.
TABLE 1
______________________________________
"Paper-and-pencil" multiplication of two 4-bit operands.
______________________________________
            1 0 1 0    MULTIPLICAND A
            0 1 1 0    MULTIPLIER B
            -------
            0 0 0 0    partial product 1 (B(0) = 0, LSB)
          1 0 1 0      partial product 2 (B(1) = 1)
        1 0 1 0        partial product 3 (B(2) = 1)
      0 0 0 0          partial product 4 (B(3) = 0, MSB)
    ---------------
    0 0 1 1 1 1 0 0    final product
______________________________________
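As a minimal sketch of the shift-and-add procedure of Table 1 (Python; the function name `paper_and_pencil_multiply` is hypothetical and the operands are assumed to be unsigned), each partial product is A·B(i) shifted left by i places, and the shifted partial products are then summed:

```python
def paper_and_pencil_multiply(a: int, b: int, m: int) -> tuple[list[int], int]:
    """Shift-and-add product of an unsigned multiplicand a and an m-bit multiplier b.

    Returns the shifted partial products P(i) = A * B(i) * 2**i and their sum.
    """
    partial_products = []
    for i in range(m):
        digit = (b >> i) & 1                       # binary digit B(i), LSB first
        partial_products.append((a * digit) << i)  # shift left by i places
    return partial_products, sum(partial_products)


pps, product = paper_and_pencil_multiply(0b1010, 0b0110, 4)
print([format(p, "08b") for p in pps])
print(format(product, "08b"))
```

For A=1010 and B=0110 the first line printed lists the shifted partial products 00000000, 00010100, 00101000 and 00000000, and the second line prints 00111100 (decimal 60), the final product of Table 1.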
As is apparent from Table 1, apart from some input and output reformatting of the operands and final product, the bulk of the multiplication time, even in its simplest form, is consumed by the M-1 additions required to sum the partial products. In fact, all the algorithmic speed improvements brought into the design of parallel multipliers have involved reducing the number of additions necessary to generate the final product, as well as accelerating those additions (for example, by applying "carry-save" adders). The most common techniques used today are algorithmic refinements of the basic concept described above, known as "Wallace Tree Partial Product Reduction" and the "Modified Booth Algorithm".
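The carry-save acceleration mentioned above can be illustrated with a Wallace-style reduction sketch (Python; function names are hypothetical, and Python integers stand in for bit vectors). A 3:2 carry-save adder turns three rows into a sum row and a carry row without any carry propagation, so the rows are reduced level by level until a single carry-propagate addition remains:

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Bitwise 3:2 compressor (a row of full adders): x + y + z == s + c."""
    s = x ^ y ^ z                               # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carry, moved up one weight
    return s, c


def wallace_reduce(rows: list[int]) -> tuple[int, int]:
    """Reduce partial-product rows with carry-save levels, then add the last two.

    Returns (total, number_of_carry_save_levels).
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        next_rows = []
        # every complete group of three rows becomes two rows
        for j in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save_add(rows[j], rows[j + 1], rows[j + 2]))
        # leftover rows (fewer than three) pass through to the next level
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
        levels += 1
    return sum(rows), levels  # single final carry-propagate addition


total, levels = wallace_reduce([0b0000, 0b10100, 0b101000, 0b0000])
print(total, levels)  # 60 2
```

With the four partial products of Table 1, two carry-save levels reduce four rows to two, and only the last step needs a carry-propagating adder.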
Combining these two techniques can reduce the number of partial product additions to one half the number of bits in the multiplier. Consequently, the time needed for the partial products to flow through the adder array is also cut in half. However, this is accomplished at the expense of a relatively complex Booth decoder.
Compared to the present invention, Booth algorithms introduce not only extra delays caused by the more complex Booth decoder, but also an increased circuit size due to the need to propagate the sign extension through the CSA (Carry Save Adder) array, which further degrades time performance. For example, in Table 1, partial products 1, 2 and 3 would include three, two and one sign-extension bits, respectively.
Thus, using the example of Table 1, the circuitry required for Booth multiplication grows roughly quadratically with the number of partial products, whereas that of the combinatorial multiplication of Table 1 grows linearly with the number of bits.
The original Booth algorithm and the modified Booth algorithm involve searching for and determining strings of zeros or ones in the multiplier and performing addition and subtraction for the different partial products depending upon a determination of the beginning, end or middle of the string.
In combinatorial multiplication, a relative 1-digit shift always occurs between the multiplicand and the partial sum, regardless of whether an addition has occurred. Booth's algorithm permits more than one shift at a time, depending on the grouping of ones and zeros in the multiplier. The multiplier is examined bit by bit, starting with the LSB, and the partial product is shifted relative to the multiplicand as each bit is examined. The multiplicand is subtracted from the partial product upon finding the first one in a string of ones. Similarly, upon finding the first zero in a string of zeros, the multiplicand is added to the partial product. No operation is performed when the bit examined is identical to the previous multiplier bit.
The logic of Booth's algorithm is as follows: any binary number consisting of a string of ones, such as 111, equals the next larger binary number (1000 in this case) minus 1. Therefore, 111=1000-1, 11=100-1, and so on. In actual multidigit numbers, scanning from the LSB, the beginning and end of each string of ones are marked by transitions from zero to one and from one to zero, respectively. In Booth's algorithm, every string of ones in the multiplier is handled by a multiply-subtract operation at the beginning of the string and a multiply-add operation at the end, regardless of the string's length. Thus, the longer the string, the greater the saving from using the algorithm.
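A minimal sketch of this bit-by-bit recoding follows (Python; function names are hypothetical). Each multiplier bit and the bit examined before it select -1 (first one of a string of ones), +1 (first zero after a string of ones) or 0 (no operation). The multiplier is assumed to be an n-bit two's complement pattern with an implicit 0 to the right of the LSB:

```python
def booth_radix2_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit two's complement multiplier into digits in {-1, 0, +1}, LSB first.

    Digit i is b(i-1) - b(i), with b(-1) = 0.
    """
    digits = []
    prev = 0                      # implicit 0 to the right of the LSB
    for i in range(n):
        bit = (b >> i) & 1
        digits.append(prev - bit)  # -1: string starts, +1: string ends, 0: no-op
        prev = bit
    return digits


def booth_radix2_multiply(a: int, b: int, n: int) -> int:
    """Multiply a by the n-bit two's complement multiplier b using Booth digits."""
    product = 0
    for i, d in enumerate(booth_radix2_digits(b, n)):
        if d == -1:
            product -= a << i     # subtract the shifted multiplicand
        elif d == 1:
            product += a << i     # add the shifted multiplicand
        # d == 0: no operation, only the relative shift (i advances)
    return product


print(booth_radix2_digits(0b0110, 4))       # [0, -1, 0, 1]
print(booth_radix2_multiply(5, 0b0110, 4))  # 30
print(booth_radix2_multiply(5, 0b1110, 4))  # -10, since 1110 is -2
```

Here 0110 recodes to 6 = 8 - 2, one subtraction and one addition. An alternating pattern such as 0101 recodes to -1, +1, -1, +1, so the number of operations in the original algorithm depends on the bit pattern; an unsigned multiplier can be handled by passing n + 1 so that a leading 0 is examined.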
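The identity behind this logic can be stated generally for a string of ones occupying bit positions j through k (the indices j and k are notation introduced here for illustration):

```latex
\sum_{i=j}^{k} 2^{i} = 2^{k+1} - 2^{j}
\quad\Longrightarrow\quad
A \cdot \sum_{i=j}^{k} 2^{i} = A \cdot 2^{k+1} - A \cdot 2^{j},
\qquad \text{e.g. } 0111_2 = 2^{3} - 2^{0} = 1000_2 - 1 .
```

The right-hand side needs exactly one subtraction (at bit j, where the string begins) and one addition (at bit k+1, just past its end), whatever the string length k - j + 1.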
A modified version of Booth's algorithm is more commonly used. The difference between Booth's algorithm and the modified Booth's algorithm is as follows: the modified Booth's algorithm always generates n/2 independent partial products, whereas the original Booth's algorithm generates a varying number of dependent partial products (up to n for an alternating bit pattern), depending on the bit pattern of the multiplier. Parallel hardware implementation, of course, lends itself only to a fixed number of independent partial products. The modified encoding scheme encodes 2-bit groups of the multiplier and produces five partial products from an 8-bit unsigned multiplier; the fifth partial product is a consequence of the fact that the algorithm inherently handles two's complement numbers, so an unsigned multiplier must be extended with zeros (only four partial products are generated if only two's complement representation is used).
Each multiplier is divided into substrings of 3 bits, with adjacent groups sharing a common bit. Booth's algorithm can be used with either unsigned or two's complement numbers (the most significant bit of which has a weight of -2^n), and requires that the multiplier be padded with a 0 on the right to form four complete groups of 3 bits each. To work with unsigned numbers, the n-bit multiplier must also be padded with one or two zeros on the left. Table 2 is the encoding table for the eight combinations of the 3 multiplier bits.
TABLE 2
______________________________________
Encoding the 3 multiplier bits in the modified Booth's algorithm.
______________________________________
Y(i+1)  Y(i)  Y(i-1)  Operation                                              Value
(2^1)   (2^0) (2^-1)
______________________________________
  0      0      0     add zero (no string)                                   +0
  0      0      1     add multiplicand (end of string)                       +X
  0      1      0     add multiplicand (a string)                            +X
  0      1      1     add twice the multiplicand (end of string)             +2X
  1      0      0     subtract twice the multiplicand (beginning of string)  -2X
  1      0      1     subtract the multiplicand (-2X and +X)                 -X
  1      1      0     subtract the multiplicand (beginning of string)        -X
  1      1      1     subtract zero (center of string)                       -0
______________________________________
Thus, the modified Booth algorithm reduces the multiplication to a series of additions or subtractions, each selected by the corresponding 3-bit multiplier code.
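The recoding of Table 2 can be sketched as follows (Python; names are hypothetical). Each overlapping 3-bit group contributes the digit -2·Y(i+1) + Y(i) + Y(i-1); the multiplier is padded with a 0 on the right, and an unsigned multiplier is zero-padded on the left to an even width of at least n + 1 bits, which gives five partial products for n = 8 versus four for a two's complement multiplier:

```python
# Table 2: (Y(i+1), Y(i), Y(i-1)) -> multiple of the multiplicand X
BOOTH4_TABLE = {
    (0, 0, 0): 0,  (0, 0, 1): 1,  (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(b: int, n: int, signed: bool) -> list[int]:
    """Recode an n-bit multiplier into radix-4 digits in {-2, -1, 0, +1, +2}, LSB first."""

    def bit(k: int) -> int:
        if k < 0:
            return 0                                    # the 0 padded on the right
        if k >= n:
            return (b >> (n - 1)) & 1 if signed else 0  # sign extension or zero padding
        return (b >> k) & 1

    width = n if signed else n + 1   # unsigned needs at least one leading 0
    width += width % 2               # round up to complete 2-bit groups
    # overlapping 3-bit groups (Y(i+1), Y(i), Y(i-1)) for i = 0, 2, 4, ...
    return [BOOTH4_TABLE[(bit(i + 1), bit(i), bit(i - 1))] for i in range(0, width, 2)]


def booth_radix4_multiply(a: int, b: int, n: int, signed: bool) -> int:
    """Sum of the partial products d_j * X, each shifted left by 2j places."""
    return sum(d * (a << (2 * j)) for j, d in enumerate(booth_radix4_digits(b, n, signed)))


print(len(booth_radix4_digits(0b11111111, 8, signed=False)))  # 5 partial products
print(len(booth_radix4_digits(0b11111111, 8, signed=True)))   # 4 partial products
print(booth_radix4_multiply(3, 0b11111111, 8, signed=False))  # 765 (3 * 255)
print(booth_radix4_multiply(3, 0b11111111, 8, signed=True))   # -3 (3 * -1)
```

For 11111111 read as unsigned, the digits are -1, 0, 0, 0, +1 (255 = 256 - 1); the fifth digit exists only because of the zero padding on the left.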
State-of-the-art multipliers, such as those employed in DSP (Digital Signal Processing) architectures, should also be capable of accumulating products and of operating on both unsigned integers and two's complement binary words. More advanced circuits may also require the addition of both input operands.
Thus, an object of the present invention is to provide a multiplier architecture that possesses all of the above capabilities and matches, if not exceeds, the speed of similar circuits designed around combined Booth algorithm/Wallace tree reduction schemes.
Another object of the present invention is to provide a multiplier architecture which is capable of 4×4, 8×8, 16×16 and other capacities while maintaining the desired speed characteristics.
A still further object of the present invention is to provide a multiplier architecture whose chip size increases almost linearly with operand width, as compared to the quadratic growth in size of Booth architectures.
These and other objects of the invention are attained by a multiplier architecture that reduces the number of partial product additions by performing an unsigned binary multiplication whenever the number of 1's in the multiplier is less than or equal to half the multiplier's binary width and negative or two's complement multiplication of the operands whenever the number of 1's in the multiplier exceeds half of its binary width.
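One way to see why complementing helps is to rewrite an unsigned M-bit multiplier B = b_{M-1} ... b_0 through its one's complement. This is an illustrative identity introduced here as an assumption, not the exact arithmetic of the claimed circuit:

```latex
% illustrative identity (assumption), not the circuit's exact equations
\bar{B} = 2^{M} - 1 - B = \sum_{i:\, b_i = 0} 2^{i}
\quad\Longrightarrow\quad
A \cdot B = A \cdot 2^{M} - A - \sum_{i:\, b_i = 0} A \cdot 2^{i} .
```

The number of shifted multiplicand terms then equals the number of zeros in B, which is fewer than M/2 whenever more than half of the bits of B are ones, and the subtractions become additions of the two's complement of A. For example, with A = 1010 and B = 1110, A·B = 10·16 - 10 - 10·1 = 140, using one shifted term instead of three.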
The architecture includes circuitry for determining when to take a two's complement, first and second complementers, a shifter, and an adder. The determining circuit determines and provides a two's complement signal when the multiplier has 1's in more than half of its bits. The first complementer provides a multiplicand vector as a two's complement of the multiplicand in response to the two's complement signal or the multiplicand in the absence of the two's complement signal. The second complementer provides shift control signals as a function of the two's complement of the multiplier in response to the two's complement control signal or as a function of the multiplier in the absence of the two's complement control signal. A shifter circuit provides a plurality of shifted multiplicand vectors as a function of the shift control signals from the second complementer. The adder adds the multiplicand, the multiplier, and the plurality of shifted multiplicand vectors in response to the two's complement signal, or adds only the plurality of shifted multiplicand vectors in the absence of the two's complement signal to produce a product.
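The data flow just described can be modeled behaviorally as follows (Python). This is a sketch under stated assumptions, not the patented circuit: the block names are hypothetical, integers stand in for bit vectors, all arithmetic is modulo 2^(K+M), the second complementer is modeled as producing shift controls from the zero bits of B (its one's complement; the +1 that turns this into the two's complement 2^M - B appears as one extra -A term), and the correction terms A·2^M - A added in the complement case follow from A·B = A·2^M - A·(2^M - B) rather than from the patent's own adder:

```python
def determine_complement(b: int, m: int) -> bool:
    """Two's complement signal: True when more than half of the M multiplier bits are 1."""
    ones = bin(b & ((1 << m) - 1)).count("1")
    return 2 * ones > m


def first_complementer(a: int, signal: bool, width: int) -> int:
    """Multiplicand vector: A, or its two's complement (mod 2**width) when signaled."""
    return (-a) % (1 << width) if signal else a


def second_complementer(b: int, signal: bool, m: int) -> list[int]:
    """Shift controls: positions of the ones of B, or of the ones of ~B when signaled."""
    source = ~b & ((1 << m) - 1) if signal else b
    return [i for i in range(m) if (source >> i) & 1]


def shifter(vector: int, shifts: list[int], width: int) -> list[int]:
    """One shifted multiplicand vector per shift control, truncated to width bits."""
    mask = (1 << width) - 1
    return [(vector << s) & mask for s in shifts]


def adder(a: int, vectors: list[int], signal: bool, m: int, width: int) -> int:
    """Sum the shifted vectors; in the complement case add the assumed A*2**M - A terms."""
    total = sum(vectors)
    if signal:
        total += (a << m) + (-a) % (1 << width)   # hypothetical correction terms
    return total % (1 << width)


def multiply(a: int, b: int, k: int, m: int) -> int:
    """Unsigned K-bit by M-bit product assembled from the blocks above."""
    width = k + m
    signal = determine_complement(b, m)
    vector = first_complementer(a, signal, width)
    shifts = second_complementer(b, signal, m)
    return adder(a, shifter(vector, shifts, width), signal, m, width)


print(multiply(0b1010, 0b1110, 4, 4))  # 140
```

For A = 1010 and B = 1110 the signal is asserted, the second complementer yields a single shift control (position 0), and only one shifted vector reaches the adder. In this model no more than M/2 shifted vectors are ever produced.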
The shifter circuit includes a merging circuit for merging bits of the operands with the shifted multiplicand vectors in response to the complement signal. Logic is provided for selecting, as a function of the multiplier, the shifted multiplicand vector and the bit position into which each bit of the operand is merged. Some of the operand bits are assigned to particular shifted multiplicand vectors, and the remaining bits are assigned by the logic as a function of the multiplier. The adder circuit may include an operand adder that adds the operands in parallel with the complementers; a portion of the resulting operand sum is merged with the multiplication vectors, and the remaining bits are provided to a final adder, which also receives the shifted multiplicand vectors. Alternatively, and preferably for larger operands, one of the operands is merged in the shifters while the other operand is a direct input to the adder circuit.
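The idea of merging operand bits into the shifted multiplicand vectors can be illustrated with a small sketch (Python). The greedy rule used here, which places each addend bit into the first vector that has a free (zero) position of the same weight, is a hypothetical stand-in for the patent's multiplier-dependent selection logic; it relies only on the fact that setting a bit that is currently 0 is the same as adding its weight, so the addend need not occupy a row of its own:

```python
def merge_addend(vectors: list[int], c: int) -> list[int]:
    """Merge each set bit j of a non-negative addend c into a vector whose bit j is 0.

    Hypothetical greedy assignment; sum(result) == sum(vectors) + c.
    """
    merged = list(vectors)
    leftover = 0
    j = 0
    while c >> j:
        if (c >> j) & 1:
            for idx, v in enumerate(merged):
                if not (v >> j) & 1:           # free position of weight 2**j
                    merged[idx] = v | (1 << j)  # OR into a 0 bit == adding 2**j
                    break
            else:
                leftover |= 1 << j              # no free slot: keep bit in an extra row
        j += 1
    return merged + ([leftover] if leftover else [])


vectors = [0b1010 << 1, 0b1010 << 2]   # shifted vectors for A = 1010, B = 0110
merged = merge_addend(vectors, 0b0111)
print([format(v, "b") for v in merged], sum(merged))  # ['10111', '101100'] 67
```

In this example all three bits of the addend 0111 fit into the empty low-order positions of the two shifted vectors, so A·B + C = 60 + 7 is produced without an additional adder row.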
Carry lookahead at the product or final adder may be performed by logic at the final adder as a function of the plurality of shifted multiplicand vectors or as a function of the multiplicand vectors and shift control signals prepared at the second complementer. Pipelining may also be provided.
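For reference, a generic carry-lookahead computation can be sketched as follows (Python; the function name is hypothetical). Each carry is formed directly from generate (x AND y) and propagate (x XOR y) terms and the carry-in, rather than rippling from the previous stage. The variant described above, which derives the lookahead from the shifted multiplicand vectors or the shift control signals, is not reproduced here:

```python
def cla_add(x: int, y: int, width: int, carry_in: int = 0) -> tuple[int, int]:
    """Return (sum mod 2**width, carry_out) using flattened carry-lookahead equations."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]     # generate terms
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]   # propagate terms
    carries = [carry_in]
    for i in range(width):
        # c(i+1) = g(i) | p(i)g(i-1) | p(i)p(i-1)g(i-2) | ... | p(i)...p(0)c(0)
        c = g[i]
        run = p[i]
        for j in range(i - 1, -1, -1):
            c |= run & g[j]
            run &= p[j]
        c |= run & carry_in
        carries.append(c)
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]


print(cla_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 1 0001
```

In hardware every c(i+1) is a two-level AND-OR function of g, p and c(0), which is what removes the ripple delay.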
Although the structure is built around a 4×4 architecture, it may be expanded into a 4/K architecture, wherein a K-bit multiplicand A is multiplied in parallel by 4-bit slices of the multiplier and the resulting partial vectors are summed in carry-save arrays.
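The 4/K expansion can be sketched behaviorally (Python; the function name and the linear carry-save chain are illustrative assumptions). The multiplier is cut into 4-bit slices, an ordinary product of each slice with the K-bit multiplicand stands in for one 4-bit multiplier block, each partial vector is shifted to its slice position, and a chain of 3:2 carry-save stages reduces the vectors to two rows before one final addition:

```python
def sliced_multiply(a: int, b: int, m: int, slice_bits: int = 4) -> int:
    """K-bit multiplicand A times M-bit multiplier B using slice_bits-wide slices of B."""
    mask = (1 << slice_bits) - 1
    rows = []
    for j in range(0, m, slice_bits):
        piece = (b >> j) & mask            # one 4-bit slice of the multiplier
        rows.append((a * piece) << j)      # slice product, placed at the slice weight
    # carry-save array: each 3:2 stage replaces three rows with a sum row and a carry row
    while len(rows) > 2:
        x, y, z, *rest = rows
        s = x ^ y ^ z
        c = ((x & y) | (x & z) | (y & z)) << 1
        rows = [s, c] + rest
    return sum(rows)                       # final carry-propagate addition


print(sliced_multiply(0xBEEF, 0x1234, 16) == 0xBEEF * 0x1234)  # True
```

A 16-bit multiplier yields four partial vectors, which two carry-save stages reduce to the two rows handed to the final adder.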
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.