The present invention relates generally to binary multipliers and more specifically to an improved speed binary multiplier capable of multiplying signed and unsigned operands.
All modern fast binary multipliers utilize some variations of the basic partial-product generation technique first applied by Seymour Cray and commonly referred to as "combinatorial", "paper and pencil", or "flow-through". In its most common form the technique simply involves consecutive multiplications of a K-digit long operand A (multiplicand) by a single digit B(i) of the M-digit long operand B (multiplier) and then shifting the resultant partial product P(i) to the left by the number of places equal to the position of the digit B(i) in the multiplier. In this particular case it is assumed that the number of places the partial product is to be shifted is directly equal to i. The shifting operation is, in fact, equivalent to the multiplication of the multiplicand by the weight of the decimal (or binary) digit B(i).
After all M partial products have been generated, they are consecutively summed to yield the final (M+K)-digit-long product of A and B. This technique, used for decimal number multiplication, is also directly applicable to binary multiplication of two numbers A and B, their respective binary widths being K and M. An example of the multiplication of two such 4-bit operands, A=0111=7 and B=0011=3, is given in Table 1.
TABLE 1
______________________________________
"Paper-and-pencil" multiplication of two 4-bit operands.
______________________________________
MULTIPLICAND A:          0 1 1 1   = 7
MULTIPLIER B:            0 0 1 1   = 3
                         0 1 1 1   partial product 1
                       0 1 1 1     partial product 2
                     0 0 0 0       partial product 3
                   0 0 0 0         partial product 4
                 0 0 0 1 0 1 0 1   final product
______________________________________
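The procedure of Table 1 can be sketched in a few lines of Python. This is only an illustration of the shift-and-add principle described above, not a model of any particular circuit; the function name `shift_add_multiply` and its parameters are hypothetical.

```python
def shift_add_multiply(a, b, m=4):
    """Multiply unsigned a by the m-bit unsigned multiplier b, one multiplier bit at a time."""
    partials = []
    for i in range(m):
        digit = (b >> i) & 1                 # multiplier digit B(i)
        partials.append((a * digit) << i)    # partial product P(i), shifted left by i places
    return partials, sum(partials)           # the M partial products and their sum


# Operands of Table 1: A = 0111 = 7, B = 0011 = 3
partials, product = shift_add_multiply(0b0111, 0b0011)
print([format(p, "08b") for p in partials], format(product, "08b"))
```

Each entry of `partials` corresponds to one row of Table 1, already shifted into its final column position.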
Multiplying signed numbers is more difficult. In two's-complement notation, a number whose most significant bit is 0 is positive, whereas a number whose most significant bit is 1 is negative. One way to multiply signed operands is to convert the negative operands to their positive binary representation, multiply the positive or unsigned versions and attach the appropriate sign. If both operands have the same sign, the unsigned product is the final product, since it is positive. If exactly one of the operands is negative, the two's complement of the unsigned product must be taken.
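As a minimal sketch of this convert, multiply and re-sign approach, the following Python operates directly on n-bit two's-complement bit patterns. The helper names `negate` and `multiply_via_magnitude` are assumptions made for illustration, and the product is returned as a 2n-bit pattern.

```python
def negate(bits, width):
    """Two's complement of a bit pattern within the given width."""
    return (-bits) & ((1 << width) - 1)


def multiply_via_magnitude(a_bits, b_bits, n=4):
    """Multiply two n-bit two's-complement patterns by way of their magnitudes."""
    sign_a = a_bits >> (n - 1)               # 1 if the first operand is negative
    sign_b = b_bits >> (n - 1)               # 1 if the second operand is negative
    mag_a = negate(a_bits, n) if sign_a else a_bits
    mag_b = negate(b_bits, n) if sign_b else b_bits
    product = mag_a * mag_b                  # plain unsigned multiplication
    if sign_a != sign_b:                     # signs differ: the result is negative
        product = negate(product, 2 * n)
    return product                           # 2n-bit two's-complement pattern


print(format(multiply_via_magnitude(0b1001, 0b0011), "08b"))  # -7 times 3
```

The two conversions before the multiplication and the possible conversion after it are the overhead that the alternative schemes try to avoid.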
An alternative to the conversion to an unsigned magnitude and reconversion of the final product is illustrated in Table 2.
TABLE 2
______________________________________
Multiplication of Two's Complement Operands with Sign Extension.
______________________________________
                        1 0 0 1   (Multiplicand) = -7
                      × 1 1 0 1   (Multiplier)   = -3

Partial Product #1      1 0 0 1   Extended Sign    = 1111001
Partial Product #2      0 0 0 0   Extended Sign    = 000000
Partial Product #3      1 0 0 1   Extended Sign    = 11001
Partial Product #4      1 0 0 1   Two's Complement = 0111
Final Product                                      = 0010101
______________________________________
The first three partial products are formed with sign extension. The fourth partial product, which corresponds to the sign bit of the multiplier, is converted to its two's complement before being added to the other partial products. This corrects for the negative weight of the sign bit in combination with the sign extension.
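The sign-extension scheme can be sketched as follows. This is an illustrative Python model only: the names `to_signed` and `multiply_sign_extended` are hypothetical, and the sketch accumulates into a 2n-bit result rather than the 7-bit field of Table 2, an assumption made so that every pair of n-bit operands fits.

```python
def to_signed(bits, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def multiply_sign_extended(a_bits, b_bits, n=4):
    """Multiply two n-bit two's-complement patterns using sign-extended partial products."""
    width = 2 * n
    mask = (1 << width) - 1
    a_ext = to_signed(a_bits, n) & mask      # multiplicand sign-extended to the full width
    total = 0
    for i in range(n):
        if (b_bits >> i) & 1:
            term = a_ext << i                # sign-extended partial product
            if i == n - 1:
                term = -term                 # sign-bit row: add the two's complement instead
            total += term
    return to_signed(total & mask, width)


print(multiply_sign_extended(0b1001, 0b1101))  # operands of Table 2: -7 and -3
```

Only the row selected by the multiplier's sign bit is negated; all other rows are added as sign-extended copies of the multiplicand.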
As is apparent from Tables 1 and 2, apart from some input and output reformatting of the operands and final product, the bulk of the multiplication time, even in its simplest form, is consumed by the M-1 additions required to generate the sum of partial products. In fact, all the algorithmic speed improvements brought into the design of parallel multipliers have involved reducing the number of additions necessary to generate the final product, as well as accelerating the necessary additions (application of "carry-save" adders). The most common techniques used today employ algorithmic refinements of the basic concept described above; they are known as "Wallace Tree Partial Product Reduction" and "Modified Booth Algorithm".
Application of these two techniques combined leads to the potential reduction of the necessary number of partial product additions to one half the number of bits in the multiplier. Consequently, the amount of time necessary for the partial products to flow through the adder array is also cut in half. However, this is accomplished at the expense of using a relatively complex Booth decoder.
The Booth algorithm, compared to the present invention, not only introduces extra delays caused by a more complex Booth decoder, but also increases circuit size because the sign extension must be propagated through the CSA (Carry Save Adder) array. This also leads to poorer time performance. For example, in Table 2, partial products 1, 2 and 3 include three, two and one sign-extension bits, respectively.
Thus, using the example of Table 2, the size of the Booth multiplication array increases generally quadratically with the number of partial products that must be formed, whereas the combinatorial multiplication array of Table 1 grows linearly with the number of bits.
The original Booth algorithm and the modified Booth algorithm involve searching for and determining strings of zeros or ones in the multiplier and performing addition and subtraction for the different partial products depending upon a determination of the beginning, end or middle of the string.
In combinatorial multiplication, a relative 1-digit shift always occurs between the multiplicand and the partial sum, regardless of whether an addition has occurred or not. Booth's algorithm permits more than one shift at a time, depending on the grouping of ones and zeros in the multiplier. The multiplier is examined bit by bit, starting with the LSB, and the partial product is shifted relative to the multiplicand as each bit is examined. The multiplicand is subtracted from the partial product when the first one in a string of ones is found. Similarly, upon finding the first zero in a string of zeros, the multiplicand is added to the partial product. No operation is performed when the bit examined is identical to the previous multiplier bit.
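The rules just described can be sketched in Python as below. This is a minimal software illustration of the original Booth recoding, assuming an n-bit two's-complement multiplier and an implicit 0 to the right of the LSB; the function name `booth_multiply` is hypothetical.

```python
def booth_multiply(a, b_bits, n=4):
    """Booth multiplication: a is a signed integer, b_bits an n-bit two's-complement pattern."""
    product = 0
    prev = 0                                 # implicit bit to the right of the LSB
    for i in range(n):
        cur = (b_bits >> i) & 1
        if cur == 1 and prev == 0:           # first one in a string of ones: subtract
            product -= a << i
        elif cur == 0 and prev == 1:         # first zero after a string of ones: add
            product += a << i
        # cur == prev: no operation, only the relative shift advances
        prev = cur
    return product


print(booth_multiply(-7, 0b1101))            # -7 times -3
```

For the multiplier 1101 the loop performs a subtraction at bit 0, an addition at bit 1, a subtraction at bit 2 and nothing at bit 3, since bit 3 repeats bit 2.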
A modified version of Booth's algorithm is more commonly used. The difference between Booth's algorithm and the modified Booth's algorithm is as follows: the modified Booth always generates n/2 independent partial products, whereas the original Booth generates a varying number (at most n/2) of dependent partial products, depending on the bit pattern of the multiplier. Of course, parallel hardware implementation lends itself only to a fixed number of independent partial products. The modified multiplier encoding scheme encodes 2-bit groups and produces five partial products from an 8-bit multiplier, the fifth partial product being a consequence of the fact that the algorithm natively handles only two's complement numbers and needs an extra term for unsigned multipliers (only four partial products are generated if only two's complement representation is used).
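The partial-product counts quoted above can be reproduced with the standard radix-4 form of the modified Booth recoding, sketched here in Python. The text does not spell out the encoder, so this is an assumed textbook formulation rather than the specific circuit referred to; the name `radix4_booth_digits` is hypothetical.

```python
def radix4_booth_digits(b_bits, n=8, signed=True):
    """Recode an n-bit multiplier (n even) into radix-4 digits in {-2, -1, 0, 1, 2}, LSB first."""
    def bit(j):
        if j < 0:
            return 0                         # implicit 0 to the right of the LSB
        if j >= n:                           # beyond the MSB: sign- or zero-extend
            return (b_bits >> (n - 1)) & 1 if signed else 0
        return (b_bits >> j) & 1

    groups = n // 2 if signed else n // 2 + 1    # unsigned operands need one extra digit
    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(groups)]


print(radix4_booth_digits(0xFF, signed=True))    # four digits, value -1
print(radix4_booth_digits(0xFF, signed=False))   # five digits, value 255
```

Each digit selects 0, ±1 or ±2 times the multiplicand, so an 8-bit two's-complement multiplier yields four partial products and an 8-bit unsigned multiplier yields five.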
State-of-the-art multipliers, such as those employed in DSP (Digital Signal Processing) architectures, should also be capable of accumulating the products, as well as operating on both unsigned integers and two's complement binary words. Also, in the case of more advanced circuits, the addition of both input operands may be required.
Thus, an object of the present invention is to provide a multiplier architecture that possesses all of the above capabilities and matches, if not exceeds, the speed performance of similar circuits designed around combined Booth algorithm/Wallace tree reduction schemes.
Another object of the present invention is to provide a multiplier architecture which is capable of 4×4, 8×8, 16×16 and other capacities while maintaining the desired speed characteristics.
A still further object of the present invention is to provide a multiplier architecture whose CSA array size increases almost linearly with the increase of operand widths as compared to quadratic growth in size of Booth architectures.
An even further object of the present invention is to provide the capability of multiplying signed and unsigned numbers without increasing the amount of time over that of multiplying only unsigned numbers.
These and other objects of the invention are attained by a multiplier architecture that reduces the number of partial product additions by performing an unsigned binary multiplication whenever the number of 1's in the multiplier is less than or equal to half the multiplier's binary width and negative or two's complement multiplication of the operands whenever the number of 1's in the multiplier exceeds half of its binary width.
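The arithmetic behind this selection can be sketched in Python. The sketch relies on the identity (2^K - A)(2^M - B) = 2^(K+M) - 2^K·B - 2^M·A + A·B, so that multiplying the complemented operands and adding back the original operands, shifted into the upper half, recovers A·B modulo 2^(K+M). The function name `multiply_few_ones`, the alignment of the added operands, and keeping the complemented multiplicand as the full value 2^K - A are assumptions of this illustration, not details taken from the text.

```python
def multiply_few_ones(a, b, k=4, m=4):
    """Unsigned a (k bits) times unsigned b (m bits), using whichever form has fewer rows."""
    if 2 * bin(b).count("1") <= m:           # ones <= half the width: multiply directly
        shifts = [i for i in range(m) if (b >> i) & 1]
        return sum(a << i for i in shifts), len(shifts)
    a_neg = (1 << k) - a                     # two's complement of the multiplicand (assumed unmasked)
    b_neg = (1 << m) - b                     # two's complement of the multiplier
    shifts = [i for i in range(m) if (b_neg >> i) & 1]
    total = sum(a_neg << i for i in shifts)  # shifted (complemented) multiplicand vectors
    total += (b << k) + (a << m)             # add back multiplier and multiplicand (assumed alignment)
    return total % (1 << (k + m)), len(shifts)


product, rows = multiply_few_ones(7, 0b1110)
print(product, rows)
```

For 4-bit multipliers this sketch never sums more than two shifted multiplicand vectors, compared with up to four in the direct method.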
The ability to handle signed and unsigned numbers without increasing the time it takes to perform the multiplication results from determining the need for, and calculating, a correction factor in parallel with the multiplication scheme. Multiplication is performed on the two operands as if they were unsigned numbers, without any conversion. If the first operand is negative, a first correction, which is the two's complement of the second operand, is added to the product. If the second operand is negative, a second correction, which is the two's complement of the first operand, is added to the product. If both operands are negative, the two's complements of both operands are added to the product.
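A minimal Python sketch of this correction follows. The text does not state where the corrections are aligned; the sketch assumes they are added into the upper n bits of the 2n-bit product, which is what the arithmetic requires for the result to be correct. The names `to_signed` and `multiply_with_correction` are hypothetical.

```python
def to_signed(bits, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def multiply_with_correction(a_bits, b_bits, n=4):
    """Multiply n-bit two's-complement patterns as unsigned numbers, then correct for sign."""
    low_mask = (1 << n) - 1
    product = a_bits * b_bits                        # operands treated as unsigned, no conversion
    if a_bits >> (n - 1):                            # first operand negative
        product += ((-b_bits) & low_mask) << n       # two's complement of the second operand
    if b_bits >> (n - 1):                            # second operand negative
        product += ((-a_bits) & low_mask) << n       # two's complement of the first operand
    return to_signed(product, 2 * n)                 # keep 2n bits, read as signed


print(multiply_with_correction(0b1001, 0b1101))      # -7 times -3
```

Because the corrections depend only on the operands and not on the unsigned product, they can be computed while the product is being formed, which is the point made above.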
The architecture includes circuitry for determining when to take a two's complement, first and second complementers, a shifter, a signed operand corrector and an adder. The determining circuit determines and provides a two's complement signal when the multiplier has 1's in more than half of its bits. The first complementer provides a multiplicand vector as a two's complement of the multiplicand in response to the two's complement signal or the multiplicand in the absence of the two's complement signal. The second complementer provides shift control signals as a function of the two's complement of the multiplier in response to the two's complement control signal or as a function of the multiplier in the absence of the two's complement control signal. A shifter circuit provides a plurality of shifted multiplicand vectors as a function of the shift control signals from the second complementer. The signed operand corrector circuit is in parallel with the shifter circuit and provides the two's complement of one of the operands if the other operand is negative. The adder adds the multiplicand, the multiplier, the corrections, and the plurality of shifted multiplicand vectors in response to the two's complement signal, or adds only the corrections and the plurality of shifted multiplicand vectors in the absence of the two's complement signal to produce a product. The corrections are generated only if one of the operands is signed or negative.
The shifter circuit includes a merging circuit for merging bits of the operands with the shifted multiplicand vectors in response to the complement signal. Logic is provided for selecting, as a function of the multiplier, the bit of the shifted multiplicand vector with which each bit of the operand is merged. Portions of the bits of the operand are assigned to particular shifted multiplicand vectors and the remaining bits of the operand are assigned by the logic as a function of the multiplier. One of the operands is merged in the shifters with the other operand being a direct input to the adder circuit.
A 4/K architecture is used wherein a K-bit-long multiplicand A is multiplied in parallel by 4-bit slices of the multiplier and the resulting partial vectors are summed in carry-save arrays.
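The slicing idea can be illustrated with a short Python sketch that multiplies the multiplicand by each 4-bit slice of the multiplier and reduces the resulting vectors with a generic 3-to-2 carry-save step. This models only the arithmetic, not the parallel hardware or the actual array layout; the names `carry_save_add` and `multiply_by_slices` are hypothetical.

```python
def carry_save_add(x, y, z):
    """3-to-2 carry-save step: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                                    # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1           # carries, moved one place to the left
    return s, c


def multiply_by_slices(a, b, m=16):
    """Multiply unsigned a by the m-bit unsigned b using 4-bit slices of the multiplier."""
    # One partial vector per 4-bit slice, shifted to the slice's position.
    vectors = [(a * ((b >> j) & 0xF)) << j for j in range(0, m, 4)]
    while len(vectors) > 2:                          # reduce three vectors to two at a time
        s, c = carry_save_add(*vectors[:3])
        vectors = [s, c] + vectors[3:]
    return sum(vectors)                              # final carry-propagate addition


print(hex(multiply_by_slices(0xBEEF, 0x1234)))
```

The carry-save steps defer carry propagation to the single final addition, which is why such arrays are used to sum many partial vectors quickly.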
Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.