The present invention pertains to digital multipliers, and, more particularly, to parallel digital multipliers for multiplying signed numbers.
The modified Booth algorithm (as described, for example, in A. D. Booth, "A Signed Binary Multiplication Technique," Quart. J. Mech. Appl. Math., vol. 4, pt. 2, pp. 236-240, 1951; and in O. L. MacSorley, "High-Speed Arithmetic in Binary Computers," IRE Proc., vol. 49, pp. 67-91, January 1961) is widely used to implement multiplication in DSP systems and other applications. Although this type of multiplier is not the fastest multiplier design, it halves the number of product terms to be added when compared to an array multiplier, and it also allows a regular layout.
Modified Booth Algorithm
The modified Booth algorithm works essentially as follows: given two numbers A and B, the algorithm examines the multiplier data A three bits at a time to determine whether to add zero, B, −B, 2B, or −2B, based on the value of all three bits. Table I shows the operation to be performed for each value of the three bits being examined. Ri is the accumulated result up to the current iteration.
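The selection rule described above can be sketched in Python as follows. This is a minimal illustration, not the circuit of the invention: the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the digit formula is the standard radix-4 Booth recoding (assumed here to correspond to Table I), and each selected multiple is weighted by a power of 4 instead of shifting the accumulated result Ri right by two bits per iteration.

```python
def booth_radix4_digits(a: int, n_bits: int) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Assumes a fits in n_bits bits as a signed value. Returns the digits
    least significant first; each digit is in {-2, -1, 0, 1, 2}.
    """
    digits = []
    prev = 0  # overlap bit to the right of bit 0 is taken as 0
    for i in range(0, n_bits, 2):
        # Python's >> on negative ints is arithmetic, so bits past the
        # sign bit read as sign extension.
        b0 = (a >> i) & 1
        b1 = (a >> (i + 1)) & 1
        # Three bits (b1, b0, prev) select 0, +/-B or +/-2B.
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n_bits: int) -> int:
    """Multiply by summing the selected multiples of b (software model)."""
    product = 0
    for k, d in enumerate(booth_radix4_digits(a, n_bits)):
        product += (d * b) * 4 ** k  # d * b is one of 0, B, -B, 2B, -2B
    return product
```

For example, `booth_radix4_digits(7, 4)` yields the digits −1 and 2 (least significant first), since 7 = −1 + 2·4, so a 4-bit multiplier produces two partial products instead of four.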
Row 1 and Row 8 of Table I will be called NOOP (NO OPERATION), since from the algorithm's perspective no addition is performed, only a division by 4 (i.e., a shift). For the radix-4 modified Booth algorithm (i.e., analyzing 3 bits at a time with 1 bit of overlap), the number of rows is reduced by half in comparison with an array multiplier. A carry-save array is used to add the partial products, and a fast adder is used to add the final two words (i.e., carry and sum), producing the final product.
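The carry-save step can be illustrated with a short Python model. This is a sketch under stated assumptions: the names `carry_save_add` and `sum_partial_products` are hypothetical, Python's unbounded integers stand in for fixed-width words, and the final `+` stands in for the fast adder that combines the carry and sum words.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three words to a sum word and a carry word (3:2 compression).

    No carry propagates between bit positions here: each bit position
    behaves like an independent full adder.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def sum_partial_products(terms: list[int]) -> int:
    """Accumulate partial products in carry-save form, then add once."""
    sum_word, carry_word = 0, 0
    for term in terms:
        sum_word, carry_word = carry_save_add(sum_word, carry_word, term)
    # Stand-in for the fast adder that adds the final two words.
    return sum_word + carry_word
```

Only the last addition needs carry propagation; every earlier row keeps its result as a separate sum word and carry word.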
From Table I it can be observed that the implementation of the modified Booth algorithm requires a 5:1 mux in order to add B, −B, 2B, −2B, or zero to the partial product.
The number of rows in the multiplier can be reduced significantly further if a higher radix is used for the multiplier data (see, for example, H. Sam and A. Gupta, "A Generalized Multibit Recoding of Two's Complement Binary Numbers and Its Proof with Application in Multiplier Implementations," IEEE Transactions on Computers, vol. 39, pp. 1006-1015, 1990). The problem associated with this approach is that the term 3B must be generated, which is difficult (i.e., time consuming) because, unlike 2B, it cannot be obtained by a simple shift. G. Bewick and M. J. Flynn ("Binary Multiplication Using Partially Redundant Multiples," Stanford University Technical Report, no. CSL-TR-92-528, 1992) propose the use of small adders to generate this term in a partially redundant form. Still, this approach adds overhead to the multiplier and breaks its regular structure.
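To see where the 3B term comes from, the same recoding idea can be applied four bits at a time (radix 8). The sketch below is illustrative only and is not taken from the cited references: the function name `booth_radix8_digits` is hypothetical, and the digit formula is the usual extension of Booth recoding to three new bits plus one overlap bit.

```python
def booth_radix8_digits(a: int, n_bits: int) -> list[int]:
    """Recode a two's-complement multiplier into radix-8 Booth digits.

    Assumes a fits in n_bits bits as a signed value. Returns the digits
    least significant first; each digit is in the range -4..4.
    """
    digits = []
    prev = 0  # overlap bit to the right of bit 0 is taken as 0
    for i in range(0, n_bits, 3):
        b0 = (a >> i) & 1
        b1 = (a >> (i + 1)) & 1
        b2 = (a >> (i + 2)) & 1  # bits past the sign bit read as sign extension
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits
```

The digits +3 and −3 can occur (for instance, a 3-bit multiplier equal to 3 recodes to the single digit 3), and 3B = B + 2B requires a carry-propagating addition, whereas 2B and 4B are plain shifts.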
A. Y. Kwentus, H. Hung, and A. N. Willson, Jr. ("An Architecture for High Performance/Small Area Multipliers for Use in Digital Filtering Applications," IEEE Journal of Solid-State Circuits, vol. 29, pp. 117-121, 1994) present the architecture of a multiplier in which the terms 0, B, 2B, and 3B are used. The main advantage of this multiplier is the reduction of the multiplexer from 5:1 (modified Booth) to 4:1. The main disadvantage is that the 3B term needs to be pre-computed and stored in memory or generated with a fast adder.
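A 4:1 selection among 0, B, 2B, and 3B can be modeled as below. This is a simplified sketch and not the cited architecture: the function name `multiply_with_3b` is hypothetical, and the multiplier A is assumed to be an unsigned n-bit value so that each 2-bit group directly indexes the four multiples.

```python
def multiply_with_3b(a: int, b: int, n_bits: int) -> int:
    """Multiply using 2-bit multiplier groups that select 0, B, 2B or 3B.

    Assumes 0 <= a < 2**n_bits (unsigned multiplier); b may be signed.
    """
    # 3B is computed once up front; this is the term that needs a real adder.
    multiples = (0, b, 2 * b, b + 2 * b)
    product = 0
    for k in range(0, n_bits, 2):
        group = (a >> k) & 0b11            # two multiplier bits, no overlap bit
        product += multiples[group] << k   # 4:1 mux output, weighted by 4**(k/2)
    return product
```

Because no negative multiples are selected, the mux has four inputs instead of five, at the cost of producing 3B before the array can start.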
In each of these arrangements, the multiplicand B is replicated once for each group of two or three multiplier bits (data A) and scaled by the appropriate factor shown above. These scaled multiplicands must then be added together. In the present art, this adding operation is done sequentially, with each adder having two data inputs. As a result of this ripple addition, invalid intermediate states are produced in the adders, which wastes power.
In accordance with the invention, a digital multiplier for multiplying multiplier data by multiplicand data to provide a product utilizes a plurality of sequentially powered adders to reduce the number of invalid intermediate states in the adders and thereby save power.