The modified Booth algorithm (as described, for example, in A. D. Booth, "A Signed Binary Multiplication Technique," Quart. J. Mech. Appl. Math., vol. 4, pt. 2, pp. 236-240, 1951; and in O. L. MacSorley, "High-Speed Arithmetic in Binary Computers," Proc. IRE, vol. 49, pp. 67-91, Jan. 1961) is widely used to implement multiplication in DSP systems and other applications. Although this type of multiplier is not the fastest design, it halves the number of partial products to be added compared with an array multiplier, and it also allows a regular layout.
Modified Booth Algorithm
The modified Booth algorithm works essentially as follows. Given two numbers A and B, the algorithm scans the multiplier A three bits at a time, with consecutive groups overlapping by one bit and A_{-1} taken as 0. Based on all three bits of a group, it decides whether to add zero, B, -B, 2B, or -2B. Table 1 shows the operation performed for each combination of the three bits being analyzed, where R_i is the accumulated result up to the current iteration.
TABLE 1
Modified Booth Algorithm

A_{2i+1}  A_{2i}  A_{2i-1}  Operation
0         0       0         R_i = R_{i-1}/4
0         0       1         R_i = (R_{i-1} + B)/4
0         1       0         R_i = (R_{i-1} + B)/4
0         1       1         R_i = (R_{i-1} + 2B)/4
1         0       0         R_i = (R_{i-1} - 2B)/4
1         0       1         R_i = (R_{i-1} - B)/4
1         1       0         R_i = (R_{i-1} - B)/4
1         1       1         R_i = R_{i-1}/4
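The recoding in Table 1 can be sketched in Python as a lookup from each 3-bit group to a multiple of B. This is a minimal illustration under stated assumptions, not a hardware description: the multiplier A is assumed to be an n-bit two's complement integer with n even, Python's unbounded integers stand in for the datapath, and the names BOOTH_RADIX4, bit, booth_digits and booth_multiply are hypothetical. Rather than dividing the running result by 4 each iteration as Table 1 does, the sketch weights digit i by 4^i, which gives the same final product.

```python
# Radix-4 (modified) Booth recoding and multiplication, following Table 1.
# Assumptions (illustrative, not from the source): A is an n-bit two's
# complement integer with n even; Python ints model the datapath.

# Table 1 as a lookup: (A[2i+1], A[2i], A[2i-1]) -> multiple of B to add.
BOOTH_RADIX4 = {
    (0, 0, 0): 0,   # NOOP
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 2,
    (1, 0, 0): -2,
    (1, 0, 1): -1,
    (1, 1, 0): -1,
    (1, 1, 1): 0,   # NOOP
}


def bit(x, k):
    """Bit k of x in two's complement; bit -1 is defined as 0."""
    return 0 if k < 0 else (x >> k) & 1


def booth_digits(a, n):
    """Recode the n-bit multiplier a into n/2 digits in {-2, -1, 0, 1, 2}."""
    assert n % 2 == 0 and -(1 << (n - 1)) <= a < (1 << (n - 1))
    return [BOOTH_RADIX4[(bit(a, 2 * i + 1), bit(a, 2 * i), bit(a, 2 * i - 1))]
            for i in range(n // 2)]


def booth_multiply(a, b, n):
    """Add one selected multiple of B per digit, weighted by 4**i.

    Table 1 instead divides the running result by 4 after each step;
    both formulations produce the same product.
    """
    result = 0
    for i, d in enumerate(booth_digits(a, n)):
        result += (d * b) << (2 * i)
    return result


print(booth_digits(-7, 8))        # [1, -2, 0, 0]
print(booth_multiply(-7, 13, 8))  # -91
```

An 8-bit multiplier yields only four digits, one per partial-product row, which is the halving of rows relative to an array multiplier.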
Rows 1 and 8 of Table 1 are called NOOP (no operation) because, from the algorithm's perspective, no addition is performed, only a division by 4 (i.e., a 2-bit arithmetic right shift). For the radix-4 modified Booth algorithm (i.e., analyzing 3 bits at a time with 1 bit of overlap), the number of rows is half that of an array multiplier. A carry-save array adds the partial products, and a fast adder adds the final two words (carry and sum) to produce the final product.
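The carry-save accumulation described above can be sketched as a chain of 3:2 compressors that reduces the Booth partial products to a sum word and a carry word, followed by one carry-propagate addition. This is a sketch under assumptions: the helper names csa, booth_partial_products and carry_save_multiply are hypothetical, and Python's bitwise operators on negative integers (which behave as infinite-width two's complement) stand in for fixed-width hardware with sign extension.

```python
# Carry-save accumulation of radix-4 Booth partial products, then a single
# carry-propagate ("fast") addition. Illustrative sketch: Python ints model
# the datapath, so no explicit sign extension or word width is needed.

def csa(x, y, z):
    """3:2 compressor: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def booth_partial_products(a, b, n):
    """One partial product per Booth digit, already shifted left by 2i bits."""
    # Table 1 keyed by the 3-bit group value A[2i+1]A[2i]A[2i-1].
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    rows = []
    for i in range(n // 2):
        # Shifting a left by one places the implicit A[-1] = 0 at bit 0.
        group = ((a << 1) >> (2 * i)) & 0b111
        rows.append((table[group] * b) << (2 * i))
    return rows


def carry_save_multiply(a, b, n):
    rows = booth_partial_products(a, b, n)
    # Carry-save array: fold each new row into the (sum, carry) pair.
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = [s, c] + rows[3:]
    # Final carry-propagate addition of the two remaining words.
    return sum(rows)


print(carry_save_multiply(-7, 13, 8))  # -91
```

Only the last step propagates carries across the full word; every compressor stage works bit by bit, which is why the array stays fast and regular.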
From Table 1 it can be observed that implementing the modified Booth algorithm requires a 5:1 multiplexer (mux) to select which of B, -B, 2B, -2B, or zero is added to the accumulated partial product.
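The 5:1 selection can be sketched as a mux whose eight possible select codes map onto only five distinct inputs. Assumptions, all illustrative: B is an n-bit two's complement value, each mux output is an (n + 2)-bit word, and the names MUX_SELECT, mux_inputs, booth_mux and to_signed are hypothetical. The extra two bits are needed because -2B reaches 2^n when B is the most negative n-bit value.

```python
# Sketch of the 5:1 partial-product multiplexer implied by Table 1.
# Assumptions (illustrative, not from the source): B is an n-bit two's
# complement value; mux outputs are (n + 2)-bit words, wide enough for
# every input, including -2B when B = -2**(n - 1).

# Which mux input each 3-bit group A[2i+1]A[2i]A[2i-1] selects.
MUX_SELECT = {
    0b000: "0", 0b001: "+B", 0b010: "+B", 0b011: "+2B",
    0b100: "-2B", 0b101: "-B", 0b110: "-B", 0b111: "0",
}


def mux_inputs(b, n):
    """The five candidate words, truncated to n + 2 bits."""
    mask = (1 << (n + 2)) - 1
    return {"0": 0, "+B": b & mask, "-B": (-b) & mask,
            "+2B": (2 * b) & mask, "-2B": (-2 * b) & mask}


def booth_mux(group, b, n):
    """Return the (n + 2)-bit word the mux passes for one Booth group."""
    return mux_inputs(b, n)[MUX_SELECT[group]]


def to_signed(word, width):
    """Interpret a width-bit word as a two's complement integer."""
    return word - (1 << width) if word >> (width - 1) else word


print(len(set(MUX_SELECT.values())))          # 5 distinct inputs for 8 codes
print(to_signed(booth_mux(0b100, -8, 4), 6))  # -2B with B = -8 gives 16
```

With n = 4, the value 16 does not fit in a 5-bit two's complement word, which illustrates why the sketch uses n + 2 bits.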
The number of multiplier rows can be reduced significantly further if a higher radix is used for the multiplier data (see, for example, H. Sam and A. Gupta, "A Generalized Multibit Recoding of Two's Complement Binary Numbers and Its Proof with Application in Multiplier Implementations," IEEE Transactions on Computers, vol. 39, pp. 1006-1015, 1990). The problem with this approach is that the 3B term must be generated. Unlike 2B or 4B, 3B is not a simple shift of B; it requires a carry-propagate addition (3B = 2B + B), which is time-consuming. G. Bewick and M. J. Flynn ("Binary Multiplication Using Partially Redundant Multiples," Stanford University Technical Report, no. CSL-TR-92-528, 1992) propose using small adders to generate this term in a partially redundant form. Still, this approach adds overhead to the multiplier and breaks its regular structure.
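To show where the 3B term comes from, here is a sketch of radix-8 recoding: 3 multiplier bits per row plus 1 bit of overlap, giving digits in {-4, ..., 4} and n/3 rows instead of n/2. This sketch is not the Sam and Gupta or Bewick and Flynn construction; it is a plain radix-8 recoding, assuming A is an n-bit two's complement integer with n a multiple of 3. The names bit, radix8_digits and radix8_multiply are hypothetical.

```python
# Radix-8 Booth-style recoding, sketched to show why a 3B multiple appears.
# Assumptions (illustrative, not from the source): A is an n-bit two's
# complement integer with n a multiple of 3; Python ints model the datapath.

def bit(x, k):
    """Bit k of x in two's complement; bit -1 is defined as 0."""
    return 0 if k < 0 else (x >> k) & 1


def radix8_digits(a, n):
    """Digits d_i = -4*A[3i+2] + 2*A[3i+1] + A[3i] + A[3i-1], in {-4..4}."""
    assert n % 3 == 0
    return [-4 * bit(a, 3 * i + 2) + 2 * bit(a, 3 * i + 1)
            + bit(a, 3 * i) + bit(a, 3 * i - 1)
            for i in range(n // 3)]


def radix8_multiply(a, b, n):
    # 0, B, 2B and 4B are shifts of B; 3B is the "hard" multiple and needs
    # a carry-propagate addition before any row that uses it can be formed.
    three_b = (b << 1) + b
    multiples = {0: 0, 1: b, 2: b << 1, 3: three_b, 4: b << 2}
    result = 0
    for i, d in enumerate(radix8_digits(a, n)):
        m = multiples[abs(d)]
        result += (-m if d < 0 else m) << (3 * i)
    return result


print(radix8_digits(3, 6))        # [3, 0]
print(radix8_multiply(3, 11, 6))  # 33
```

A 6-bit multiplier needs only two rows here, but one of them already requires 3B.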
A. Y. Kwentus, H. Hung, and A. N. Willson, Jr. ("An Architecture for High Performance/Small Area Multipliers for Use in Digital Filtering Applications," IEEE Journal of Solid-State Circuits, vol. 29, pp. 117-121, 1994) present a multiplier architecture that uses the terms 0, B, 2B, and 3B. Its main advantage is that the multiplexer is reduced from 5:1 (modified Booth) to 4:1. Its main disadvantage is that the 3B term must be pre-computed and stored in memory or generated with a fast adder.
TABLE 2
Kwentus Encoding

A_{2i+1}  A_{2i}  Operation
0         0       R_i = R_{i-1}/4
0         1       R_i = (R_{i-1} + B)/4
1         0       R_i = (R_{i-1} + 2B)/4
1         1       R_i = (R_{i-1} + 3B)/4
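The encoding in Table 2 can be sketched as follows. Each non-overlapping pair of multiplier bits indexes a 4:1 mux over 0, B, 2B, and 3B, and 3B is formed once up front. This is a sketch under an explicit assumption not taken from the source: A is an unsigned n-bit integer with n even. The published architecture's handling of signed operands is not modeled, and the name kwentus_multiply is hypothetical.

```python
# Sketch of the 2-bit, non-overlapping encoding of Table 2.
# Assumption (not from the source): A is an unsigned n-bit integer with n
# even; signed-operand handling in the published design is not modeled.

def kwentus_multiply(a, b, n):
    assert n % 2 == 0 and 0 <= a < (1 << n)
    three_b = (b << 1) + b              # pre-computed once (or read from memory)
    mux = (0, b, b << 1, three_b)       # 4:1 mux inputs, indexed by A[2i+1]A[2i]
    result = 0
    for i in range(n // 2):
        pair = (a >> (2 * i)) & 0b11    # no overlap between consecutive pairs
        result += mux[pair] << (2 * i)
    return result


print(kwentus_multiply(0b1101, 5, 4))   # 65, i.e. 13 * 5
```

Every row selects a non-negative multiple, so no negation logic is needed; the cost moves into forming 3B before the first row that uses it.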