In digital signal processing, one of the most frequently performed operations is the multiplication of two digital numbers. Often, hundreds or thousands of multiplications must be performed to execute a complex operation, such as computing a transform.
At its most basic level, binary multiplication is performed by multiplying each bit of the multiplicand by each bit of the multiplier, the products formed with a single multiplier bit together making up a partial product. If the multiplier bit is 0, the partial product is 0; if the multiplier bit is 1, the partial product is the multiplicand itself. Multiplication by a single bit generates no carries. Starting with the least significant bit of the multiplier, each successive partial product is shifted one position further to the left, and the product is the sum of the shifted partial products. Because carries are generated when the partial products are summed, the product can have as many bits as the multiplicand and multiplier combined: an n-bit by m-bit product requires up to n + m bits.
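The shift-and-add procedure just described can be sketched in Python as follows. This is an illustrative model only, assuming unsigned operands of a fixed width `n_bits`; the function names are hypothetical. The final line checks the largest 4-bit by 4-bit product against the bit-length bound.

```python
def partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """Return one partial product per multiplier bit, least significant first.

    A 0 bit gives 0; a 1 bit gives the multiplicand shifted left by the
    bit's position. Operands are treated as unsigned n_bits-wide values.
    """
    products = []
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        products.append((multiplicand if bit else 0) << i)
    return products


def shift_and_add(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Form the product as the sum of the shifted partial products."""
    return sum(partial_products(multiplicand, multiplier, n_bits))


# 13 x 11 with 4-bit operands: partial products 1101, 11010, 0, 1101000.
print([bin(p) for p in partial_products(0b1101, 0b1011, 4)])
print(shift_and_add(0b1101, 0b1011, 4))       # 143
# The largest 4-bit x 4-bit product, 15 x 15 = 225, fits in 4 + 4 = 8 bits.
print(shift_and_add(15, 15, 4).bit_length())  # 8
```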
Much attention has therefore been focused on the design of high-speed multipliers. This attention is directed both at developing improved algorithms for multiplying digital numbers (i.e., more efficient architectures requiring fewer operational steps) and at developing faster hardware, such as the constituent adders. In terms of architecture, a multiplier involves two major portions: the inner (i.e., partial) product generator and the partial product reduction mechanism. Algorithms for generating partial products include the straightforward AND array, the Pezaris array, the Booth algorithm and the modified Booth algorithm. The partial product reduction portion of the multiplier uses an array of adders to form the final product output, with the structure of the array depending on the algorithm chosen for partial product generation.
Most digital multipliers are based on the Booth algorithm (described, for example, in L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, Inc., 1975, at 517-518, which is hereby incorporated by reference), which has played a major role in the implementation of fast multipliers. The basic idea of Booth's algorithm is to skip over individual iterations in an iterative shift-and-add implementation of multiplication. The algorithm skips over 0 bits in the multiplier, which is a fairly obvious optimization, but it also skips over sequences of consecutive bits that are all 1's. The idea is that a sequence of N 1's in the multiplier is numerically equal to 2^N - 1 (scaled by the weight of its lowest bit), so multiplying by this sequence has the same effect as a subtraction at the sequence's least significant position, followed by an addition N positions to the left. This reduces multiplication to a single addition and a single subtraction for each string of consecutive 1's in the multiplier.
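A minimal Python sketch of this recoding idea follows, assuming radix-2 Booth recoding of an `n_bits`-wide two's complement multiplier; the function names are hypothetical and the code is not taken from the cited text. Each recoded digit is d_i = y[i-1] - y[i], with y[-1] = 0, so a run of 1's becomes -1 at its lowest bit and +1 one position above its highest bit, and only nonzero digits cost an addition or subtraction.

```python
def booth_digits(multiplier: int, n_bits: int) -> list[int]:
    """Radix-2 Booth recoding, least significant digit first.

    d_i = y[i-1] - y[i] with y[-1] = 0, so every digit is -1, 0 or +1.
    The multiplier is read as an n_bits-wide two's complement value.
    """
    digits = []
    prev = 0  # the implicit bit y[-1]
    for i in range(n_bits):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Add or subtract the shifted multiplicand only where a digit is nonzero."""
    product = 0
    for i, d in enumerate(booth_digits(multiplier, n_bits)):
        if d == 1:
            product += multiplicand << i   # end of a run of 1's
        elif d == -1:
            product -= multiplicand << i   # start of a run of 1's
    return product


print(booth_digits(7, 4))        # [-1, 0, 0, 1], i.e. 7 = 2^3 - 1
print(booth_multiply(13, 7, 4))  # 91
```

Here 13 x 7 is computed as -13 + (13 << 3) = 91: one subtraction and one addition for the single run of three 1's. To multiply by an unsigned value, pass `n_bits` one larger than its width so that the top bit reads as 0.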
A typical array multiplier based on the Booth algorithm is shown in block diagram form in FIG. 1. The multiplicand is supplied to the X register 1, and the multiplier to the Y register 2. The Y register feeds a Booth decoder 3, which controls the operation of an array of full adders 4. The final product is formed by a collection of adders operating on the partial products that appear at the right-hand side and bottom of the array.
A typical second-order Booth encoder is shown in FIG. 2. It receives three consecutive bits from the Y register and provides three output signals: X1, X2 and S. The S signal indicates whether to use the appropriate one of the X1 and X2 signals or its complement. Use of these signals is discussed below in connection with the present invention.
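The encoder's behaviour can be modelled with the conventional radix-4 (modified Booth) recoding table. This is a sketch, not the circuit of FIG. 2: the signal names X1, X2 and S come from the text, but the logic equations, the bit ordering and the function names are assumptions based on the standard recoding, in which each overlapping group of three multiplier bits selects 0, ±X or ±2X.

```python
def booth2_encode(y_hi: int, y_mid: int, y_lo: int) -> tuple[int, int, int]:
    """Radix-4 Booth encoding of the bits (y[2i+1], y[2i], y[2i-1]).

    The recoded digit is -2*y_hi + y_mid + y_lo, which lies in {-2, ..., 2}.
    X1 = 1 selects the multiplicand, X2 = 1 selects twice the multiplicand,
    and S = 1 selects the negative (complemented) form.
    """
    x1 = y_mid ^ y_lo
    x2 = (y_hi & (1 - y_mid) & (1 - y_lo)) | ((1 - y_hi) & y_mid & y_lo)
    s = y_hi
    return x1, x2, s


def booth2_multiply(multiplicand: int, multiplier: int, n_bits: int) -> int:
    """Multiply using one radix-4 partial product per pair of multiplier bits.

    The multiplier is read as an n_bits-wide two's complement value, and
    n_bits must be even. Negation is done arithmetically here; hardware
    typically complements the selected value and adds 1 elsewhere.
    """
    assert n_bits % 2 == 0, "n_bits must be even"
    product = 0
    y_lo = 0  # the implicit bit y[-1] is 0
    for i in range(0, n_bits, 2):
        y_mid = (multiplier >> i) & 1
        y_hi = (multiplier >> (i + 1)) & 1
        x1, x2, s = booth2_encode(y_hi, y_mid, y_lo)
        pp = x1 * multiplicand + x2 * (multiplicand << 1)  # select 0, X or 2X
        if s:
            pp = -pp
        product += pp << i
        y_lo = y_hi  # consecutive groups overlap by one bit
    return product


print(booth2_multiply(13, 7, 4))  # 91
```

Because each group consumes two multiplier bits, an n-bit multiplier yields n/2 partial products instead of n, which is the main attraction of second-order encoding.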
The array 4 is, as aforesaid, generally a two-dimensional array of one-bit adders. An exemplary array is shown in FIG. 3. Because this is not a clocked logic system, a certain amount of time must be allowed between the application of the input signals (i.e., multiplier and multiplicand) and the availability of the output (i.e., product). This time arises because partial sums and carries take a finite time to be generated and to propagate from level to level in the array. Though the constituent building blocks may be small and fast, there are usually a large number of them, so array performance is critically dependent on the speed of the adders.
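To illustrate why settling time grows with the size of the array, the following sketch models an unsigned n x n array multiplier built from rows of ripple-carry full adders and tracks, under a simple unit-delay assumption, when each signal settles. The cell arrangement, the zero-delay AND gates and the one-unit full-adder delay are simplifying assumptions rather than the array of FIG. 3, the function names are hypothetical, and the reported time is a worst-case, data-independent figure.

```python
Signal = tuple[int, int]  # (bit value, settling time in full-adder delays)


def full_adder(a: Signal, b: Signal, c: Signal) -> tuple[Signal, Signal]:
    """One-bit full adder; sum and carry settle one unit after the latest input."""
    t = max(a[1], b[1], c[1]) + 1
    total = a[0] + b[0] + c[0]
    return (total & 1, t), (total >> 1, t)


def array_multiply(x: int, y: int, n: int) -> tuple[int, int, int]:
    """Unsigned n x n multiply with n - 1 rows of n full adders each.

    Returns (product, number of full-adder cells, settling time).
    Partial product bits (AND gates) are assumed ready at time 0.
    """
    zero: Signal = (0, 0)
    # Row 0 is simply the partial product x AND y[0]; no adders are needed.
    acc = [(((x >> j) & 1) & (y & 1), 0) for j in range(n)]
    carry_top = zero
    out: list[Signal] = []
    cells = 0
    for i in range(1, n):
        out.append(acc[0])             # lowest bit of the running sum is final
        a = acc[1:] + [carry_top]      # running sum aligned with row i
        yi = (y >> i) & 1
        carry = zero
        new_acc = []
        for j in range(n):
            b = (((x >> j) & 1) & yi, 0)         # partial product bit x[j] AND y[i]
            s, carry = full_adder(a[j], b, carry)  # carry ripples along the row
            new_acc.append(s)
            cells += 1
        acc, carry_top = new_acc, carry
    out.extend(acc)
    out.append(carry_top)
    product = sum(bit << k for k, (bit, _) in enumerate(out))
    settle = max(t for _, t in out)
    return product, cells, settle


print(array_multiply(13, 11, 4))  # (143, 12, 8)
```

In this model a 4 x 4 array uses 12 cells and settles after 8 adder delays; the cell count grows as n(n - 1) while the settling time grows roughly linearly with n, which is why the speed, area and power of the single adder cell matter so much.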
Moreover, the die area of each adder (which reflects its parts count) and its power consumption are likewise multiplied by the number of unit cells in the overall array multiplier. Even small changes in adder design may therefore produce large changes in multiplier performance. Minimizing power consumption and die area is an ever-present goal of the integrated circuit "chip" designer. Faster hardware, though, often requires increased power consumption, and faster algorithms often require more die area so that more operations can be executed in parallel. Thus, these constraints can work at cross purposes.
According to the state of the art, the simplest full CMOS Booth multiplier cell requires about forty transistors and nine large conductive traces, which consume precious die area.
It is thus an object of the present invention to provide an improved digital multiplier.
Another object of the present invention is to provide an improved CMOS full adder and selector cell for an array multiplier.
It is another object of the invention to provide a full adder (and selector) cell which requires fewer than forty transistors.
A still further object of the invention is to provide a full adder cell which, when assembled into an array, requires fewer selector circuits and thereby consumes less die area.