This invention is related to digital multipliers. More particularly, this invention is related to a digital multiplier which performs multiplication of complex numbers using a multi-bit recoding architecture.
1. Multi-bit Recoding Multiplication
A widely used multiplier architecture uses multibit recoding of two's complement binary numbers to reduce the number of iterations required when performing a multiplication as a series of additions. A conventional generalized multibit recoding circuit 10 is illustrated in FIG. 1a. Conventional multibit recoding techniques are described in detail in H. Sam and A. Gupta, “A Generalized Multibit Recoding of Two's Complement Binary Numbers and Its Proof with Application in Multiplier Implementations”, IEEE Trans. Comp., V. 39 #8, p. 1006, August 1990, the contents of which are hereby incorporated by reference.
In brief, to determine the product Z of a multiplier X and a multiplicand Y in this architecture, the multiplier X is provided to a recoder 12 which examines the multiplier k+1 bits at a time and generates a corresponding sequence of signed digit values. These recoded values represent the number of times the multiplicand will be added or subtracted after shifting to contribute to the final product. The signed digits have values between 0 and +/−2^(k−1), where 2^k is the recoding radix. In conjunction with recoding the multiplier X, the multiplicand Y is input to a multiples generator 14 which produces output signals that represent multiples of the multiplicand Y from Y to 2^(k−1)Y.
To perform the multiplication, each of the signed digits is used to select a particular multiple of Y, which is then shifted as necessary according to the “place” of the multiplier X that the signed digit represents in radix 2^k and then summed. The summation, which is performed by a partial product summer 16, can be in series or in parallel. Preferably, in the case of parallel summation, the shifting is hardwired.
In one implementation of the algorithm, a two's complement binary number X is recoded into a signed digit representation as follows. First, the sign bit of X is extended as many positions as necessary so that the total number of bits n in X is divisible by k. Then a 0 is appended to the right of the least-significant-bit of X, i.e., to the right of bit position 0. This appended bit is designated position −1. Next, vectors of k+1 bits are formed starting from bit x_{−1} such that adjacent vectors share one bit. Each bit vector is converted into a signed digit according to the inner product of the bit vector with the vector K=[−2^(k−1) 2^(k−2) . . . 2^1 2^0 1]. Generally, the possible inner products are predetermined and hardcoded in recoder 12, i.e., as a look-up table or hardwired logic, using techniques known to those skilled in the art. In the particular implementation of a radix 16 multi-bit recoding multiplier, five bits of the multiplier X are examined at once and recoded to control whether a multiple of from one to eight times the multiplicand is added or subtracted to contribute to the final result after shifting by the appropriate multiple of k bits. A table of the signed digit values for a 5-Bit Recoding (k=4) follows:
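The recoding steps described above can be sketched in software. The following Python is only an illustrative model, not the hardwired logic of recoder 12; the function names `recoding_weights`, `recode`, and `digit_table` are hypothetical. It forms overlapping (k+1)-bit vectors starting at the appended bit position −1 and takes the inner product with K. The helper `digit_table` enumerates the signed digit for every possible bit vector, which for k=4 gives the 32 entries of a 5-bit recoding.

```python
def recoding_weights(k: int) -> list[int]:
    """Vector K = [-2^(k-1), 2^(k-2), ..., 2^1, 2^0, 1] (k+1 entries)."""
    return [-(1 << (k - 1))] + [1 << j for j in range(k - 2, -1, -1)] + [1]


def recode(x: int, n: int, k: int) -> list[int]:
    """Recode an n-bit two's complement value x into radix-2^k signed digits.

    Digits are returned least-significant first. Assumes x fits in n bits.
    """
    n_ext = -(-n // k) * k  # sign-extend so the bit count is divisible by k

    def bit(i: int) -> int:
        if i < 0:
            return 0  # the 0 appended at bit position -1
        # Python's >> is arithmetic, so high positions yield the sign bit.
        return (x >> i) & 1

    weights = recoding_weights(k)
    digits = []
    for base in range(0, n_ext, k):
        # Bits base+k-1 down to base-1; adjacent vectors share one bit.
        vector = [bit(i) for i in range(base + k - 1, base - 2, -1)]
        digits.append(sum(w * b for w, b in zip(weights, vector)))
    return digits


def digit_table(k: int) -> dict[str, int]:
    """Signed digit for every (k+1)-bit vector, keyed by its bit string."""
    weights = recoding_weights(k)
    table = {}
    for v in range(1 << (k + 1)):
        bits = format(v, f"0{k + 1}b")
        table[bits] = sum(w * int(b) for w, b in zip(weights, bits))
    return table
```

For instance, `recode(8, 16, 4)` yields the digits −8 and +1 in the two lowest places, since 8 = −8 + 1·16.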
As also known to those of skill in the art, the multiples of the multiplicand can be determined using various techniques, such as combinations of shifts and adds. Only multiples of three, five, and seven times the multiplicand require any effort to determine since the multiples of two, four, and eight can be formed by shifts of the multiplicand, and the multiple of six can be formed by a shift of the multiple of three. In one implementation, the three, five, and seven multiples can be generated using a sequence of adders, i.e., 3Y=Y+2Y, 5Y=Y+4Y, and 7Y=8Y−Y.
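As a rough software analogy of the shift-and-add scheme just described, the sketch below builds the multiples 0 through 8Y using only shifts plus three add/subtract steps. The function name `multiples_radix16` is hypothetical, and the zero entry is included only as a convenience for later selection; it is not an output of the generator as described.

```python
def multiples_radix16(y: int) -> dict[int, int]:
    """Return {m: m*y} for m = 0..8 using shifts and three adders."""
    y2 = y << 1      # 2Y: shift
    y4 = y << 2      # 4Y: shift
    y8 = y << 3      # 8Y: shift
    y3 = y + y2      # 3Y = Y + 2Y   (adder)
    y5 = y + y4      # 5Y = Y + 4Y   (adder)
    y7 = y8 - y      # 7Y = 8Y - Y   (subtractor)
    y6 = y3 << 1     # 6Y: shift of 3Y
    return {0: 0, 1: y, 2: y2, 3: y3, 4: y4, 5: y5, 6: y6, 7: y7, 8: y8}
```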
In a radix 16 (i.e., k=4) multiplier circuit, a sixteen bit two's complement multiplier X gets recoded into four signed digits SD1-SD4, each of which indicates the addition or subtraction of zero or one of the eight multiples of the multiplicand. A block diagram of partial product adder 16 configured for a 16-bit multiplier and multiplicand using radix 16 encoding is illustrated in FIG. 1b. The various multiples of Y output by the multiples generator 14 are input to four multiplexers 18.1-18.4. Each signed digit SD1-SD4 controls a respective multiplexer to select the appropriate multiple of Y (i.e., 0 to 8Y). First and second adders 20, 22 each receive inputs from a respective pair of multiplexers, as shown. Adders 20, 22 also receive sign control signals in accordance with the sign of the signed digits which indicate whether the inputs are added to or subtracted from the partial product. The outputs of adders 20 and 22 are themselves combined by an adder 24 which outputs the final product Z. As indicated above, the outputs of each of the multiplexers must be shifted in accordance with the “place” that the signed digit represents in base 16. This shift can be accomplished by shifting the bit positions in the hardwired connection between the output of the multiplexers and the input of the adders.
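The data flow of the radix 16 multiplier can be modeled behaviorally as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the multiples are computed with `m * y` as a stand-in for multiples generator 14, and the pairing of digits to the first and second adders (lowest two digits together, highest two together) is assumed, since the actual pairing is defined by the figure.

```python
def recode_radix16(x: int) -> list[int]:
    """Recode a 16-bit two's complement x into four signed digits (LSD first)."""
    def bit(i: int) -> int:
        return (x >> i) & 1 if i >= 0 else 0  # position -1 holds the appended 0

    digits = []
    for base in (0, 4, 8, 12):
        digits.append(-8 * bit(base + 3) + 4 * bit(base + 2)
                      + 2 * bit(base + 1) + bit(base) + bit(base - 1))
    return digits


def multiply_radix16(x: int, y: int) -> int:
    """Model of recoder + multiples generator + partial product summer."""
    multiples = [m * y for m in range(9)]  # stand-in for multiples generator 14
    sd = recode_radix16(x)                 # stand-in for recoder 12

    def term(j: int) -> int:
        selected = multiples[abs(sd[j])]   # multiplexer picks 0..8Y
        shifted = selected << (4 * j)      # hardwired shift by digit place
        return -shifted if sd[j] < 0 else shifted  # sign control at the adder

    sum_first = term(0) + term(1)          # first adder (assumed pairing)
    sum_second = term(2) + term(3)         # second adder (assumed pairing)
    return sum_first + sum_second          # final adder outputs Z
```

Calling `multiply_radix16(x, y)` for any 16-bit two's complement `x` should agree with the ordinary product `x * y`.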
2. Complex Number Multiplication
Complex number multiplication is an increasingly common operation in Digital Signal Processing (DSP). To multiply two complex numbers, represented as X1+iY1 and X2+iY2, where “i” represents the square root of −1, the programmer typically breaks the computation up into
(X1+iY1)(X2+iY2)=(X1X2−Y1Y2)+i(X1Y2+Y1X2)    Equ. 1
In a conventional DSP which has a single fixed point multiplier available, the four multiplications are performed sequentially and sums and differences are formed. For a typical programmable DSP, an addition or subtraction can be performed in parallel to the multiplication, with each of the multiplications or additions taking a cycle. More recently, programmable DSP integrated circuits have appeared that contain two or more multipliers operating in parallel. The multipliers are typically general purpose devices and each is a replica of the other. A conventional multi-multiplier DSP performs the same computation as a single-multiplier DSP, only in less time because more than one multiplication may be performed in parallel.
Attempts have been made to optimize specific Digital Signal Processing algorithms which perform complex number multiplication in hardware. In one configuration of such an Application Specific Integrated Circuit (ASIC) designed to implement Equation 1, four parallel multipliers are provided, similar to a conventional multi-multiplier DSP. The multipliers, which are copies of each other, calculate the four cross-products (X1X2, Y1Y2, X1Y2, X2Y1) in parallel. Two adders are provided, each connected to receive the outputs of a respective pair of multipliers. The adders combine the appropriate products to generate the real and imaginary components of the output. However, since each multiplier is replicated in its entirety, such a system requires a relatively large area of space in an integrated circuit to implement.
In another attempt, the complex multiplication performed is “decomposed” to reduce the number of required multiplications from four to three according to the alternate decomposition:
(X1+iY1)(X2+iY2)=((X1−Y1)Y2+X1(X2−Y2))+i((X1−Y1)Y2+Y1(X2+Y2))    Equ. 2
The term (X1−Y1)Y2 appears twice on the right hand side, hence only three multiplications are required. However, the price paid for this configuration is the need for more adders. Further, some of the adders have to perform before the multiplication and some after, resulting in a longer latency for the entire computation. Hence, this technique has only proved practical on architectures where time for multiplication is much longer than the time for addition.
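The trade-off between the two decompositions can be seen in a short sketch. The function names `complex_mul_4` and `complex_mul_3` are hypothetical; the first follows Equ. 1 (four multiplications, two additions) and the second follows Equ. 2 (three multiplications, five additions, three of which must complete before a multiplication can start).

```python
def complex_mul_4(x1: int, y1: int, x2: int, y2: int) -> tuple[int, int]:
    """Equ. 1: four multiplications, two add/subtract operations."""
    real = x1 * x2 - y1 * y2
    imag = x1 * y2 + y1 * x2
    return real, imag


def complex_mul_3(x1: int, y1: int, x2: int, y2: int) -> tuple[int, int]:
    """Equ. 2: three multiplications, five add/subtract operations."""
    diff_1 = x1 - y1               # pre-multiplication subtract
    diff_2 = x2 - y2               # pre-multiplication subtract
    sum_2 = x2 + y2                # pre-multiplication add
    shared = diff_1 * y2           # multiplication 1, used in both parts
    real = shared + x1 * diff_2    # multiplication 2, post-multiplication add
    imag = shared + y1 * sum_2     # multiplication 3, post-multiplication add
    return real, imag
```

Both functions return the same (real, imaginary) pair for the same inputs; the second simply moves work from the multipliers to the adders.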
Accordingly, it would be advantageous to provide a multiplication circuit architecture which efficiently performs complex multiplication but requires less area in an integrated circuit than conventional architectures.
Due to improvements in fabrication technology, various new applications are becoming important aspects of Digital Signal Processing. Many of these applications make use of complex multiplications, often of large arrays of numbers. An important application is adaptive beamforming for arrays of antennas in a base station for wireless telephones, which adds together the signals received by multiple antennas with time-varying complex weights in order to form a radiation pattern pointing toward the mobile phone of interest while placing nulls on sources of interference. Various solutions for optimizing this type of problem are known and involve performing complex number multiplication on matrices and vectors of data. Thus, instead of only a single complex multiplication being called for at a given time, there is the need for massive numbers of complex multiplications. In some cases, the multiplicands are not entirely independent, but may be reused many times with different values of multiplier or the reverse, where the multipliers are reused with different values of multiplicand.
The inventor has recognized that many common complex multiplications which are routinely performed in parallel using conventional DSP architectures have a high degree of redundancy. This redundancy can be exploited to minimize the hardware needed to perform the computation. Unlike conventional systems in which a multiplier is replicated several times to provide for parallel computations, a complex number multiplier according to the invention extends the architecture of a conventional multi-bit recoding multiplier to take advantage of this redundancy. In particular, a complex number multiplier according to the invention determines the recoding or multiples for each unique factor in the multiplication, as opposed to determining these values for each multiplication which occurs in evaluating a given equation.
More specifically, in a parallel multiplication of complex numbers, two multi-bit recoders are provided to recode the real and imaginary components XnR and XnI of each unique complex multiplier Xn. Similarly, for each unique complex multiplicand, Ym, two multiple generator subcircuits are provided to generate multiples of the real part YmR and imaginary part YmI of the multiplicand, respectively.
A partial product summer is provided for each cross product which is to be evaluated and is driven by the appropriate recoded multiplier digits and multiplicand multiples for the unique factors. According to one aspect of the invention, at least one of the unique factors appears in more than one cross product. Because there is only one recoder or multiples generator for that factor, less area is required to implement the circuit than in conventional architectures which replicate the entire multiplier circuit structure for each parallel multiplication. The outputs of the partial product summers are combined using appropriately configured adders to produce the desired complex number(s) solution.
For example, a conventional DSP may utilize four parallel multipliers to evaluate in parallel the four cross-products X1X2, Y1Y2, X1Y2, X2Y1 of Equation 1, above. Such a DSP requires four separate multiplier circuits and thus a relatively large area. Even if a multi-bit recoding multiplier of the type shown in FIG. 1a is used, the resulting parallel multiplier circuit will still require four separate recoders 12 and four separate multiples generators 14 (i.e., one of each for each multiplier). In contrast, a circuit according to the invention configured to evaluate Equation 1 utilizes only two recoders, for multipliers X1 and X2, and only two multiples generators, for multiplicands Y1 and Y2, resulting in a substantial area and complexity savings. Because fewer recoders and multiples generators are needed in the circuit of the invention, more area can be dedicated to the recoders and multiples generators, allowing a higher radix to be used and a correspondingly simpler and smaller adder tree in the partial product subcircuit. The area saved when compared to conventional architectures increases for more extensive calculations, such as a circuit configured to evaluate two complex multiplications which require 16 cross products to be evaluated.
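The sharing of recoders and multiples generators can be illustrated with a behavioral sketch. It is only a software model under assumptions: all function and variable names are hypothetical, the operands are named by real and imaginary parts (`xr`, `xi` for the complex multiplier and `yr`, `yi` for the complex multiplicand), radix 16 with 16-bit operands is assumed, and `m * y` stands in for a multiples generator. Each unique factor is recoded or expanded once, and the results drive four partial product summers.

```python
def recode16(x: int) -> list[int]:
    """Radix 16 signed digits of a 16-bit two's complement x (LSD first)."""
    def bit(i: int) -> int:
        return (x >> i) & 1 if i >= 0 else 0

    return [-8 * bit(b + 3) + 4 * bit(b + 2) + 2 * bit(b + 1)
            + bit(b) + bit(b - 1) for b in (0, 4, 8, 12)]


def multiples(y: int) -> list[int]:
    """Multiples 0..8 of y (stand-in for a multiples generator)."""
    return [m * y for m in range(9)]


def partial_product_sum(digits: list[int], mults: list[int]) -> int:
    """Select, shift, and sum one multiple per signed digit."""
    total = 0
    for place, d in enumerate(digits):
        shifted = mults[abs(d)] << (4 * place)
        total += -shifted if d < 0 else shifted
    return total


def complex_multiply_shared(xr: int, xi: int, yr: int, yi: int) -> tuple[int, int]:
    """(xr + i*xi)(yr + i*yi) with two recoders and two multiples generators."""
    digits_r, digits_i = recode16(xr), recode16(xi)   # one recoder per unique multiplier part
    mults_r, mults_i = multiples(yr), multiples(yi)   # one generator per unique multiplicand part
    rr = partial_product_sum(digits_r, mults_r)       # xr * yr
    ii = partial_product_sum(digits_i, mults_i)       # xi * yi
    ri = partial_product_sum(digits_r, mults_i)       # xr * yi
    ir = partial_product_sum(digits_i, mults_r)       # xi * yr
    return rr - ii, ri + ir
```

In this model, four cross products are formed while `recode16` and `multiples` are each invoked only twice, which mirrors the reduction from four recoders and four multiples generators to two of each.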