1. Technical Field
The present invention relates to a high-speed multiplier, and specifically to a high-speed multiplier that searches a cache memory for previous results. The multiplier would be well suited for various digital signal processing (DSP) applications.
2. Description of the Related Art
Along with addition, multiplication is a very heavily used core operation in signal processing. To achieve high throughput, fast multiplications are required. The multiplication of two unsigned numbers A and B creates the product P = A × B,
where A is called the multiplicand and B the multiplier. Given that A is an m-bit positive whole number and B is an n-bit positive whole number, the numeric representation of the product P requires up to (m+n) bits.
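The (m+n)-bit bound can be checked numerically. The following is a minimal sketch only; the helper name `product_bits` and the operand widths tried are illustrative assumptions, not part of the original description.

```python
def product_bits(m: int, n: int) -> int:
    """Bit length of the largest product of an m-bit and an n-bit unsigned number."""
    max_a = (1 << m) - 1  # largest m-bit unsigned value
    max_b = (1 << n) - 1  # largest n-bit unsigned value
    return (max_a * max_b).bit_length()


# Hypothetical widths chosen for illustration.
for m, n in [(4, 4), (8, 4), (30, 30)]:
    print(f"{m}-bit x {n}-bit -> {product_bits(m, n)} bits")
```

For operand widths of two bits or more, the largest product fills all m+n bits, so the product register cannot be made narrower without risking overflow.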
In a digital signal processing (DSP) system, there is always a demand for fast multiplication. For example, an N-tap FIR filter 100 with M bits per tap, shown in FIG. 1, requires N×M multiplications. A multiplicand X_n 102 is multiplied by a first coefficient C1 106, while X_{n-1} 112, the output of the Z^{-1} operator, is multiplied by a second coefficient C2 108, and X_{n-2} 114, the output of a second Z^{-1} operator, is multiplied by a third coefficient C3 110. The results of the multiplications are summed 116 to produce the output value Y_n 104.
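The three-tap structure of FIG. 1 can be sketched in software as follows. This is an illustrative model only: the function name `fir_filter`, the sample values, and the coefficient values are assumptions, and the counter tallies one multiplication per tap per output sample without modeling the M-bit width of each tap.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    delay_line = [0] * len(coeffs)  # delay_line[k] holds x[n-k]
    outputs = []
    multiplications = 0
    for x in samples:
        # Each Z^-1 stage passes its value one position down the line.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d            # one multiplication per tap
            multiplications += 1
        outputs.append(acc)         # summed result, Y_n
    return outputs, multiplications


# Hypothetical input samples and coefficients C1, C2, C3.
outputs, mults = fir_filter([1, 2, 3, 4], [1, 2, 3])
print(outputs, mults)
```

Every output sample costs one multiplication per tap, which is why the multiplier speed bounds the achievable sample rate.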
For a real-time DSP application, the Nyquist sampling theorem dictates that the sampling rate of a system (Fs) must be at least twice the bandwidth of the system (Fs ≥ 2F). Thus, higher system bandwidth requires faster multiplication operations. There are many hardware implementations of parallel multipliers. However, the basic design of each multiplier is an add-and-shift algorithm. This algorithm generates a partial product, using Booth's algorithm for example, and then adds the partial product using a ROM look-up table. A very basic implementation of the multiplier consists of a fast adder, a multiplexer (mux), and a shift register. An example of a 4×4 multiplication follows: ##EQU1##
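As a hedged illustration of how add-and-shift decomposes a 4×4 multiplication into partial products, the sketch below lists each shifted partial product and their sum. The operands (11 and 13) are arbitrary values chosen here and are not taken from the original equation; the function name `partial_products` is likewise an assumption. Booth recoding and ROM look-up are not modeled.

```python
def partial_products(multiplicand: int, multiplier: int, width: int = 4):
    """Return the shifted partial products of an unsigned width-bit multiplication."""
    products = []
    for i in range(width):
        bit = (multiplier >> i) & 1  # i-th multiplier bit, least significant first
        products.append((multiplicand if bit else 0) << i)
    return products


a, b = 0b1011, 0b1101  # 11 and 13: arbitrary illustrative operands
rows = partial_products(a, b)
for row in rows:
    print(f"{row:08b}")          # one partial product per multiplier bit
print(f"{sum(rows):08b}")        # the sum of the partial products is the product
```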
Two registers 202, 204 are used to hold the values of the multiplier and the multiplicand, as shown in FIG. 2. The multiplier register 202 is shifted into the control logic. If multiplier bit n is a zero, the multiplexer (mux) 216 will select a zero output. Otherwise, the mux will select the multiplicand output. The shift register will shift the mux output n-1 bits to the left. The adder 212 will add this value to the contents of the partial register 210, which has an initial value of zero. After N iterations, the adder 212 will output the final product 214. From the above example, there are N iterations for an N×N multiplication. Thus, for a 30-bit by 30-bit multiplication, there would be 30 iterations. Likewise, for a 60×60 multiplication, there would be 60 iterations. A need exists to perform these multiplications with fewer iterations.
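The iteration count described above can be illustrated with a small software model of the mux, shifter, and adder loop. This is a sketch under stated assumptions, not the circuit of FIG. 2: the function name `shift_add_multiply` is hypothetical, and multiplier bits are assumed to be numbered from 1 at the least significant bit so that bit n is shifted left by n-1 positions.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n_bits: int):
    """Model the add-and-shift loop; return (product, iterations)."""
    partial = 0       # partial register, initial value zero
    iterations = 0
    for n in range(1, n_bits + 1):            # assumed numbering: bit 1 is the LSB
        bit = (multiplier >> (n - 1)) & 1
        mux_out = multiplicand if bit else 0  # mux selects zero or the multiplicand
        shifted = mux_out << (n - 1)          # shift left by n-1 bits
        partial += shifted                    # adder accumulates into the partial register
        iterations += 1
    return partial, iterations


print(shift_add_multiply(11, 13, 4))
product, count = shift_add_multiply((1 << 30) - 1, 123456789, 30)
print(product == ((1 << 30) - 1) * 123456789, count)
```

The loop always runs once per multiplier bit, regardless of the operand values, which is the cost a scheme with fewer iterations would need to reduce.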