The invention relates to the field of multiply-accumulate (MAC) circuits.
In a single MAC operation, two numbers are multiplied by a multiplier element and the result is stored in an accumulator register. The results of further multiplications are added to the number stored in the accumulator. In this way, two series of numbers can be pair-wise multiplied and a running sum of the results maintained. At the end, the accumulator contains the sum of all the multiplications.
The MAC operation is one of the fundamental operations of digital signal processing. For example, a finite impulse response (FIR) filter is implemented as a series of MAC operations. The filter has as its input a sequence of n data values (or taps), $d_0, d_1, \ldots, d_{n-1}$, and n filter coefficient values, $c_0, c_1, \ldots, c_{n-1}$, where n is an integer greater than or equal to one. The output of the filter is calculated as the sum of each data value multiplied by its corresponding coefficient, which is represented by the series below:
$$\sum_{i=0}^{n-1} d_i \times c_i$$
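As an illustration only, the series can be evaluated in software as a sequence of single MAC steps. The following is a minimal sketch, not a description of any particular circuit; the function name `fir_output` and its parameters are hypothetical and chosen for this example.

```python
def fir_output(data, coeffs):
    """Return the sum over i of data[i] * coeffs[i], one MAC step per pair.

    Hypothetical helper for illustration; it models the arithmetic only,
    not the timing or structure of a hardware MAC circuit.
    """
    if len(data) != len(coeffs):
        raise ValueError("data and coeffs must both contain n values")
    acc = 0  # accumulator register, cleared before the first MAC step
    for d, c in zip(data, coeffs):
        acc += d * c  # multiply one pair and add the product to the running sum
    return acc
```

For example, with data values 1, 2, 3 and coefficients 4, 5, 6, the accumulator ends at 1×4 + 2×5 + 3×6 = 32.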
In general, a MAC circuit has a fixed number of multiplier elements, which multiply each of the n multiplicands of a first word by the corresponding multiplicand of a second word. In a MAC circuit with only one multiplier element, each pair of multiplicands is multiplied sequentially and the result is added to a running sum. If, on the other hand, there are as many multiplier elements as multiplicand pairs, the multiplications can all be carried out in a single cycle and the outputs of the multipliers added together in a single step. In typical implementations of MAC circuits, however, there are fewer multiplier elements than multiplicand pairs. In such a case, more than one cycle of multiplications is required to calculate the final result of the MAC operation. If the number of pairs is exactly divisible by the number of multiplier elements, the multiplier elements are fully utilized on every cycle of the MAC circuit operation. If the number of pairs is not exactly divisible by the number of multiplier elements, the multiplier elements are not fully utilized on either the first or the last cycle of multiplications.
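The relationship between the number of multiplier elements and the number of cycles can be sketched in software as follows. This is a simplified model under stated assumptions: the name `mac_with_m_multipliers` and the parameter `m` (the number of multiplier elements) are hypothetical, and the sketch arbitrarily places the partially used cycle last, whereas a circuit could equally place it first.

```python
def mac_with_m_multipliers(word_a, word_b, m):
    """Model a MAC over two words using m multiplier elements per cycle.

    Returns (result, cycles, idle_slots), where idle_slots counts the
    multiplier elements left unused over all cycles. Hypothetical model
    for illustration; the partial cycle is assumed to be the last one.
    """
    if len(word_a) != len(word_b):
        raise ValueError("both words must contain n multiplicands")
    if m < 1:
        raise ValueError("at least one multiplier element is required")
    n = len(word_a)
    acc = 0      # accumulator holding the running sum
    cycles = 0   # number of multiplication cycles performed
    idle = 0     # multiplier elements that had no pair to multiply
    for start in range(0, n, m):
        # Up to m pairs are multiplied in parallel during one cycle.
        pairs = list(zip(word_a[start:start + m], word_b[start:start + m]))
        acc += sum(a * b for a, b in pairs)
        cycles += 1
        idle += m - len(pairs)
    return acc, cycles, idle
```

With five multiplicand pairs and two multiplier elements, this model takes three cycles and leaves one multiplier element idle on the final cycle; with five multiplier elements it takes a single cycle with none idle.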
Additionally, when processing more than one word of data, if n is not exactly divisible by the number of multiplier elements, the last round of multiplications will not fully utilize all the multiplier elements during transitions from one word to the next.
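The cost of these transitions can be estimated with a short calculation. The sketch below assumes that each word is processed separately, so that no multiplicand pairs from the next word fill the multiplier elements left unused at the end of the current word; the function name `word_by_word_cost` and its parameters are hypothetical.

```python
def word_by_word_cost(n, m, words):
    """Return (total_cycles, total_idle_slots) for word-by-word processing.

    n     -- multiplicand pairs per word
    m     -- multiplier elements available per cycle
    words -- number of words processed one after another
    Assumes idle multiplier elements are never filled from the next word.
    """
    if n < 1 or m < 1 or words < 0:
        raise ValueError("n and m must be positive and words non-negative")
    cycles_per_word = -(-n // m)             # ceiling of n / m in integers
    idle_per_word = cycles_per_word * m - n  # unused slots in the partial cycle
    return cycles_per_word * words, idle_per_word * words
```

For instance, with n = 10, m = 4 and three words, each word takes three cycles and leaves two multiplier elements idle, giving nine cycles and six idle slots in total.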