In the field of signal processing, especially digital signal processing, many of the necessary operations take the form of a finite impulse response (FIR) filter, also known as a weighted average. In this well-known operation, a finite set of values, also called filter coefficients or tap weights, h(k), for k=0, . . . , N−1, and the values of an input data sequence, x(k), are used to create output sequence values, y(n), by the rule $y(n)=\sum_{k=0}^{N-1} h(k)\,x(n-k)$. Because the selected set of input values is shifted by 1 each time n is incremented by 1, this process is also called a sliding window sum. To calculate each y(n), pairs of coefficients and input values are first multiplied and then added to the sum, a process termed multiply-accumulate (MAC).
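As an illustration only, the FIR rule above can be sketched in a few lines of Python. The function name `fir_filter` and the choice to treat input samples before the start of the sequence as zero are assumptions made for this sketch; they are not taken from the text.

```python
def fir_filter(h, x):
    """Return y(n) = sum over k of h(k) * x(n - k), for n = 0 .. len(x) - 1.

    Assumption for this sketch: x(m) is taken to be zero for m < 0.
    """
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0  # the accumulator for this output value
        for k in range(n_taps):
            if n - k >= 0:
                acc += h[k] * x[n - k]  # one multiply-accumulate (MAC)
        y.append(acc)
    return y


h = [1, 2, 3]      # example tap weights, N = 3
x = [1, 0, 0, 2]   # example input sequence
y = fir_filter(h, x)
print(y)  # [1, 2, 3, 2]
```

Each pass through the inner loop is one MAC, so an N-tap filter spends N MAC operations per output value.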
Other known calculations common in signal processing are correlation calculations, which are similar to FIR operations but involve two data signals.
One example is the operation of autocorrelation, in which a signal x(m) is compared with a shifted version of itself, x(m+n), to create an autocorrelation signal by the formula $X(n)=\sum_{k=0}^{N-1} x(k)\,x(k+n)$. It is clear that such a correlation calculation also uses many MAC operations.
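The autocorrelation formula can be sketched the same way. The function name `autocorrelation`, its parameters `n_terms` and `max_lag`, and the requirement that the input hold at least `n_terms + max_lag` samples are assumptions of this example, chosen so that every index k+n stays inside the sequence.

```python
def autocorrelation(x, n_terms, max_lag):
    """Return [X(0), ..., X(max_lag)] with X(n) = sum_{k=0}^{n_terms-1} x(k) * x(k + n).

    Assumption for this sketch: x must be long enough that k + n never
    runs past the end of the sequence.
    """
    if len(x) < n_terms + max_lag:
        raise ValueError("x needs at least n_terms + max_lag samples")
    result = []
    for n in range(max_lag + 1):
        acc = 0
        for k in range(n_terms):
            acc += x[k] * x[k + n]  # one MAC; both operands come from the signal
        result.append(acc)
    return result


print(autocorrelation([1, 2, 3, 4, 5], n_terms=3, max_lag=2))  # [14, 20, 26]
```

Unlike the FIR case, neither operand is a fixed coefficient: both are read from the data signal, at addresses that differ by the lag n.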
FIR and correlation operations are used extensively in signal processing to select desired frequencies from data, remove noise, and calculate a signal's power spectral density, among other applications. As the forms of the equations show, these operations are well-suited for implementation on computer hardware. To implement FIR filter operations, the filter coefficients are loaded into a dedicated memory array; then, for each value y(n), the corresponding portion of the inputs is loaded into a second memory array, and the MAC operation is performed pairwise on the aligned values. To implement an autocorrelation, values of both signals are continually loaded into memory.
Though FIR and correlation operations can be, and often are, implemented in software on a general purpose computer processor, many signal processing applications require very fast computation of these operations. These cases often require dedicated implementation on special purpose digital hardware, such as digital signal processors (DSP), on reconfigurable platforms such as field programmable gate arrays (FPGA), or on application specific integrated circuits (ASIC). At this level, the specific details of hardware implementation, such as how the values are represented and internally stored, their data type, data bus sizes, etc., become important for obtaining very high speed operations. One goal for efficient hardware implementation is to have a MAC operation occur on every cycle. Achieving even higher MAC rates is especially worthwhile.
A general method and system, known in the art, for achieving fast FIR operations is shown in FIG. 1. Data or coefficients are moved from the system's memory through an address generator (AG) and stored in the system's quickly accessible memory locations, called the register file (Reg File). On each cycle, two values are moved from the Reg File into the MAC unit, where their product is calculated and added to the accumulated value, and the result is written back to the accumulation register location.
For normal ongoing operation, the amount of data read into the Reg File must balance the amount consumed by the MAC unit. Further, data values going into the MAC unit must be complete; if there is a delay in accessing a data value needed by the MAC unit, the unit must wait a cycle (or more) until it obtains a complete data value for the multiply and accumulate calculation. Such a pause is called a bubble cycle. It represents an inefficiency in the overall operation of the system. Preventing such inefficiency is one overall goal of the present invention. Another goal is to create an architecture in which more than one MAC operation can be performed in one cycle. Another goal is to handle address misalignments when performing correlation type calculations.
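The effect of an imbalance between loading and consumption can be illustrated with a deliberately simplified cycle-count model. This is a toy sketch, not a description of the system of FIG. 1 or of the invention: the function `count_cycles`, its parameters, and the rule that every MAC consumes two freshly loaded values are all assumptions made for illustration.

```python
def count_cycles(num_macs, loads_per_cycle, initial_values=0):
    """Toy model: count the cycles and bubble cycles needed for num_macs MACs.

    Assumptions for this sketch: each cycle, loads_per_cycle values arrive
    in the register file; the MAC unit fires only when two complete operand
    values are available, and consumes both.
    """
    if loads_per_cycle <= 0:
        raise ValueError("loads_per_cycle must be positive")
    available = initial_values
    done = cycles = bubbles = 0
    while done < num_macs:
        available += loads_per_cycle   # data read into the register file
        if available >= 2:
            available -= 2             # operands consumed by the MAC unit
            done += 1
        else:
            bubbles += 1               # MAC unit waits: a bubble cycle
        cycles += 1
    return cycles, bubbles


print(count_cycles(4, loads_per_cycle=2))  # (4, 0): balanced, one MAC per cycle
print(count_cycles(4, loads_per_cycle=1))  # (8, 4): every other cycle is a bubble
```

In this model, halving the load rate doubles the cycle count for the same number of MAC operations, which is the inefficiency the paragraph above describes.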