Finite impulse response (FIR) filters have proven especially useful in digital signal processing (DSP) applications. For example, it is known that FIR filters can be used to perform sampling rate changes in time-sampled systems, through decimation and interpolation.
Several forms of FIR filter implementation are known, and each may be adapted to software or hardware. For example, in a so-called direct form FIR filter, input data samples are stored in a delay line, and the filter output is computed by forming a weighted sum of input data samples from different times. The weighting factors are the filter coefficients of the FIR filter, and together they constitute the filter impulse response. In contrast, in a so-called transpose form FIR filter, each input sample is simultaneously multiplied by all of the coefficients, and the resulting products are then delayed relative to each other and summed to form the filter output.
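The two forms described above can be illustrated with a minimal software sketch. The function names `fir_direct` and `fir_transpose` are hypothetical and chosen only for this illustration. The sketch models the data flow, not any particular hardware design: the direct form keeps a delay line of inputs, while the transpose form keeps a chain of delayed partial sums.

```python
def fir_direct(samples, coeffs):
    """Direct form: store inputs in a delay line, output a weighted sum."""
    delay_line = [0.0] * len(coeffs)  # delay_line[k] holds x[n-k]
    out = []
    for x in samples:
        # Shift the new sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return out


def fir_transpose(samples, coeffs):
    """Transpose form: multiply each input by every coefficient,
    then delay the products relative to each other and sum them."""
    n = len(coeffs)
    state = [0.0] * n  # delayed partial sums; the last entry is always 0
    out = []
    for x in samples:
        products = [c * x for c in coeffs]  # one multiply per coefficient
        out.append(products[0] + state[0])
        # Each register takes its product plus the next register's old value.
        state = [products[k + 1] + state[k + 1] for k in range(n - 1)] + [0.0]
    return out
```

Both functions use one multiplication per coefficient per input sample and produce the same output sequence. Feeding in a unit impulse returns the coefficients themselves, which is the sense in which the coefficients are the impulse response.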
In general, the direct and transpose form FIR implementations require approximately the same number of multipliers, delays, and adders. Specifically, one multiplier for each of the filter impulse response coefficients is usually required for either form. On one hand, this can be an advantage for high speed operation, in that the filter can operate at the maximum sample clock supported by the multiplier. Longer filters can be implemented by using additional multipliers. On the other hand, when the sample rate is lower, or decimation is performed, such high performance may not be needed. Multipliers can consume significant resources in hardware implementations, and it is therefore often desirable to minimize the number of multipliers used.
Consider, for example, a decimating FIR filter. Decimation is performed by keeping only some of the output samples that would ordinarily be produced at the output of the filter. For example, decimation by two halves the sample rate by taking every other output sample from the filter and discarding the others. Since it is inefficient to compute output samples which are then discarded, only the desired outputs need be computed. This can be used either to reduce the number of multipliers or to reduce the speed at which the multipliers must operate. For example, one simple implementation of a decimating filter uses a single multiplier and accumulator. As each input sample arrives, it is multiplied by an appropriate coefficient and accumulated into a sum. The multiplier thus operates at the input sample rate. This implementation, however, suffers from the disadvantage that the length of the filter cannot exceed the decimation factor, because each input sample is multiplied only once and can therefore contribute to only one output sample. Although this limitation can be avoided by placing input samples into a memory and allowing the multiplier to compute overlapping sums, doing so increases the clock rate required of the multiplier and adds the complexity of the input sample memory and associated addressing.
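The single multiplier and accumulator arrangement can be sketched as follows, alongside a naive filter-then-discard reference for comparison. The function names and the choice to emit an output when the last sample of each block arrives are assumptions made for this illustration only. Each input sample is multiplied exactly once, so the sketch rejects filters longer than the decimation factor.

```python
def decimate_naive(samples, coeffs, factor):
    """Reference: compute every filter output, then keep one in `factor`."""
    full = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        full.append(acc)
    # Keep the outputs aligned with the last sample of each input block.
    return full[factor - 1::factor]


def decimate_single_mac(samples, coeffs, factor):
    """At most one multiply-accumulate per input sample (illustrative sketch)."""
    if len(coeffs) > factor:
        raise ValueError("filter is longer than the decimation factor")
    out = []
    acc = 0
    for n, x in enumerate(samples):
        phase = n % factor        # position of this sample within its block
        k = factor - 1 - phase    # its age relative to the block's last sample
        if k < len(coeffs):       # older samples see an implied zero coefficient
            acc += coeffs[k] * x
        if phase == factor - 1:   # block complete: emit the sum and reset
            out.append(acc)
            acc = 0
    return out
```

With `samples = [1, 2, 3, 4, 5, 6, 7, 8]`, `coeffs = [1, 2, 3]` and `factor = 4`, both functions return `[16, 40]`, but the second performs at most one multiplication per input sample instead of one per coefficient per input sample.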
An alternate implementation of a decimating filter is the so-called polyphase filter. In a polyphase filter, an input commutator distributes each input sample to one of several delay lines, where the number of delay lines is equal to the decimation factor. The filter output is computed as a sum of products, where each sample in each delay line is multiplied by a corresponding filter coefficient to produce a product, and all of the products are summed together (as in the so-called direct form filter). Each of the delay lines and multipliers can run at a fraction of the input sample rate. The polyphase filter thus allows the multipliers to run at a lower clock rate, but suffers the disadvantage of requiring a number of multipliers equal to the filter length. Polyphase filters can suffer the additional disadvantage that their structure is tied to the decimation factor. Changing the decimation factor requires the input commutator and the number of delay lines to be changed as well. Hence, polyphase filters can be difficult to implement in hardware with programmable decimation factors, and changing the decimation factor during operation is difficult. Finally, polyphase filters can be inefficient when the decimation factor is high and the filter length is long. Many multipliers are required, due to the long filter length, but each needs to operate only infrequently, due to the high decimation factor. Hence, a large amount of hardware sits idle most of the time. Circuitry can be added to permit sharing of multipliers among multiple input samples, but this requires additional complexity in the input sample multiplexing and coefficient memory addressing.
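The polyphase arrangement described above can be sketched in software as follows. The function name `decimate_polyphase`, the branch ordering, and the output timing (one output when the last sample of each block arrives) are assumptions made for this illustration. The commutator is modeled by routing each input sample to one delay line according to its position within the block.

```python
def decimate_polyphase(samples, coeffs, factor):
    """Polyphase decimator sketch: `factor` delay lines fed by a commutator."""
    # Branch p holds coefficients h[p], h[p + factor], h[p + 2*factor], ...
    branches = [coeffs[p::factor] for p in range(factor)]
    # One delay line per branch, as long as that branch's sub-filter.
    delay_lines = [[0] * len(b) for b in branches]
    out = []
    for n, x in enumerate(samples):
        phase = n % factor
        p = factor - 1 - phase  # commutator: choose the delay line for x
        # Shift x into that delay line only; the others are left unchanged.
        delay_lines[p] = ([x] + delay_lines[p])[:len(branches[p])]
        if phase == factor - 1:
            # One multiply per coefficient, performed once per output sample.
            out.append(sum(c * d
                           for b, line in zip(branches, delay_lines)
                           for c, d in zip(b, line)))
    return out
```

With `samples = [1, 2, 3, 4, 5, 6, 7, 8]`, `coeffs = [1, 2, 3, 4, 5]` and `factor = 2`, the sketch returns `[4, 20, 50, 80]`, which matches filtering at the full rate and keeping every second output. Note that the number of branches is fixed by `factor`, which mirrors the point that the structure is tied to the decimation factor.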
Finally, very high input sample rates can present a challenge to the design of a FIR filter. Even when the output rate of the FIR filter is low (because a high decimation factor is present), many input samples must be stored and processed by the filter. The input circuitry of the FIR filter must therefore run at the input sample rate. Although the processing speed of programmable gate arrays, custom hardware, and processors continues to improve each year, the sample rate requirements of some applications can exceed the capability of the available components. For example, even though currently available field programmable gate arrays can operate at clock rates greater than 120 MHz, sample rates several times higher, for instance 480 MHz, can be difficult to accommodate.