The present invention relates to a finite impulse response (FIR) filter. A standard FIR filter (direct-form) is shown in FIG. 1 and given by the equation
$$y_k = \sum_{n=0}^{N-1} h_n \, r_{k-n}$$
where h represents the filter coefficients, r represents the data to be filtered, and N is the length of the filter. In FIG. 1, the $z^{-1}$ symbol represents a unit delay operator, the X symbol represents a multiplier function, and the + symbol represents an adder function. The equation for a matched FIR filter is
$$y_{k+(N-1)} = \sum_{n=0}^{N-1} s_n^{*} \, r_{k+n}$$
where the filter is the complex conjugate of the reference signal, s, and has been time reversed. Each data sample in the filter is multiplied by a tap coefficient and these products are summed to create a single output whenever a data sample is shifted in. The problem with implementing this architecture in hardware (i.e., FPGA or ASIC) is that multipliers are resource-intensive. The number of bits that must be stored is also doubled when a multiplication occurs. The number of bits additionally increases by $\log_2(N)$ from the additions in the adder tree (where N is the number of taps in the filter). Thus the number of bits at the output is $2B + \log_2(N)$, where B is the number of bits input to the filter. For a complex multiplication, one realization requires three real FIR filters to perform the four multiplications (using sums and differences). That is, three of the filters in FIG. 1 would be needed in parallel.
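The relationships above can be illustrated with a minimal Python sketch. It is a software model only, not the hardware of FIG. 1, and every function name in it is hypothetical. It assumes that samples before the start of the data are zero, that the bit growth for a tap count that is not a power of two is rounded up to the next integer, and that the sums-and-differences realization is the common three-multiplication form of a complex product; the text does not specify any of these details.

```python
def direct_form_fir(h, r):
    """Direct-form FIR: y[k] = sum over n of h[n] * r[k - n].

    Assumption: r[k] is treated as zero for k < 0.
    """
    n_taps = len(h)
    return [
        sum(h[n] * r[k - n] for n in range(n_taps) if k - n >= 0)
        for k in range(len(r))
    ]


def matched_filter(s, r):
    """Matched filter built from the direct form.

    The taps are the complex conjugate of the reference signal s,
    time reversed, so that output index k + (N - 1) equals
    sum over n of conj(s[n]) * r[k + n].
    """
    h = [complex(x).conjugate() for x in reversed(s)]
    return direct_form_fir(h, r)


def output_bits(b, n_taps):
    """Output word length 2B + log2(N).

    Assumption: log2(N) is rounded up when N is not a power of two.
    (n_taps - 1).bit_length() equals ceil(log2(n_taps)) for n_taps >= 1.
    """
    return 2 * b + (n_taps - 1).bit_length()


def complex_multiply_3(a, b, c, d):
    """(a + jb) * (c + jd) using three real multiplications.

    Returns the (real, imaginary) parts. The extra additions and
    subtractions replace the fourth multiplication.
    """
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2
```

Under these assumptions, a filter with 16-bit inputs and 64 taps would need 2(16) + 6 = 38 bits at its output, which shows how quickly the word length grows when every tap contains a full multiplier.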
A problem specific to the direct-form FIR implementation is the timing limitation caused by the adder tree, which is needed to sum all the products. A systolic FIR filter implementation, shown in FIG. 2, provides identical results to a direct-form FIR filter, but without the problems associated with the adder tree. Each tap is contained within a dashed-line box. A systolic FIR filter implementation can be characterized as a computing pipeline of data processing cell elements connected in series, where the outputs of any one cell element are the inputs of the next one. The typical systolic FIR filter implementation, however, also contains multipliers, as shown in FIG. 2. Again, constructing multipliers in hardware is resource-intensive and the same bit-growth problems exist here. As with the direct-form filter, three systolic filters would be needed for a complex filter.
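The equivalence between the two structures can be checked with a small cycle-by-cycle Python model. This is a sketch of one common systolic arrangement, assumed here for illustration and not necessarily the exact register placement of FIG. 2: the data advances two delays per tap, each tap adds its product to the registered partial sum of the previous tap, and the last tap's register holds the output. The function names are hypothetical, and samples outside the input record are assumed to be zero.

```python
def direct_form_fir(h, r):
    """Reference model: y[k] = sum over n of h[n] * r[k - n], r[k] = 0 for k < 0."""
    n_taps = len(h)
    return [
        sum(h[n] * r[k - n] for n in range(n_taps) if k - n >= 0)
        for k in range(len(r))
    ]


def systolic_fir(h, r):
    """Cycle-by-cycle model of an assumed systolic FIR pipeline.

    line[d] holds the input sample delayed by d cycles; tap i reads
    line[2 * i], so the data moves two delays per tap. acc[i] is the
    partial-sum register at the output of tap i, so each addition has
    only two operands and no adder tree is required.
    """
    n_taps = len(h)
    line = [0] * (2 * n_taps - 1)
    acc = [0] * n_taps
    out = []
    # Run extra cycles with zero input to flush the pipeline.
    for t in range(len(r) + n_taps - 1):
        sample = r[t] if t < len(r) else 0
        line = [sample] + line[:-1]
        # All registers update together from the previous cycle's values.
        acc = [
            (acc[i - 1] if i > 0 else 0) + h[i] * line[2 * i]
            for i in range(n_taps)
        ]
        # The first valid output appears after the pipeline has filled.
        if t >= n_taps - 1:
            out.append(acc[-1])
    return out
```

In this model the systolic output matches the direct-form output sample for sample, delayed by the pipeline fill time. Each tap still performs one multiplication per cycle, so the multiplier cost and bit growth discussed above are unchanged.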