Digital filters are commonly employed in signal processing applications. FIG. 1 shows a finite impulse response (FIR) filter in a well-known direct form. As shown in FIG. 1, filter 100 has five taps and comprises multipliers 103a through 103e with filter weights, or tap coefficients, $w_0$ through $w_4$, respectively. These filter weights represent multiplicands to be multiplied by input data traversing input path 101. In accordance with the direct form, delay elements 105a through 105d, which may be shift registers, are inserted on input path 101 and each disposed between two multipliers. In addition, adders 107a through 107d are disposed on output path 111 and each connected at the output of a multiplier. With such an arrangement, the z-transform of the output of filter 100, $Y_{100}(z)$, is:

$$Y_{100}(z) = w_0 + w_1 z^{-1} + w_2 z^{-2} + w_3 z^{-3} + w_4 z^{-4}. \tag{1}$$
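By way of illustration only, the direct form arrangement described above may be sketched in Python as follows. The function name `fir_direct` and the numerical weight values are hypothetical and are chosen solely for the example. The sketch models the delay line on the input path and the adder chain on the output path, and applies a unit impulse so that the tap coefficients of expression (1) appear in order at the output.

```python
from collections import deque


def fir_direct(weights, samples):
    """Direct-form FIR sketch: delay line on the input path, one multiplier
    per tap, and a chain of adders on the output path."""
    # The delay line holds x[n], x[n-1], ..., x[n-(N-1)], initially all zero.
    delay_line = deque([0] * len(weights), maxlen=len(weights))
    out = []
    for x in samples:
        # The newest sample enters on the left; the oldest drops off the right.
        delay_line.appendleft(x)
        # Adder chain: the tap products are accumulated one after another.
        acc = 0
        for w, tap in zip(weights, delay_line):
            acc += w * tap
        out.append(acc)
    return out


weights = [3, -1, 4, 1, 5]          # hypothetical values for w_0 .. w_4
impulse = [1, 0, 0, 0, 0, 0, 0]     # unit impulse followed by zeros
response = fir_direct(weights, impulse)
```

Under these assumptions, `response` reproduces the five weights in sequence, followed by zeros, which is the time-domain counterpart of expression (1).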
In high-speed signal processing applications, the direct form filter is not desirable in that its critical path, corresponding to the maximum computation delay in generating an output, includes many computational elements contributing to the delay. For example, the critical path of filter 100 includes five computational elements, namely, multiplier 103e, and adders 107a-d on output path 111. Furthermore, this computation delay increases with the number of taps in the direct form filter.
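The dependence of this delay on the number of taps may be sketched, under simplifying assumptions, as follows. The function name `direct_form_critical_path` and the per-element delays `t_mul` and `t_add` are hypothetical; the model merely counts one multiplier followed by N - 1 adders, as in the five-element path identified above.

```python
def direct_form_critical_path(n_taps, t_mul=1.0, t_add=1.0):
    """Worst-case combinational delay of an N-tap direct-form FIR filter.

    The longest path runs through one multiplier and then through every
    adder on the output path (N - 1 of them). The delays t_mul and t_add
    are assumed, technology-dependent values in arbitrary units.
    """
    return t_mul + (n_taps - 1) * t_add


# Delay, in arbitrary units, for a few hypothetical filter lengths.
delays = {n: direct_form_critical_path(n) for n in (5, 16, 64)}
```

With unit delays, the five-tap case evaluates to five element delays, and the figure grows linearly with the filter length.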
However, use of digital filters in a transpose form overcomes the above computation delay problem. FIG. 2 shows FIR filter 200 in the transpose form. The z-transform of the output of filter 200, $Y_{200}(z)$, is:

$$Y_{200}(z) = w_0 + w_1 z^{-1} + w_2 z^{-2} + w_3 z^{-3} + w_4 z^{-4}. \tag{2}$$
By comparing expression (2) with expression (1), one realizes that filter 200 has the same transfer function as filter 100. However, unlike filter 100, no delay element is disposed on input path 201 in filter 200. Rather, in accordance with the transpose form, delay elements 205a through 205d are disposed on output path 211 and each inserted between multiplier/adder pairs. This being so, the critical path in filter 200 includes a multiplier and an adder, resulting in the maximum computation delay incurred by a multiplication and an addition. Furthermore, such computation delay does not depend on the length, or the number of taps, of filter 200.
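The transpose form arrangement may likewise be sketched in Python, as an illustration only. The names `fir_transpose` and `fir_reference`, the weight values, and the sample values are hypothetical. In the sketch, every multiplier receives the current input sample directly, and each register on the output path is loaded with one product plus the previous content of the neighboring register, so that no register update involves more than one multiplication and one addition.

```python
def fir_transpose(weights, samples):
    """Transpose-form FIR sketch: the input is broadcast to all multipliers,
    and the delay registers sit on the output path between adders."""
    n = len(weights)
    regs = [0] * (n - 1)   # regs[k] feeds the adder associated with w_k
    out = []
    for x in samples:
        products = [w * x for w in weights]
        # Output: product of the first tap plus the first register.
        y = products[0] + (regs[0] if regs else 0)
        # Each register takes one product plus its upstream neighbor;
        # the last register has no neighbor and takes the product alone.
        regs = [
            products[k + 1] + (regs[k + 1] if k + 1 < n - 1 else 0)
            for k in range(n - 1)
        ]
        out.append(y)
    return out


def fir_reference(weights, samples):
    """Convolution sum written out directly, used only as a cross-check."""
    return [
        sum(w * samples[i - k] for k, w in enumerate(weights) if i - k >= 0)
        for i in range(len(samples))
    ]


weights = [3, -1, 4, 1, 5]            # hypothetical values for w_0 .. w_4
samples = [2, 7, 1, 8, 2, 8, 1, 8]    # hypothetical input data
same = fir_transpose(weights, samples) == fir_reference(weights, samples)
```

Under these assumptions, `same` evaluates to true, consistent with the two forms sharing one transfer function.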
Nonetheless, one of the drawbacks of a transpose form filter is that the multipliers in the filter present a substantial capacitive load at the filter input, resulting in a significant input delay and a substantial level of power consumption. Power consumption becomes a major issue when it affects the choice of packaging for the filters, and the packaging becomes expensive if it is required to dissipate heat efficiently. Furthermore, the capacitive load increases with the number of filter taps, thus requiring use of buffers to provide an amount of charge proportional to the number of taps.
Another drawback of a transpose form filter is that, because the delay elements are disposed on the output path of the filter, these delay elements, typically shift registers, must be larger than those in a direct form filter in order to accommodate the relatively long bit strings representing sums of products on the output path. Such large delay elements are relatively expensive and add to the power consumption of the filter.
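The difference in register size may be illustrated with a rough count, offered as a sketch under stated assumptions rather than as a description of any particular filter. It is assumed that the data and the coefficients are signed two's-complement words, that a product of a B-bit sample and a C-bit coefficient is held in B + C bits, and that a sum of m such products is held in B + C + ceil(log2 m) bits, which is a conservative bound. The function names and the 8-bit word lengths are hypothetical.

```python
from math import ceil, log2


def direct_form_register_bits(n_taps, data_bits):
    """Total register bits when each of the N - 1 delay elements stores
    one input sample."""
    return (n_taps - 1) * data_bits


def transpose_form_register_bits(n_taps, data_bits, coeff_bits):
    """Total register bits when the N - 1 delay elements store partial sums.

    The register farthest from the output holds a single product; each
    register closer to the output holds a sum of one more product.
    """
    total = 0
    for m in range(1, n_taps):          # m = number of products summed
        total += data_bits + coeff_bits + ceil(log2(m))
    return total


# Hypothetical five-tap filter with 8-bit data and 8-bit coefficients.
direct_bits = direct_form_register_bits(5, 8)
transpose_bits = transpose_form_register_bits(5, 8, 8)
```

For these assumed word lengths, the direct form requires 32 register bits in total, whereas the transpose form requires 69.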
Another type of FIR filter employs the well-known systolic architecture. Representative W1 and W2 systolic FIR filters are shown in FIGS. 3 and 4, respectively. Among other things, systolic filters are desirable in that they are arranged in a pipeline (or modular) form and comprise a number of structurally identical modules. Each module in the respective filter is shown in FIGS. 3 and 4 by a dashed box enclosing the module. Since the modules are independent of one another, the layouts of the W1 and W2 systolic filters simply involve an assembly of identical predefined modules.
Like the transpose form filters, the computation delay of the systolic filters is independent of the number of filter taps. However, additional delay elements have been inserted in the systolic filters to reduce both the computation delay and input capacitive load. The undesirable effect occasioned by these additional delay elements is apparent from examining the z-transforms of the respective systolic filter outputs. The z-transform of the W1 systolic filter output, $Y_{W1}(z)$, is:

$$Y_{W1}(z) = z^{-1}\left(w_0 + w_1 z^{-2} + w_2 z^{-4} + w_3 z^{-6} + w_4 z^{-8}\right). \tag{3}$$
From expression (3), the factor $z^{-1}$ indicates that the latency of the W1 systolic filter output equals a clock cycle. That is, it takes a clock cycle after the data is input to the filter to obtain the corresponding filter output. Although the latency of a clock cycle may be tolerable, the remaining expression, $w_0 + w_1 z^{-2} + w_2 z^{-4} + w_3 z^{-6} + w_4 z^{-8}$, which is a function of $z^{-2}$, presents a more challenging problem in a high-speed signal processing application. In order to maintain the input data bit rate, the clock rate at which the filter operates must be double the input rate. This is challenging because the input rate is already very high in the high-speed application.
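The doubled clock rate may be illustrated with the following Python sketch, which models only the transfer function of expression (3) and not the module structure of FIG. 3. It is assumed, for the purpose of the example, that each input sample is followed by a zero so that the filter is clocked twice per input sample. The function names, weight values, and sample values are hypothetical.

```python
def apply_impulse_response(h, samples):
    """Convolve samples with impulse response h, truncated to len(samples)."""
    return [
        sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(samples))
    ]


def w1_impulse_response(weights):
    """Impulse response implied by expression (3): w_k at delay 1 + 2k."""
    h = [0] * (2 * len(weights))
    for k, w in enumerate(weights):
        h[1 + 2 * k] = w
    return h


weights = [3, -1, 4, 1, 5]        # hypothetical values for w_0 .. w_4
samples = [2, 7, 1, 8, 2, 8]      # hypothetical input data

# Assumed feeding scheme: two clock cycles per input sample, the second
# cycle carrying a zero.
upsampled = []
for x in samples:
    upsampled += [x, 0]

fast_output = apply_impulse_response(w1_impulse_response(weights), upsampled)
recovered = fast_output[1::2]     # one output per input, one cycle late
expected = apply_impulse_response(weights, samples)
```

Under these assumptions, `recovered` equals `expected`, that is, the desired five-tap response is obtained only when the filter runs at twice the input rate, with each result appearing one clock cycle after its input.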
Turning to the W2 systolic filter of FIG. 4, the z-transform of the filter output, $Y_{W2}(z)$, is:

$$Y_{W2}(z) = z^{-5}\left(w_0 + w_1 z^{-1} + w_2 z^{-2} + w_3 z^{-3} + w_4 z^{-4}\right). \tag{4}$$
From expression (4), the factor $z^{-5}$ indicates that the latency of the filter output equals five clock cycles. In general, the latency of a W2 systolic filter output equals N clock cycles, where N is the number of filter taps. In many signal processing applications, such large latency is simply unacceptable.
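The latency may be illustrated with the following Python sketch, which models only the transfer function of expression (4), generalized to N taps as stated above, and not the module structure of FIG. 4. The function names, weight values, and sample values are hypothetical, and the latency helper assumes that the first weight is nonzero.

```python
def apply_impulse_response(h, samples):
    """Convolve samples with impulse response h, truncated to len(samples)."""
    return [
        sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(samples))
    ]


def w2_impulse_response(weights):
    """Impulse response implied by expression (4): the N weights in order,
    preceded by N clock cycles of delay."""
    return [0] * len(weights) + list(weights)


def latency_in_cycles(h):
    """Index of the first nonzero entry of h (assumes one exists)."""
    return next(i for i, value in enumerate(h) if value != 0)


weights = [3, -1, 4, 1, 5]                    # hypothetical w_0 .. w_4
samples = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]      # hypothetical input data

h_w2 = w2_impulse_response(weights)
latency = latency_in_cycles(h_w2)
delayed = apply_impulse_response(h_w2, samples)
undelayed = apply_impulse_response(weights, samples)
```

Under these assumptions, `latency` is five, and `delayed` is simply `undelayed` shifted later by five clock cycles.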
Accordingly, there exists a need for a digital filter design characterized by a short computation delay and latency, low power consumption, and an inexpensive and uncomplicated construction.