The least mean square (LMS) adaptive filter algorithm has found many applications in situations where the statistics of the input processes are unknown or changing. These include noise cancellation, line enhancing, and adaptive array processing. The algorithm uses a transversal filter structure driven by a primary input, so as to minimize the mean-square error.
The LMS algorithm is often the algorithm of choice for hardware realization because it solves adaptive filter problems without prior knowledge of higher order statistics of the signal being processed. The LMS algorithm is derived in Chapter 6 of the book entitled "Adaptive Signal Processing," by B. Widrow and S. D. Stearns, Prentice-Hall, Inc. (1985). The prior art transversal filter implementation of the LMS algorithm is a time domain network in which time-spaced samples of a given input signal are weighted and summed to produce as an output signal an idealized replica of the input signal. These applications, however, have invariably used sequential processing or microprocessor control.
An Nth-order filter that implements the LMS algorithm can be represented by the following equations:

$$y(n) = \sum_{k=0}^{N-1} c_k(n)\, x(n-k)$$

$$e(n) = g(n) - y(n)$$

$$c_k(n+1) = c_k(n) + \mu\, e(n)\, x(n-k), \qquad k = 0, 1, \ldots, N-1$$

where $x(n)$ is the filter input at time $n$; $c_k(n)$ is the kth filter coefficient at time $n$; $y(n)$ is the filter output; $g(n)$ is the desired response; $e(n)$ is the residual error; and $\mu$ is the step size of coefficient updating.
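The recursion above can be sketched in software as follows. This is a minimal illustration rather than any of the hardware implementations discussed here; the function names `lms_filter` and `identify_demo`, the zero initial coefficients, and the 3-tap test system are assumptions introduced only for the example.

```python
import random


def lms_filter(x, g, num_taps, mu):
    """Run the LMS recursion over input x with desired response g.

    Returns the final coefficients, the outputs y(n), and the errors e(n).
    """
    c = [0.0] * num_taps       # coefficients c_k, assumed to start at zero
    state = [0.0] * num_taps   # state[k] holds x(n-k); zero before n = 0
    y_out, e_out = [], []
    for x_n, g_n in zip(x, g):
        state = [x_n] + state[:-1]                       # shift in x(n)
        y_n = sum(ck * xk for ck, xk in zip(c, state))   # filter output y(n)
        e_n = g_n - y_n                                  # residual error e(n)
        # Coefficient update used for the next sample: c_k + mu * e(n) * x(n-k)
        c = [ck + mu * e_n * xk for ck, xk in zip(c, state)]
        y_out.append(y_n)
        e_out.append(e_n)
    return c, y_out, e_out


def identify_demo(num_samples=2000, mu=0.1, seed=0):
    """Adapt toward a hypothetical 3-tap system driven by uniform noise."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.1]       # hypothetical unknown system (illustration only)
    x = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    g = [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(num_samples)
    ]
    c, _, e = lms_filter(x, g, len(h), mu)
    return h, c, e
```

In this noiseless setting the adapted coefficients returned by `identify_demo()` should approach the hypothetical system `h`, and the residual error should decay toward zero. Note that every sample requires the full sum over all taps before the error, and hence the update, can be formed.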
The LMS algorithm, especially for high-speed applications, has generally been implemented in a transversal manner, as shown in the 5-tap example of FIG. 1. The structure consists of the filter kernel and the error generation and feedback loop. The kernel, included within the dashed line identified by reference 10, consists of the transversal finite impulse response (FIR) portion and the coefficient-updating section. A more complete description of this prior art filter can be found in U.S. Pat. No. 5,450,339 to Chester et al. This prior art transversal LMS implementation has several limitations. For long filter applications, the summation tree formed with the summation block 38 presents latency problems in the filter. The longer the filter (i.e., the higher the order of the FIR filter), the more taps the adaptive filter has and the longer the latency through the summation tree becomes. This latency delays the error calculation and can eventually lead to a situation in which the error calculation occurs after the relevant data sample has exited the filter's state register. Additionally, the summation tree limits the regularity and modularity of the design, which can severely limit integrated circuit implementations by limiting the cascadability of the architecture.
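The growth of summation-tree latency with filter length can be illustrated with a small calculation. The sketch below assumes a balanced tree of two-input adders with one register stage per level; this structure is an assumption for illustration, since the internal organization of summation block 38 is not specified here, and the name `adder_tree_depth` is hypothetical.

```python
def adder_tree_depth(num_taps):
    """Levels of two-input adders needed to sum num_taps tap products.

    Assumes a balanced binary adder tree; equals ceil(log2(num_taps)).
    """
    if num_taps < 1:
        raise ValueError("num_taps must be at least 1")
    return (num_taps - 1).bit_length()


# Depth for a few filter lengths, including the 5-tap case of FIG. 1.
depths = {n: adder_tree_depth(n) for n in (5, 64, 1024)}
```

Under these assumptions the depth grows logarithmically with the number of taps, so each doubling of filter length adds one more adder level between the tap products and the error calculation.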
The LMS algorithm can also be implemented by a reconfigurable datapath dedicated to LMS execution, or even by a programmable digital signal processor. The former provides higher efficiency than the latter; however, it is still much slower than implementations using dedicated parallel hardware. Further, significant hardware overhead is introduced by data switching, storage, and control configurations. FIG. 2 shows a prior art implementation which uses a reconfigurable hardware architecture to execute the LMS algorithm and some other multiplication functions. A more complete description of this prior art filter can be found in U.S. Pat. No. 5,001,661 to Corleto et al.
To overcome the inherent drawbacks of the standard LMS algorithm, modifications to the algorithm have been proposed for the purposes of easier hardware implementation. FIG. 3 shows a prior art architecture which adopts the direct transversal finite impulse response portion of the algorithm while using the delayed version of the updated filter coefficients. The embodiment of FIG. 3 is taken from U.S. Pat. No. 4,726,036 to Sawyer et al. and is more completely described therein. Although the hardware critical path is shortened by the new architecture for the tapped delay line, one main disadvantage is that the summation tree 30 is still present.
Another modification of LMS, called delayed LMS (DLMS), has been recently introduced. The DLMS algorithm was first disclosed in a paper entitled "The LMS Algorithm with Delayed Coefficient Adaptation", by G. Long, F. Ling, and J. G. Proakis, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, No. 9, September 1989. A hardware implementation of the DLMS algorithm is found in "Bit-Serial VLSI Implementation of Delayed LMS Adaptive FIR Filters", by Chin-Liang Wang, IEEE Transactions on Signal Processing, vol. 42, No. 8, August 1994. FIG. 4 shows the data flow graph, and FIGS. 5 and 6 show two systolic architectures, for implementing the delayed LMS algorithm. Note that in the implementations shown in FIGS. 5 and 6, the summation tree has been removed. Although the implementations of the DLMS algorithm shown in FIGS. 5 and 6 solve, to a certain extent, the problem of the critical path in hardware computation and provide the desired modularity for easy VLSI implementation, these architectures are not optimized.
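For illustration, the delayed coefficient adaptation can be sketched in software. The update form used below, in which the coefficients at time n are adjusted with the error and input samples from D samples earlier, follows the usual statement of DLMS in the literature and is not spelled out in this text; the delay symbol D (parameter `delay`), the function name `dlms_filter`, and the zero initial conditions are assumptions made for the example.

```python
from collections import deque


def dlms_filter(x, g, num_taps, mu, delay):
    """Sketch of delayed LMS: each update uses e(n-D) and x(n-k-D).

    With delay = 0 this reduces to the standard LMS recursion.
    """
    c = [0.0] * num_taps       # coefficients, assumed to start at zero
    state = [0.0] * num_taps   # state[k] holds x(n-k)
    pending = deque()          # (error, input vector) pairs awaiting update
    e_out = []
    for x_n, g_n in zip(x, g):
        state = [x_n] + state[:-1]                       # shift in x(n)
        y_n = sum(ck * xk for ck, xk in zip(c, state))   # filter output y(n)
        e_n = g_n - y_n                                  # residual error e(n)
        e_out.append(e_n)
        pending.append((e_n, state))
        if len(pending) > delay:
            # Apply the update computed from data that is `delay` samples old.
            e_d, state_d = pending.popleft()
            c = [ck + mu * e_d * xk for ck, xk in zip(c, state_d)]
    return c, e_out
```

Because the update no longer needs the current error, the error computation can be pipelined over D sample periods; the trade-off usually noted for such schemes is that a smaller step size may be needed to keep the adaptation stable.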