In the following, commonly used techniques for AEC, AES, and NS are described.
Acoustic Echo Canceler (AEC)
Traditionally, echo cancellation is accomplished by adaptively identifying the echo path impulse response and subtracting an estimate of the echo signal from the microphone signal. The far-end talker signal x(n) (loudspeaker signal) goes through the echo path, whose impulse response is modeled as an FIR filter, and adds to the microphone signal y(n) together with the near-end talker signal v(n) and the ambient noise w(n):
$$y(n) = \mathbf{h}^T \mathbf{x}(n) + v(n) + w(n), \qquad (1)$$
where
$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-J+1)]^T,$$
$$\mathbf{h} = [h_0, h_1, \ldots, h_{J-1}]^T,$$
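J is the length of the echo path impulse response, and T denotes the transpose of a vector or a matrix. To cancel the echo in the microphone signal, an echo estimate ŷ(n) is needed, which is generated by passing the far-end talker signal through an FIR filter
$$\hat{\mathbf{h}} = [\hat{h}_0, \hat{h}_1, \ldots, \hat{h}_{K-1}]^T \qquad (2)$$
of length K (generally less than J),
$$\hat{y}(n) = [\hat{\mathbf{h}}^T, \mathbf{0}]\,\mathbf{x}(n), \qquad (3)$$
where 0 is a row vector of J−K zeros that pads ĥ to the length of x(n).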
The FIR filter coefficients are estimated adaptively in time. Subtracting ŷ(n) from the microphone signal y(n) yields the error signal
$$e(n) = y(n) - \hat{y}(n). \qquad (4)$$
The mean square error (MSE) can be expressed as
$$E\{e^2(n)\} = E\{(y(n) - \hat{y}(n))^2\}, \qquad (5)$$
where E{·} denotes mathematical expectation. The objective of the AEC is to estimate an ĥ that minimizes $E\{e^2(n)\}$.
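The signal model of equations (1) to (5) can be illustrated numerically. The following is only a sketch under stated assumptions: the echo path h, the signal lengths, the noise levels, and the helper names fir_output and empirical_mse are hypothetical and chosen for illustration. The expectation in equation (5) is approximated by a sample average.

```python
import random

def fir_output(coeffs, x, n):
    """Return sum_k coeffs[k] * x(n-k), treating x(n) as 0 for n < 0."""
    return sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)

def empirical_mse(h, h_hat, x, v, w):
    """Sample average of e(n)^2, with e(n) = y(n) - y_hat(n)."""
    total = 0.0
    for n in range(len(x)):
        y = fir_output(h, x, n) + v[n] + w[n]   # equation (1)
        y_hat = fir_output(h_hat, x, n)         # equation (3); taps beyond K count as zero
        e = y - y_hat                           # equation (4)
        total += e * e
    return total / len(x)                       # sample estimate of equation (5)

rng = random.Random(0)
N = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(N)]     # far-end (loudspeaker) signal
v = [0.0] * N                                   # near-end talker silent
w = [rng.gauss(0.0, 0.01) for _ in range(N)]    # weak ambient noise
h = [0.5, -0.3, 0.2, 0.1]                       # hypothetical echo path, J = 4

mse_no_filter = empirical_mse(h, [0.0], x, v, w)  # no cancellation at all
mse_short = empirical_mse(h, h[:2], x, v, w)      # K = 2 < J, exact on the first two taps
mse_full = empirical_mse(h, h, x, v, w)           # K = J, perfect estimate
print(mse_no_filter, mse_short, mse_full)
```

In this sketch the MSE drops as more of the echo path is matched. With a perfect estimate, only the ambient noise w(n) remains in e(n).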
There is a vast literature addressing how to search for the optimum ĥ using adaptive techniques. Commonly used algorithms include normalized least-mean-square (NLMS), recursive least-squares (RLS), proportionate NLMS (PNLMS), affine projection algorithm (APA), etc.
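As an illustration of one of the algorithms named above, the following sketch applies a textbook NLMS update to a synthetic, noise-free echo. It is a minimal example and not an implementation taken from any particular reference. The step size mu, the regularization constant eps, the echo path h_true, and the function name nlms are assumptions made for illustration.

```python
import random

def nlms(x, y, num_taps, mu=0.5, eps=1e-6):
    """Adapt an FIR echo path estimate with NLMS; return (h_hat, errors)."""
    h_hat = [0.0] * num_taps
    buf = [0.0] * num_taps                  # [x(n), x(n-1), ..., x(n-K+1)]
    errors = []
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]               # shift in the newest far-end sample
        y_hat = sum(hk * xk for hk, xk in zip(h_hat, buf))
        e = yn - y_hat                      # error signal e(n)
        norm = sum(xk * xk for xk in buf) + eps
        step = mu * e / norm                # step normalized by the input energy
        h_hat = [hk + step * xk for hk, xk in zip(h_hat, buf)]
        errors.append(e)
    return h_hat, errors

rng = random.Random(1)
h_true = [0.5, -0.3, 0.2, 0.1]              # hypothetical echo path
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Microphone signal containing only the echo (no near-end talker, no noise).
y = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(x))]

h_hat, errors = nlms(x, y, num_taps=len(h_true))
print(h_hat)
```

Because the filter length equals the echo path length and no disturbance is present in this example, the estimate approaches h_true and the error signal decays toward zero.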
During doubletalk (when the talker at the near end is talking, i.e. v(n)≠0), the adaptive filter coefficients are usually “frozen” to prevent the near-end signal v(n) from degrading the adaptive filter's estimate of the acoustic echo path. For this purpose, a doubletalk detector is used.
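The freezing of the coefficients can be sketched as follows. The text above does not name a particular detector. This example assumes the classic Geigel detector, which declares doubletalk when the microphone magnitude exceeds a fraction of the recent far-end peak. The threshold of 0.5, the synthetic signals, and all function names are assumptions for illustration only.

```python
import random

def geigel_doubletalk(y_n, recent_x, threshold=0.5):
    """Assumed detector: flag doubletalk when |y(n)| > threshold * max|x|."""
    peak = max(abs(s) for s in recent_x)
    return abs(y_n) > threshold * peak

def nlms_with_freeze(x, y, num_taps, use_detector, mu=0.5, eps=1e-6):
    """NLMS that skips the coefficient update while doubletalk is flagged."""
    h_hat = [0.0] * num_taps
    buf = [0.0] * num_taps
    for xn, yn in zip(x, y):
        buf = [xn] + buf[:-1]
        if use_detector and geigel_doubletalk(yn, buf):
            continue                        # coefficients stay frozen
        e = yn - sum(hk * xk for hk, xk in zip(h_hat, buf))
        step = mu * e / (sum(xk * xk for xk in buf) + eps)
        h_hat = [hk + step * xk for hk, xk in zip(h_hat, buf)]
    return h_hat

def misalignment(h, h_hat):
    """Squared distance between the true and the estimated echo path."""
    return sum((a - b) ** 2 for a, b in zip(h, h_hat))

rng = random.Random(2)
h_true = [0.2, -0.1, 0.05]                  # hypothetical echo path, attenuating
N = 6000
x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
# Near-end talker is silent in the first half and loud in the second half.
v = [0.0 if n < N // 2 else rng.choice((-5.0, 5.0)) for n in range(N)]
y = [sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0) + v[n]
     for n in range(N)]

mis_frozen = misalignment(h_true, nlms_with_freeze(x, y, 3, use_detector=True))
mis_adapting = misalignment(h_true, nlms_with_freeze(x, y, 3, use_detector=False))
print(mis_frozen, mis_adapting)
```

In this constructed example the frozen filter keeps the estimate it reached while only the far-end talker was active, whereas the filter that keeps adapting is driven away from the echo path by v(n). The comparison between the peak and the microphone level relies on the echo path attenuating the loudspeaker signal, which is an assumption of this detector and not a statement from the text above.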
Another solution for suppressing the echo in an audio signal is described in US 2004/0057574. There, echo suppression is achieved by computing the spectral envelopes of the loudspeaker and microphone signals and determining the spectral envelope of the echo signal using adaptive filters. This approach attempts to continuously estimate the time-varying spectral envelope of the echo signal. The problem with this technique is that the adaptive filters predicting the spectral envelopes need to re-converge whenever the properties of the loudspeaker signal change. This is because the echo signal spectral envelope depends not only on the loudspeaker signal spectral envelope but also on the loudspeaker signal itself. As a result of this signal dependence, the echo signal spectral envelope is often not estimated precisely enough and the echo is not sufficiently removed.