Adaptive filter methods or algorithms are used extensively in many signal processing applications. For example, adaptive filtering is used for echo cancellation in communication devices.
An echo occurs when a party can hear his own voice or his own background noise through his communication device. In a telephone conversation, echo is heard when the party's sound signal travels through a speaker or speakerphone of the listener's telephone, and then travels back through the microphone of the listener's telephone. Echo is more prominent when one party is using his speakerphone.
Echo cancellation is a process of reducing or removing echo signals from communications, such as a conversation over a telephone. Echo cancellation first involves recognizing an echo signal. Then, once the echo signal is recognized, the echo can be removed by subtracting, filtering, or cancelling it.
More precisely, a linear adaptive filter in an echo canceller is typically used to model the acoustic coupling between a speaker and a microphone. This acoustic coupling, or the path, is often referred to as the true room response h[n]. The linear adaptive filter ĥ[n], which models the true room response, is used to generate a replica of the echo, ŷ[n], which is subtracted from the echo-corrupted microphone signal, m[n], to obtain an echo-free signal e[n].
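The signal model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular echo canceller: the function names `convolve` and `cancel_echo` and the sample values are hypothetical, and only the symbols h[n], ĥ[n], ŷ[n], m[n], and e[n] come from the text.

```python
def convolve(signal, response):
    """Causal FIR filtering: out[n] = sum_k response[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, r_k in enumerate(response):
            if n - k >= 0:
                acc += r_k * signal[n - k]
        out.append(acc)
    return out


def cancel_echo(mic, far_end, room_estimate):
    """Return e[n] = m[n] - y_hat[n], where y_hat is the echo replica."""
    replica = convolve(far_end, room_estimate)  # y_hat[n]
    return [m - y_hat for m, y_hat in zip(mic, replica)]


# Hypothetical example values (assumptions, not from the source).
far_end = [1.0, 2.0, -1.0, 0.0]        # far-end signal played by the loudspeaker
true_room = [0.5, 0.25]                # true room response h[n]
mic = convolve(far_end, true_room)     # m[n]: echo only, no near-end speech

# With a perfect estimate (h_hat == h) the residual e[n] is zero.
residual = cancel_echo(mic, far_end, true_room)
```

In practice the estimate ĥ[n] is never exactly equal to h[n], so the residual is small rather than zero; how quickly and how closely the estimate approaches the true response depends on the adaptation algorithm.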
The process of echo cancellation only occurs when there is far-end activity. In other words, echo cancellation is performed at the listener's end only when the far end is active (e.g., the far-end party is talking).
It is more difficult to remove an echo when both parties are talking simultaneously, a condition known as “double talk.” This difficulty is referred to as the “double talk problem.” The near-end microphone picks up near-end speech, near-end background noise, and the far-end echo signal. The double talk problem is the difficulty of identifying the far-end echo and distinguishing it from the near-end speech and near-end background noise during double talk.
A conventional solution is to halt the adaptation of the filter discretely (that is, non-continuously or non-dynamically) during double talk. This has been accomplished via a double talk detector, which halts the acoustic echo cancellation filter's adaptation during periods of simultaneous speech from both communication devices.
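As an illustration of how such a detector can gate adaptation, the sketch below uses the classic Geigel test, which declares double talk when the microphone sample is large relative to the recent far-end peak. The text does not name a specific detector, so the choice of the Geigel test, the function name `geigel_double_talk`, and the threshold of 0.5 are assumptions made here for illustration.

```python
def geigel_double_talk(mic_sample, recent_far_end, threshold=0.5):
    """Return True when double talk is suspected (Geigel-style test).

    recent_far_end holds the last few far-end samples, including the
    current one. The threshold value is an assumed, commonly used default.
    """
    peak = max((abs(x) for x in recent_far_end), default=0.0)
    return abs(mic_sample) > threshold * peak


def should_adapt(mic_sample, recent_far_end):
    """Hard on/off decision: adapt only when no double talk is detected."""
    return not geigel_double_talk(mic_sample, recent_far_end)
```

The decision is binary: the filter either adapts at its full step size or does not adapt at all, which is the discrete behavior described above.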
Conventional echo cancellers use one or more normalized least mean square (NLMS) based adaptive filters to model the acoustic coupling between the loudspeaker and the microphone (i.e., to model the true room response). This algorithm is very popular because of its robustness and simplicity. The stability and adaptation speed of this filter are governed by the step size parameter.
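A minimal sketch of the standard NLMS update follows, to make the role of the step size concrete. It is a textbook form of the algorithm rather than the filter of any specific product; the names `nlms_step` and `identify_room`, the regularization constant, the step size of 0.5, and the three-tap room response are all assumptions chosen for the demonstration.

```python
import random


def nlms_step(weights, x_window, mic_sample, step_size=0.5, regularization=1e-6):
    """One NLMS update. x_window[k] holds the far-end sample x[n - k]."""
    y_hat = sum(w * x for w, x in zip(weights, x_window))  # echo replica
    error = mic_sample - y_hat                             # e[n] = m[n] - y_hat[n]
    energy = sum(x * x for x in x_window)                  # input energy (normalizer)
    gain = step_size * error / (energy + regularization)
    new_weights = [w + gain * x for w, x in zip(weights, x_window)]
    return new_weights, error


def identify_room(far_end, mic, num_taps, step_size=0.5):
    """Run NLMS over the whole signal and return the final filter estimate."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps
    for x_n, m_n in zip(far_end, mic):
        window = [x_n] + window[:-1]  # newest sample first
        weights, _ = nlms_step(weights, window, m_n, step_size)
    return weights


# Hypothetical noise-free demo: far-end noise passed through an assumed room.
rng = random.Random(0)
true_room = [0.5, -0.3, 0.1]
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = []
for n in range(len(far_end)):
    echo = sum(h * far_end[n - k] for k, h in enumerate(true_room) if n - k >= 0)
    mic.append(echo)

estimate = identify_room(far_end, mic, num_taps=3)
```

Dividing by the input energy makes the effective step independent of the far-end signal level, which is a large part of why NLMS is regarded as robust. The demo contains no near-end speech or noise, so the estimate settles very close to the assumed room response.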
The larger the step size, the more rapidly the filter converges to the true room response, but at the cost of higher steady-state misalignment and reduced stability. Conversely, a smaller step size gives lower steady-state misalignment, but at the cost of increased convergence time. Thus, the choice of step size parameter reflects a trade-off between fast convergence on one hand and low steady-state misalignment on the other. A fixed step size adaptive algorithm typically uses a small step size, chosen according to the application, favoring steady-state accuracy and giving up the advantages of quicker convergence.
To address these conflicting requirements, variable step size based adaptive algorithms have been used. A variable step size adaptive filter can use different step sizes at different instants of time. Recently, J. Benesty et al. proposed a nonparametric variable step size (VSS) normalized least mean square (NLMS) based adaptive algorithm (J. Benesty, H. Rey, L. R. Vega, and S. Tressens, “A nonparametric VSS NLMS algorithm,” IEEE Signal Processing Letters, Vol. 13, pp. 581-584, October 2006). However, Benesty's approach does not address the double talk problem. A key parameter in most VSS algorithms is the estimate of the energy of the near-end signal. Minimum statistics based methods are often used, but these estimate only the energy of the background noise of the near-end signal, not the energy of the total near-end signal, which comprises both the background noise and the near-end talker.
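To show why the near-end energy estimate matters, the sketch below computes a step size in the spirit of the nonparametric VSS rule, in which the normalized step is roughly one minus the ratio of the near-end signal level to the error level. It is a simplified illustration and not a reproduction of the cited algorithm: the function names, the forgetting factor, the small constant `eps`, and the clamping to the range [0, 1] are assumptions made here.

```python
import math


def smooth_power(prev_power, error, forgetting=0.99):
    """Recursive estimate of the error power (forgetting factor is assumed)."""
    return forgetting * prev_power + (1.0 - forgetting) * error * error


def vss_step_size(error_power, near_end_std, eps=1e-8):
    """Step size that shrinks as the error approaches the near-end level.

    error_power  : smoothed power of the canceller output e[n]
    near_end_std : estimated standard deviation of the near-end signal
    Clamping to [0, 1] is an assumption added for safety in this sketch.
    """
    mu = 1.0 - near_end_std / (math.sqrt(error_power) + eps)
    return min(1.0, max(0.0, mu))


# Error far above the near-end level: filter not yet converged, large step.
mu_start = vss_step_size(error_power=1.0, near_end_std=0.1)

# Error at the near-end level: only near-end signal remains, step near zero.
mu_converged = vss_step_size(error_power=0.01, near_end_std=0.1)
```

If `near_end_std` reflects only the background noise, then during double talk the error power rises because of the near-end talker and the rule responds with a larger step size, which is the opposite of what is needed. This is why an estimate of the total near-end energy, rather than the noise floor alone, is the key quantity.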