Acoustic echo cancellation removes the echo captured by a microphone when sound is simultaneously played through speakers located near the microphone. Many high-noise settings, such as noisy conference rooms, lobbies, and hands-free telephony in cars, require effective echo cancellation for clear communication. However, the presence of noise impedes the convergence of acoustic echo cancellation algorithms, which leads to poor echo cancellation.
In echo cancellation, adaptive algorithms are used to compute a model of the speech echo. The model forms an estimate of the echo as the sum of the reflections of the original speech, and this estimate is subtracted from the signal that the microphone picks up. The result is the speech of the person talking, with the echo removed. The form of this echo prediction must be learned by the echo canceller in a process known as adaptation. The parameters learned during adaptation generate the prediction of the echo signal and, taken together, form an acoustic picture of the room in which the microphone is located.
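The predict-and-subtract step described above can be illustrated with a minimal sketch. It assumes the echo path is already known and is given as a list of (delay, gain) reflections. The function names `predict_echo` and `cancel_echo` and the toy signals are hypothetical and chosen only for illustration. A real canceller has to learn these parameters through adaptation.

```python
def predict_echo(far_end, reflections):
    """Sum delayed, scaled copies of the far-end (loudspeaker) signal.

    `reflections` is a list of (delay_in_samples, gain) pairs; this
    representation of the room is an assumption made for illustration.
    """
    echo = [0.0] * len(far_end)
    for delay, gain in reflections:
        for n in range(delay, len(far_end)):
            echo[n] += gain * far_end[n - delay]
    return echo


def cancel_echo(mic, far_end, reflections):
    """Subtract the predicted echo from the microphone signal."""
    echo = predict_echo(far_end, reflections)
    return [m - e for m, e in zip(mic, echo)]


# Toy signals: the microphone hears near-end speech plus the echo.
far_end = [1.0, 0.0, -1.0, 0.5, 0.0, 0.0]
near_speech = [0.0, 0.2, 0.0, -0.1, 0.3, 0.0]
reflections = [(1, 0.5), (3, 0.25)]  # hypothetical room reflections

mic = [s + e for s, e in zip(near_speech, predict_echo(far_end, reflections))]
cleaned = cancel_echo(mic, far_end, reflections)
```

With a perfectly known echo path, `cleaned` matches `near_speech` up to floating-point rounding. In practice the reflections are unknown and change over time, which is why adaptation is needed.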
The performance of an adaptive filtering algorithm can be evaluated based on its convergence rate and a factor known as misadjustment. The rate of convergence can be defined as the number of iterations required for the algorithm, under stationary conditions, to converge “close enough” to an optimum Wiener solution in the mean-square sense. Misadjustment describes the steady-state behavior of the algorithm, and is a quantitative measure of the amount by which the averaged final value of the mean-squared error exceeds the minimum mean-squared error produced by an optimal Wiener filter. A well-known property of adaptive filtering algorithms is the trade-off between adaptation time and misadjustment: settings that shorten adaptation time generally increase misadjustment, and vice versa. An effective acoustic echo canceller requires fast adaptation when the echo path changes and smooth adaptation when the echo path is stationary.
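For reference, misadjustment is commonly written as a normalized ratio. The symbols are notation assumed here rather than taken from the text: $J(\infty)$ is the steady-state mean-squared error after convergence, and $J_{\min}$ is the minimum mean-squared error of the optimal Wiener filter.

```latex
\[
  \mathcal{M} \;=\; \frac{J(\infty) - J_{\min}}{J_{\min}}
\]
```

Under this definition, a misadjustment of 0.1 means that the steady-state mean-squared error is 10% above the Wiener minimum.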
In many acoustic echo cancellation algorithms, an adaptive filter learns the transfer function of the near-end room, the part of the room nearest the microphone, using a normalized least mean square (NLMS) algorithm. The NLMS algorithm is the most widely used algorithm in acoustic echo cancellation, and it provides a low-cost way to determine the optimum adaptive filter coefficients. The algorithm minimizes the mean square of the residual echo error signal at each adaptation step (e.g., at each sample), hence the name of the algorithm. Normalization by signal power is typically used because speech is a highly non-stationary process. NLMS updates the adaptive filter coefficients based on the error between the unprocessed microphone signal and the echo predicted by the current adaptive filter. In high-noise environments, this error is increased by uncorrelated noise, which causes the adaptive filter coefficients to move away from the optimal solution.
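The NLMS update and its sensitivity to noise can be sketched as follows. This is a minimal textbook-style illustration under stated assumptions, not the implementation of any particular canceller. The names `nlms_echo_canceller` and `simulate`, the parameters `step_size` and `eps`, and the four-tap `echo_path` are hypothetical. White Gaussian noise stands in for the far-end speech to keep the example short.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step_size=0.5, eps=1e-8):
    """Run sample-by-sample NLMS and return (residual, weights)."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Echo predicted by the current adaptive filter.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Error between the unprocessed microphone sample and the prediction.
        error = d - echo_estimate
        # Normalize the step by the far-end signal power in the filter window.
        power = sum(h * h for h in history) + eps
        gain = step_size * error / power
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def simulate(noise_std, num_samples=4000, seed=0):
    """Return the squared distance between learned and true echo paths."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.2, 0.1]  # hypothetical room response
    far_end = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = []
    for n in range(num_samples):
        echo = sum(echo_path[k] * far_end[n - k]
                   for k in range(len(echo_path)) if n - k >= 0)
        # Uncorrelated noise added at the microphone.
        mic.append(echo + rng.gauss(0.0, noise_std))
    _, weights = nlms_echo_canceller(far_end, mic, num_taps=len(echo_path))
    return sum((w - h) ** 2 for w, h in zip(weights, echo_path))


quiet_misalignment = simulate(noise_std=0.0)
noisy_misalignment = simulate(noise_std=1.0)
```

In this toy setup, the coefficients learned without noise land essentially on the true echo path. With strong uncorrelated noise at the microphone, they keep fluctuating around the true path, so `noisy_misalignment` is much larger than `quiet_misalignment`.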
Previous work on acoustic echo cancellation in high noise has focused on combined noise and echo reduction. One approach is to preprocess the microphone signal with a noise suppression algorithm and perform adaptation using a far-end speaker signal that has undergone the same noise suppression operations as the microphone signal. Although this seems favorable, experiments revealed that this technique often distorts the echo signal, which hinders the convergence of the acoustic echo cancellation algorithm. Furthermore, this technique requires perfect synchronization between the microphone and far-end speaker signals, which is often difficult to attain.
Various post-processing techniques used to remove echoes also result in noticeable distortion of the near-end speech captured by the microphone.