A device for bi-directional audio-based communication typically may include both a loudspeaker and a microphone. The loudspeaker may be used to play back audio signals received from a remote (“far-end”) source, while the microphone may be used to capture audio signals from a local (“near-end”) source. In the case of a telephone call, for example, the near- and far-end sources may be people engaged in a conversation, and the audio signals may contain speech. An acoustic echo occurs when the far-end playback signal emitted by the loudspeaker is captured by the microphone, after undergoing reflections in the local environment.
An acoustic echo canceller (AEC) may be used to remove acoustic echo from an audio signal captured by a microphone, in order to facilitate improved communication. The AEC typically filters the microphone signal by determining an estimate of the acoustic echo, and subtracting the estimate from the microphone signal to produce an approximation of the true near-end signal. The estimate is obtained by applying a transformation to the far-end playback signal emitted from the loudspeaker. The transformation may be implemented using an adaptive algorithm such as least mean squares (LMS), normalized least mean squares (NLMS), or their variants, which are known to persons of ordinary skill in the art.
The adaptive transformation relies on a feedback loop, which continuously adjusts a set of coefficients that are used to calculate the estimated echo from the playback signal. Different environments produce different acoustic echoes from the same playback signal, and any change in the local environment may change the way that echoes are produced. By using a feedback loop to continuously adjust the coefficients, an AEC can adapt its echo estimates to the local environment in which it operates.
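By way of illustration only, the feedback loop described above may be sketched as a basic normalized least mean squares filter. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `nlms_echo_cancel`, its parameters, the three-tap `echo_path`, and the synthetic signals are all hypothetical, and the microphone signal here contains echo only (no near-end speech or noise).

```python
import random

def nlms_echo_cancel(far_end, mic, num_taps=8, step_size=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller; returns (residual, coefficients)."""
    coeffs = [0.0] * num_taps    # adaptive coefficients, adjusted every sample
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Echo estimate: the transformation applied to the playback signal.
        echo_est = sum(w * h for w, h in zip(coeffs, history))
        # Subtracting the estimate approximates the near-end signal.
        err = d - echo_est
        # Feedback: the error drives the normalized coefficient update.
        power = sum(h * h for h in history) + eps
        gain = step_size * err / power
        coeffs = [w + gain * h for w, h in zip(coeffs, history)]
        residual.append(err)
    return residual, coeffs

# Synthetic example (assumed values): white far-end signal and a short,
# hypothetical room response standing in for the local environment.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.1]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n >= k)
    for n in range(len(far_end))
]
residual, coeffs = nlms_echo_cancel(far_end, mic)
```

In this echo-only setting, the leading entries of `coeffs` approach the values in `echo_path` and the residual decays toward zero. If `echo_path` were changed partway through the signal, the same loop would re-adapt, which is the behavior the feedback scheme is intended to provide.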
The feedback-based adaptation scheme works better in some situations than in others, so it may be beneficial to increase or decrease the rate of adaptation in different situations. The rate of adaptation may be controlled by adjusting a parameter referred to as “step size.” A larger step size will increase the rate of adaptation, and a smaller step size will decrease it.
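The effect of step size on the rate of adaptation may be illustrated by counting how many samples an adaptive filter needs before its coefficients match a known echo path. The following is a sketch under assumptions: the helper `samples_until_converged`, the four-tap `echo_path`, the tolerance, and the two step sizes are hypothetical choices, and the simulated microphone signal contains no near-end speech or noise, so the sketch shows only the speed of adaptation and not the situations in which a smaller step size may be preferable.

```python
import random

def samples_until_converged(step_size, echo_path=(0.6, -0.3, 0.1, 0.05),
                            tolerance=1e-6, max_samples=5000, seed=1):
    """Return the number of samples an NLMS filter needs before its squared
    coefficient error drops below `tolerance`, or None if it never does."""
    rng = random.Random(seed)
    taps = len(echo_path)
    coeffs = [0.0] * taps
    history = [0.0] * taps       # most recent far-end samples, newest first
    for n in range(max_samples):
        history = [rng.uniform(-1.0, 1.0)] + history[:-1]
        # Simulated microphone sample: echo only.
        mic = sum(g * h for g, h in zip(echo_path, history))
        err = mic - sum(w * h for w, h in zip(coeffs, history))
        power = sum(h * h for h in history) + 1e-8
        # The step size scales how far each update moves the coefficients.
        coeffs = [w + step_size * err * h / power
                  for w, h in zip(coeffs, history)]
        misalignment = sum((w - g) ** 2 for w, g in zip(coeffs, echo_path))
        if misalignment < tolerance:
            return n + 1
    return None

slow = samples_until_converged(step_size=0.1)   # smaller step size
fast = samples_until_converged(step_size=1.0)   # larger step size
```

Both runs use the same simulated playback signal, so any difference between `slow` and `fast` is attributable to the step size alone.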
Many communication devices also include a noise reduction (“NR”) module. Noise spectrum estimation is an important component of speech enhancement or recognition systems. If the noise estimate is too low, audible residual noise may cause annoyance, whereas if the noise estimate is too high, distortion of speech may result in intelligibility loss.
Some noise-reduction systems estimate and update the noise spectrum during silent segments of the signal (e.g., pauses) using a voice-activity-detection (VAD) algorithm. Such systems may be used in environments with stationary noise, that is, noise whose characteristics do not change over time (e.g., white noise). However, such systems have trouble performing noise reduction in more realistic environments in which the noise may be constantly changing. Because they rely on VAD algorithms, such systems are unable to estimate and update the noise spectrum during non-silent segments of the signal, even though the noise in the environment may be changing during those segments.
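The limitation of VAD-gated noise estimation may be seen in a simplified sketch. The function `vad_gated_noise_update`, the energy-threshold detector, the smoothing constant, and the four-bin power spectra below are all assumptions chosen for illustration; practical VAD algorithms are considerably more elaborate.

```python
def vad_gated_noise_update(noise_est, frame_power, smoothing=0.9, vad_factor=3.0):
    """Update a per-bin noise power estimate only when a frame is judged silent.

    Returns (new_estimate, is_speech). The detector is a crude energy
    threshold used purely for illustration.
    """
    is_speech = sum(frame_power) > vad_factor * sum(noise_est)
    if is_speech:
        # Non-silent segment: the estimate is frozen, so any change in the
        # background noise during this frame goes untracked.
        return list(noise_est), True
    # Silent segment: recursively smooth the estimate toward the frame power.
    updated = [smoothing * n + (1.0 - smoothing) * p
               for n, p in zip(noise_est, frame_power)]
    return updated, False

noise_est = [1.0, 1.0, 1.0, 1.0]
# A quiet frame (pause): the estimate is updated.
noise_est, flag_quiet = vad_gated_noise_update(noise_est, [1.2, 0.8, 1.1, 0.9])
before_speech = list(noise_est)
# A loud frame (speech plus whatever noise is present): the estimate is frozen.
noise_est, flag_speech = vad_gated_noise_update(noise_est, [9.0, 12.0, 4.0, 1.0])
```

After the second call, `noise_est` equals `before_speech`: the estimate carries no information about the noise that was present during the speech frame.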
Other noise-reduction systems estimate and update the noise spectrum using noisy signal statistics. However, such systems fail to take into account information gained from acoustic echo cancellation.
Although these problems have been framed in reference to an audio-based communication system, the same problems may be encountered in any field in which echo cancellation and noise estimation are performed. The disclosure described herein is equally applicable to any such field.