A device for bi-directional audio-based communication typically includes both a loudspeaker and a microphone. The loudspeaker is used to play back audio signals received from a remote (“far-end”) source, while the microphone is used to capture audio signals from a local (“near-end”) source. In the case of a telephone call, for example, the near- and far-end sources may be people engaged in a conversation, and the audio signals may contain speech. An acoustic echo occurs when the far-end signal emitted by the loudspeaker is captured by the microphone after undergoing reflections in the local environment.
An acoustic echo canceller (AEC) may be used to remove acoustic echo from an audio signal captured by a microphone, in order to facilitate improved communication. The AEC typically filters the microphone signal by determining an estimate of the acoustic echo and subtracting the estimate from the microphone signal to produce an approximation of the true near-end signal. The estimate is obtained by applying a transformation to the far-end signal emitted from the loudspeaker. The transformation may be implemented using an adaptive algorithm such as least mean squares, normalized least mean squares, or their variants, which are known to persons of ordinary skill in the art.
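The estimate-and-subtract structure described above can be sketched with a normalized least mean squares update. This is a minimal illustration under stated assumptions, not a description of any particular AEC: the function name `nlms_echo_cancel`, the tap count, the step size, and the three-tap `echo_path` are all hypothetical, and the microphone signal is synthetic with no near-end source present.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=8, step_size=0.5, eps=1e-8):
    """Subtract an adaptive echo estimate from the microphone signal.

    Returns the residual (approximate near-end) signal and the final
    filter coefficients. All parameter values are illustrative.
    """
    coeffs = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Echo estimate: transformation applied to the far-end signal.
        echo_estimate = sum(c * h for c, h in zip(coeffs, history))
        # Error: microphone signal minus the estimated echo.
        e = d - echo_estimate
        # Normalized LMS coefficient update driven by the error.
        norm = sum(h * h for h in history) + eps
        coeffs = [c + step_size * e * h / norm for c, h in zip(coeffs, history)]
        residual.append(e)
    return residual, coeffs


# Synthetic demonstration: the "room" is an assumed three-tap echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual, coeffs = nlms_echo_cancel(far_end, mic)
```

In this noise-free setting the leading coefficients approach the assumed echo path and the residual decays toward zero, which is the behavior the estimate-and-subtract description implies when no near-end signal is present.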
An AEC may perform echo cancellation in the time domain or in the frequency domain. When performing the echo cancellation in the time domain, an AEC typically performs a convolution operation on the output signal with respect to filter coefficients. When performing the echo cancellation in the frequency domain, an AEC typically first obtains a frequency-domain representation of the output signal, for example by performing a Fast Fourier Transform (FFT) operation on the output signal. The frequency-domain representation of the output signal generally includes a magnitude and a phase value for each frequency bin in the FFT. The FFT operation may be performed on the output signal to obtain a frequency-domain representation comprising any number of frequency bins. For example, the frequency-domain representation of the output signal may include 256 frequency bins. For each frequency bin in the frequency-domain representation of the output signal, the AEC may modify the signal by multiplying the value of the frequency bin by a weight. The result of the multiplication process is a filtered output signal in the frequency domain. To obtain the filtered output signal in the time domain, an inverse Fourier transform may be performed on the frequency-domain filtered output signal.
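The transform, weight, and inverse-transform sequence can be sketched as follows. This is an illustrative sketch only: it uses a direct discrete Fourier transform from the standard library instead of an FFT so that it stays self-contained, an 8-sample block rather than 256 bins for readability, and hypothetical names (`dft`, `idft`, `filter_in_frequency_domain`, `taps`). The weights here are simply the transform of an assumed set of time-domain coefficients.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform (same result as an FFT, but O(n^2))."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(bins):
    """Inverse discrete Fourier transform."""
    n = len(bins)
    return [
        sum(bins[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def filter_in_frequency_domain(block, weights):
    """Multiply each frequency bin by its weight, then return to the time domain."""
    bins = dft(block)                                   # one complex value per bin
    weighted = [b * w for b, w in zip(bins, weights)]   # per-bin multiplication
    return [v.real for v in idft(weighted)]             # imaginary part is rounding noise


# Assumed example data: one block of the output signal and filter coefficients.
block = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
taps = [0.6, -0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
weights = dft(taps)
filtered = filter_in_frequency_domain(block, weights)
```

Multiplying bins by weights in this way is equivalent to a circular convolution within the block, so the result matches the time-domain convolution only up to wrap-around at the block edges. Practical frequency-domain filters commonly handle this with block-overlap techniques, which are omitted here.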
The adaptive transformation relies on a feedback loop, which continuously adjusts a set of coefficients that are used to calculate the estimated echo from the output signal. Different environments produce different acoustic echoes from the same output signal, and any change in the local environment may change the way that echoes are produced. By using a feedback loop to continuously adjust the coefficients, an AEC can adapt its echo estimates to the local environment in which it operates.
The feedback-based adaptation scheme works better in some situations than in others, so it may be beneficial to increase or decrease the rate of adaptation in different situations. The rate of adaptation may be controlled by adjusting a parameter referred to as “step size.” A larger step size will increase the rate of adaptation, and a smaller step size will decrease it.
When adaptation is first initiated, a relatively large step size is desirable because it will allow the AEC coefficients to quickly converge on a good approximation of the actual echo produced by the local environment. Once the AEC has converged, however, a smaller step size may be more desirable. With a smaller step size, any adjustments to the AEC coefficients will be less abrupt, and the feedback loop will therefore be less susceptible to disruptive inputs like background noise and double talk, which occurs when the output signal and a local sound source are both simultaneously active.
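The trade-off between a large and a small step size can be illustrated with a single-coefficient least mean squares update. The sketch below is a toy model under stated assumptions: the names `lms_single_tap` and `mean_sq_dev`, the step sizes 1.0 and 0.05, the echo gain 0.8, and the uniform random disturbance standing in for background noise or double talk are all chosen for illustration.

```python
import random


def lms_single_tap(step_size, noise_level, num_samples=4000, true_gain=0.8, seed=1):
    """Adapt one coefficient with plain (unnormalized) LMS; return its trajectory."""
    rng = random.Random(seed)
    w = 0.0
    trajectory = []
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)                          # output (far-end) sample
        disturbance = noise_level * rng.uniform(-1.0, 1.0)  # near-end disruption
        mic = true_gain * x + disturbance                   # echo plus disruption
        error = mic - w * x                                 # feedback signal
        w += step_size * error * x                          # step size scales the update
        trajectory.append(w)
    return trajectory


def mean_sq_dev(trajectory, true_gain=0.8, tail=2000):
    """Mean squared deviation from the true gain over the last `tail` samples."""
    values = trajectory[-tail:]
    return sum((w - true_gain) ** 2 for w in values) / len(values)


fast_clean = lms_single_tap(step_size=1.0, noise_level=0.0)
slow_clean = lms_single_tap(step_size=0.05, noise_level=0.0)
fast_noisy = lms_single_tap(step_size=1.0, noise_level=0.2)
slow_noisy = lms_single_tap(step_size=0.05, noise_level=0.2)
```

Comparing `fast_clean` with `slow_clean` early in the run shows the larger step size reaching the true gain sooner, while comparing `mean_sq_dev(fast_noisy)` with `mean_sq_dev(slow_noisy)` shows the larger step size leaving the converged coefficient more exposed to the disturbance.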
As described above, existing methods of filtering the output signal in the frequency domain utilize weights. A filter based on such weights has difficulty estimating echo arising in more complex instances of harmonic distortion. In addition, such weights cannot account for time delay in the echo estimates. Accordingly, the capability of the weights is limited.
In addition, existing step size control schemes assume that the step size is the same for each frequency bin. Accordingly, if the step size control scheme employs a relatively small step size because of disruptive inputs like background noise and double talk, the same small step size will be used for all frequency bins, even if the disruptive input is not present in all frequency bins. Using a smaller step size for the frequency bins not affected by the disruptive input provides a sub-optimal rate of convergence for the AEC coefficients corresponding to such frequency bins.
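The limitation of a shared step size can be illustrated with a two-bin toy example. This sketch is an assumption-laden illustration rather than any existing scheme: the function names `update_bin_weights` and `run_shared_step`, the per-bin echo weights, the constant far-end bin values, and the constant disruptive input confined to bin 1 are all hypothetical.

```python
def update_bin_weights(weights, far_bins, error_bins, step_size, eps=1e-8):
    """One normalized LMS update per frequency bin, all bins sharing one step size."""
    return [
        w + step_size * x.conjugate() * e / (abs(x) ** 2 + eps)
        for w, x, e in zip(weights, far_bins, error_bins)
    ]


def run_shared_step(step_size, num_blocks=20):
    """Return the per-bin misalignment |w - true| after a fixed number of blocks."""
    true_weights = [0.5 + 0.2j, -0.3j]   # assumed echo path, one weight per bin
    far_bins = [1 + 0j, 1 + 0j]          # assumed far-end spectrum, held constant
    near_end_bins = [0j, 0.4 + 0.1j]     # disruptive input present only in bin 1
    weights = [0j, 0j]
    for _ in range(num_blocks):
        mic_bins = [t * x + n for t, x, n in zip(true_weights, far_bins, near_end_bins)]
        error_bins = [m - w * x for m, w, x in zip(mic_bins, weights, far_bins)]
        weights = update_bin_weights(weights, far_bins, error_bins, step_size)
    return [abs(w - t) for w, t in zip(weights, true_weights)]


misalign_small = run_shared_step(step_size=0.05)
misalign_large = run_shared_step(step_size=1.0)
```

With the small shared step size, the disturbed bin 1 is pulled less far from its true weight, but the undisturbed bin 0 is still well short of convergence after the same number of blocks. With the large shared step size, bin 0 converges almost immediately while bin 1 absorbs the full disturbance.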
Although these problems have been framed in reference to an audio-based communication system, the same problems may be encountered in any field in which echo cancellation is performed. For example, measurement of echo is a task performed in gigabit internet applications, which employ a higher-frequency reference signal than that used in audio applications. The disclosure described herein is equally applicable to any such fields.