1. Field of the Invention
The invention concerns an echo canceller with a filter that adaptively updates a set of coefficients representing an estimated impulse response of an echo path, and subtracts the anticipated echo from a transmitted signal.
2. Prior Art
Echo cancellers are useful in duplex audio communications, speakerphone and hands-free apparatus, and other situations wherein an audio signal may be coupled from the output audio stages back to the input audio stages at one or both ends of a bidirectional communications path. Such coupling can be due to electrical circuit coupling or to an acoustic path from an audio speaker back to a nearby microphone of a telephone handset, a desktop speakerphone or another communications device.
Echo can be experienced at either or both ends of a communication path. Both ends might have echo cancellers that operate independently. The existence of an echo path at one of the ends is perceived as an echo by the party speaking and listening at the opposite end.
To provide a frame of reference, echo cancellation can be discussed with respect to transceiver equipment at a “near” end, namely the end that has the echo path and with which the echo canceller is associated. The echo canceller associated with the near end is intended to prevent an echo from being perceived by a party speaking and listening at the “far” end.
The acoustic and/or electric echo path at the near end causes a delayed and typically attenuated representation of the audio input signal originating at the “far” end microphone to be echoed back from the near end to the far end and to be heard in the far end audio speaker after a delay. There can be plural audio signal paths affecting the echo, for example reflections from walls of a room that are at different distances from a speakerphone.
An echo canceling apparatus is provided, typically disposed in the equipment at the near end but also potentially as an intervening element or process in network communications. The echo canceller senses the presence of an echo component by detecting correlation, at a time lag, between the incoming signal received from the far end and the outgoing signal sent from the near end. The echo canceller adaptively characterizes the transfer function of the echo path (or paths), applies the incoming signal to the transfer function to predict the echo component that the far end party's speech is likely to produce, and subtracts the predicted echo from the outgoing signal before that signal is sent from the near end to the far end.
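A minimal sketch of lag detection, assuming a single echo path with a fixed delay and gain: the lag at which the cross-correlation between the far-end and near-end signals peaks estimates the echo delay. The white-noise far-end signal, the values `delay = 37` and `gain = 0.4`, and the function name `cross_correlation` are hypothetical choices for illustration.

```python
import random

def cross_correlation(x, y, max_lag):
    """Unnormalized cross-correlation r[k] = sum_n x[n] * y[n + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * y[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

random.seed(1)
# Hypothetical far-end signal: white noise stands in for received audio.
far_end = [random.gauss(0.0, 1.0) for _ in range(2000)]
delay, gain = 37, 0.4  # assumed single echo path: 37-sample delay, gain 0.4
# Near-end send signal: delayed, attenuated echo plus a little local noise.
near_end = [
    (gain * far_end[n - delay] if n >= delay else 0.0) + random.gauss(0.0, 0.05)
    for n in range(len(far_end))
]

r = cross_correlation(far_end, near_end, max_lag=100)
# The lag with the largest correlation magnitude is the delay estimate.
estimated_delay = max(range(len(r)), key=lambda k: abs(r[k]))
print("estimated echo delay:", estimated_delay)
```

Real speech is not white, so in practice the peak is broadened by the signal's own autocorrelation, which is the difficulty discussed further on.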
More specifically, the echo path is modeled using an adaptive filter to develop a numerical characterization of the impulse response of the echo path. An “impulse” is a theoretical pulse of infinite amplitude and zero time duration. A theoretical impulse is considered to produce an echo response characterized as a list of amplitudes (known as coefficients), finite in number, at successive time sample points following the impulse. The filter is termed a finite impulse response filter (“FIR”) because the number of filter coefficients, and the time period the coefficients encompass, are limited.
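A compact statement of the FIR echo estimate, in notation assumed here for illustration (x(n) is the far-end sample at time n, h_k is the k-th of L filter coefficients, d(n) is the near-end send signal, and e(n) is the signal actually transmitted after the predicted echo is subtracted):

```latex
\hat{y}(n) = \sum_{k=0}^{L-1} h_k \, x(n-k), \qquad e(n) = d(n) - \hat{y}(n)
```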
The estimate of the expected echo signal is continuously generated by applying each instantaneous value of the audio signal received from the far end to the impulse response filter. Each instantaneous audio signal value, or sample, produces a list of echo component values that are predicted to result at sample times extending into the future. As each successive sample is applied to the impulse response filter, the resulting echo component values at subsequent times are added to the echo component values that were predicted to result from earlier samples. The accumulated estimated echo component values are subtracted from the audio signal that is being transmitted from the near end to the far end.
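The accumulation described in the preceding paragraph is the superposition (overlap-add) view of FIR filtering. The sketch below, assuming a hypothetical 4-tap echo path `h` and a short input `x`, checks that adding each sample's scaled impulse response into a running sum gives the same echo estimate as the direct convolution sum.

```python
def echo_estimate_by_superposition(x, h):
    """Each input sample x[n] contributes x[n] * h[k] at time n + k; the
    contributions from all samples are accumulated in a running sum."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n, sample in enumerate(x):
        for k, coeff in enumerate(h):
            out[n + k] += sample * coeff
    # Truncate to the input length; later entries are the tail after input ends.
    return out[:len(x)]

def echo_estimate_direct(x, h):
    """Equivalent FIR form: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

h = [0.0, 0.5, 0.25, -0.1]      # hypothetical 4-tap echo path
x = [1.0, 0.0, 0.0, 2.0, -1.0]  # hypothetical far-end samples
print(echo_estimate_by_superposition(x, h))
print(echo_estimate_direct(x, h))
```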
Insofar as the adaptive filter coefficients prove to be inaccurate, residual echo remains in the signal. Residual echo can be detected as correlation between the audio received from the far end and the audio sent from the near end after the predicted echo has been subtracted. The residual echo is used as an error value in a feedback control loop that causes the filter coefficients to be adjusted. Over time, the filter coefficients home in on an accurate characterization of the echo path; the coefficients are said to “converge.” Ideally, convergence is quick and accurate, leaving virtually no echo in the signal sent back from the near end to the far end.
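The feedback loop can be illustrated with the normalized least-mean-squares (NLMS) update, a widely used adaptation rule. The source does not specify the adaptation algorithm, so NLMS, the step size `mu`, and the simulated noise-free 5-tap echo path are assumptions. Each step moves the coefficients along the current far-end samples in proportion to the residual echo.

```python
import random

def nlms_echo_canceller(far, near, num_taps, mu=0.5, eps=1e-6):
    """Adapt FIR coefficients h so that h applied to `far` predicts the echo in `near`.
    Returns the final coefficients and the residual (error) sequence."""
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x_n, d_n in zip(far, near):
        buf = [x_n] + buf[:-1]
        y_hat = sum(hk * xk for hk, xk in zip(h, buf))  # predicted echo
        e_n = d_n - y_hat                                # residual echo = error
        norm = eps + sum(xk * xk for xk in buf)          # far-end power in window
        # Correction proportional to the error, normalized by input power.
        h = [hk + mu * e_n * xk / norm for hk, xk in zip(h, buf)]
        errors.append(e_n)
    return h, errors

random.seed(0)
true_path = [0.0, 0.0, 0.6, -0.3, 0.1]  # hypothetical echo impulse response
far = [random.gauss(0.0, 1.0) for _ in range(3000)]
# Noise-free simulated echo: the near-end send signal is the far end through true_path.
near = [sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n >= k)
        for n in range(len(far))]

h, errors = nlms_echo_canceller(far, near, num_taps=len(true_path))
print([round(c, 3) for c in h])
```

Normalizing by the far-end power keeps the effective step size independent of signal level, which is one common way of relating the correction size to the detected error without becoming unstable on loud input.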
A new convergence may become necessary if the situation changes. For example, an echo path affected by a speakerphone at the near end may be changed if the speakerphone is physically moved or if acoustically reflective structures are moved near the speakerphone.
The echo component (and the residual echo) can be identified and measured in the cross-correlation, due to the echo path, between the audio received from the far end and the audio sent from the near end. There may be plural echo paths with distinct lag times, but echo is synonymous with correlation at some lag time(s) due to such paths. Minimizing residual echo might therefore be conceived as adjusting the filter coefficients to eliminate echo-related correlation. But there is a problem: there is also substantial correlation between the audio received from the far end and the audio sent from the near end that is not due to echo. For example, normal human speech has a high level of autocorrelation over the time spans of interest. Anomalies such as parties at opposite ends speaking at the same time (“double-talk”) also give rise to momentary cross-correlation that is not echo. If the controls that adjust and converge the adaptive filter coefficients are highly responsive, which seems advantageous for converging the coefficients quickly, then the controls will respond to autocorrelation and to non-echo correlation, rendering the filter coefficients inaccurate or actually increasing the time needed to converge.
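A quick numerical check of the contrast between speech-like and noise-like signals. This sketch assumes an AR(1) process (white noise through a one-pole filter with pole 0.9) as a crude stand-in for voiced speech. Its normalized lag-1 autocorrelation comes out close to the pole value, while that of white noise comes out close to zero. The helper name `normalized_autocorr` is hypothetical.

```python
import random

def normalized_autocorr(x, lag):
    """Estimate r(lag) / r(0) for a roughly zero-mean sequence."""
    num = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    den = sum(v * v for v in x)
    return num / den

random.seed(2)
white = [random.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical stand-in for voiced speech: s[n] = 0.9 * s[n-1] + w[n] (AR(1)).
voiced_like = []
prev = 0.0
for w in white:
    prev = 0.9 * prev + w
    voiced_like.append(prev)

print(round(normalized_autocorr(white, 1), 2))        # close to 0
print(round(normalized_autocorr(voiced_like, 1), 2))  # close to 0.9
```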
As a technique to improve the rate at which the impulse response filter converges on an accurate characterization of the echo path, the received audio signal from the far end and the outgoing audio signal from the near end can be processed to reduce their inherent autocorrelation and non-echo cross-correlation before attempting to detect cross-correlation that may represent echo. In one technique, these signals are “pre-whitened” using a filter to remove components of the audio signal that may correlate for reasons other than echo. This reduces the signal-to-noise ratio of the signals input to the process or device that assesses correlation, but the rate of convergence is actually improved because correction of the filter coefficients is driven more strongly by the echo component and less strongly by the confounding factors of inherent autocorrelation and cross-correlation in speech signals.
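To illustrate why pre-whitening helps, the sketch below makes several assumptions: a strongly autocorrelated far-end signal (an AR(1) process with pole 0.9), a single echo path with gain 0.5 and a delay of 20 samples, and a fixed first-order whitening filter 1 − 0.9z⁻¹ applied identically to both signals. Because the same linear filter is applied to both signals, the echo relationship between them is preserved, but the cross-correlation peak at the echo delay becomes much sharper. The names `prewhiten`, `xcorr` and `sharpness` are hypothetical.

```python
import random

def xcorr(x, y, lag):
    """Unnormalized cross-correlation sum_n x[n] * y[n + lag]."""
    return sum(x[n] * y[n + lag] for n in range(len(x) - lag))

def prewhiten(s, a=0.9):
    """Fixed first-order prediction-error (whitening) filter: w[n] = s[n] - a*s[n-1]."""
    return [s[n] - (a * s[n - 1] if n > 0 else 0.0) for n in range(len(s))]

random.seed(3)
# Hypothetical strongly autocorrelated far-end signal: AR(1) with pole 0.9.
far, prev = [], 0.0
for _ in range(5000):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    far.append(prev)
delay = 20  # assumed echo delay in samples; assumed echo gain 0.5
near = [0.5 * far[n - delay] if n >= delay else 0.0 for n in range(len(far))]

def sharpness(x, y):
    """Correlation one lag past the echo delay, relative to the peak at the delay.
    Small values mean a sharp, easily resolved echo peak."""
    return abs(xcorr(x, y, delay + 1)) / abs(xcorr(x, y, delay))

raw_ratio = sharpness(far, near)
white_ratio = sharpness(prewhiten(far), prewhiten(near))
print(round(raw_ratio, 2), round(white_ratio, 2))
```

A fixed whitening coefficient only suits a signal whose spectrum is known in advance. Speech changes continually, which is why adaptive linear-prediction analysis is used for this purpose in practice.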
The adaptive circuits that converge the impulse response coefficients employ an error correction algorithm. A high or continuous degree of cross-correlation between the audio signal being transmitted and a component of the audio signal received over the echo path after a time lag indicates the presence of an error to be minimized by adjusting the impulse response coefficients. The algorithm may make stepwise corrections in an amount related to the magnitude of the detected error, to speed convergence. The algorithm may be arranged to suspend making corrections when a double-talk situation is detected, to avoid making changes that actually degrade the accuracy of already converged coefficients. Some echo cancellers generate plural sets of impulse response filter coefficients and switch back and forth to use the set of filter coefficients that is found to result in the least residual echo.
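The source does not name a particular double-talk detector. As an illustration only, the sketch below gates an NLMS correction with a Geigel-style detector, a classic simple rule that declares double-talk when a near-end sample is large compared with the recent far-end peak. The function names, the 0.5 threshold and the buffer values are assumptions.

```python
def geigel_double_talk(near_sample, recent_far, threshold=0.5):
    """Geigel-style rule: flag double-talk when |near_sample| is at least
    `threshold` times the peak far-end magnitude in the recent window.
    threshold=0.5 assumes the echo path attenuates by at least about 6 dB.
    With a silent far end (peak 0) this also returns True, which freezes
    adaptation when there is no far-end excitation."""
    far_peak = max(abs(v) for v in recent_far)
    return abs(near_sample) >= threshold * far_peak

def gated_update(h, recent_far, e_n, near_sample, mu=0.5, eps=1e-6):
    """One NLMS correction, skipped (coefficients frozen) during double-talk.
    Returns (coefficients, whether an update was applied)."""
    if geigel_double_talk(near_sample, recent_far):
        return h, False
    norm = eps + sum(x * x for x in recent_far)
    return [hk + mu * e_n * xk / norm for hk, xk in zip(h, recent_far)], True

recent_far = [1.0, -0.5, 0.2]  # hypothetical far-end window, newest first
h0 = [0.0, 0.0, 0.0]
# Echo only: near-end sample below half the far-end peak, so adapt.
h_after_echo, adapted = gated_update(h0, recent_far, e_n=0.3, near_sample=0.3)
# Loud local talker: near-end sample exceeds the threshold, so freeze.
h_after_dt, adapted_dt = gated_update(h_after_echo, recent_far, e_n=0.9, near_sample=0.9)
print(adapted, adapted_dt)
```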
The error correction algorithm for the near end transceiver needs to respond to cross-correlation at a time lag between the signal received from the far end and the signal sent from the near end, when that cross-correlation is due to echo. A challenge is presented by the fact that human speech inherently contains substantial autocorrelation (namely correlation of a given signal with itself at points spaced in time), and also cross-correlation independent of echo, such as cross-correlation between the audio characteristics of the speech of different speakers.
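The two kinds of correlation, and the way they interact, can be written as follows. The notation is assumed for illustration: x(n) is the signal received from the far end, d(n) is the signal sent from the near end, E denotes expectation, and v(n) is near-end sound uncorrelated with x. If the send signal contains an echo through coefficients h_k, the measured cross-correlation is the echo path smeared by the far-end autocorrelation:

```latex
R_{xx}(\tau) = E\left[x(n)\,x(n+\tau)\right], \qquad
R_{xd}(\tau) = E\left[x(n)\,d(n+\tau)\right]

d(n) = \sum_{k=0}^{L-1} h_k\, x(n-k) + v(n)
\;\Longrightarrow\;
R_{xd}(\tau) = \sum_{k=0}^{L-1} h_k\, R_{xx}(\tau - k)
```

For a white far-end signal, R_xx(m) = σ²δ(m), and the cross-correlation reduces to σ²h_τ, exposing each echo coefficient directly. For speech, the spread of R_xx mixes neighboring coefficients together.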
In theory, convergence is quickest when conducted without the influence of inherent forms of correlation of audio speech signals. An adaptive filter might converge most quickly if the input signals (the near end transmit signal and the receive signal from the far end) are not cross-correlated and have low autocorrelation. Independent broadband white noise signals exhibit no such correlation. Correlation of speech signals may be low if the speech contains fricative sounds (e.g., hiss, “th” or “sh” sounds). Voiced speech sounds, by contrast, are inherently correlated.
A known technique intended to improve the rate of convergence of an echo canceller filters the audio signals in a pre-processing step to select for attributes of the signals that most resemble broadband noise. These attributes do not have the high levels of autocorrelation or cross-correlation inherent in the original speech signals. U.S. Pat. No. 4,697,261 to Wang et al., the teachings of which are hereby incorporated, discloses the step of pre-whitening the received speech signal to improve the rate of convergence, citing S. Yamamoto et al., “An Adaptive Echo Canceller with Linear Predictor,” Trans. IDE Japan, 1979, pp. 851-857, and international application PCT/US85/02168 (WO 86/02726). Pre-whitening is accomplished using a speech analysis and speech synthesis unit implemented in a digital signal processor, as described in WO 86/02726. The received audio signal is applied to a spectral analysis unit that produces linear prediction coefficients substantially characterizing the frequency components of the signal, and residual values (“variances”) representing the differences between the actual values of audio signal samples and the values that would have been predicted by the linear prediction coefficients.
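The source does not say how the linear prediction coefficients are computed. One standard choice is the autocorrelation method solved by the Levinson-Durbin recursion. The following is a minimal sketch, assuming the predictor convention x̂[n] = Σ a_i x[n−i]; the function name `levinson_durbin` and the example autocorrelation values are illustrative.

```python
def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations from autocorrelation values
    r[0..order]. Returns (a, err): predictor coefficients a[0..order-1] for
    x_hat[n] = sum_i a[i] * x[n-1-i], and the final prediction-error power."""
    a = [0.0] * (order + 1)  # a[0] unused; a[1..m] hold the order-m predictor
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k           # error power shrinks at each order
    return a[1:], err

# Exact autocorrelation of a first-order process with correlation 0.9 per lag:
# an order-2 fit should find a ~= [0.9, 0.0] and error power ~= 1 - 0.81 = 0.19.
coeffs, err_power = levinson_durbin([1.0, 0.9, 0.81], order=2)
print(coeffs, err_power)
```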
The linear prediction coefficients represent the redundant aspects of the audio signal during a sampling interval. The residual values or variances provide a pre-whitened representation of the received speech signal from the far end because redundancies that would lead to high autocorrelation apart from echo are contained in the linear prediction coefficients, not in the residual sample values (variances). This pre-whitened receive signal is the input used for correlation with the send signal in the detection and suppression of residual echo. In order to exploit this technique, it is necessary to provide a digital signal processor embodying a speech analysis unit devoted to producing the pre-whitened version of the receive signal as described. This solution is expensive and complex.
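The following sketch shows how the residual carries a pre-whitened version of the signal while the redundancy moves into the coefficients. It re-estimates a first-order predictor for each 160-sample frame and keeps the prediction residual. The predictor order, frame length, AR(1) test signal and helper names are assumptions for illustration; practical analyzers use higher predictor orders.

```python
import random

def lag1_corr(x):
    """Normalized lag-1 autocorrelation r(1) / r(0)."""
    return sum(x[n] * x[n + 1] for n in range(len(x) - 1)) / sum(v * v for v in x)

def framewise_lpc1_residual(x, frame_len=160):
    """For each frame, estimate a first-order predictor a = r(1)/r(0) and emit
    residuals e[n] = x[n] - a * x[n-1]. Returns (per-frame coefficients, residuals)."""
    coeffs, residual = [], []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        r0 = sum(v * v for v in frame)
        r1 = sum(frame[n] * frame[n + 1] for n in range(len(frame) - 1))
        a = r1 / r0 if r0 > 0 else 0.0
        coeffs.append(a)
        for n in range(start, start + len(frame)):
            prev = x[n - 1] if n > 0 else 0.0  # previous frame's tail at boundaries
            residual.append(x[n] - a * prev)
    return coeffs, residual

random.seed(5)
# Hypothetical voiced-speech stand-in: AR(1) process with pole 0.9.
received, prev = [], 0.0
for _ in range(4800):
    prev = 0.9 * prev + random.gauss(0.0, 1.0)
    received.append(prev)

coeffs, residual = framewise_lpc1_residual(received)
print(round(lag1_corr(received), 2), round(lag1_corr(residual), 2))
```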
It would be advantageous to de-correlate the send and receive signals so as to improve the speed of convergence as described, but to do so in a manner that is less complicated and expensive while remaining effective. These objectives are difficult to achieve if the already complex adaptive filter elements of the echo canceller also need to include a digital signal processor devoted to pre-whitening the received signal. The objectives are likewise difficult to achieve if the transceiver unit is based on a processor, because a substantial portion of the available processing capacity may be devoted to the pre-whitening function. What is needed is a better way to separate the transmitted near end audio signal and the received far end audio signal into constituent signal components that are inherently de-correlated, and to use the de-correlated components to control the echo canceller's error estimation and the convergence of the impulse response filter coefficients.