In physics, an echo may be defined as the replica produced by the reflection of a wave in its surrounding environment. Such a phenomenon may occur in speech telecommunications. In a telephone terminal, acoustic echo is due to the coupling between the loudspeaker and the microphone of the terminal. As a consequence, the microphone signal contains not only the useful speech signal but also echo. If no processing is done on the microphone path, the echo signal as well as the near-end speech signal is transmitted to the far-end speaker, and the far-end speaker hears a delayed version of his/her own voice. The annoyance due to hearing one's own voice increases with the level of the echo signal and with the delay between the original signal and its echo.
In order to guarantee good speech quality, some processing may be implemented on the microphone path before transmission takes place. Acoustic echo cancellation algorithms have been widely investigated in recent years. Approaches to acoustic echo cancellation may include an adaptive filter followed by an echo postfilter. The adaptive filter produces a replica of the acoustic path. This echo path estimate is then used to estimate the echo signal that is picked up by the microphone. In practice, the performance of adaptive echo cancellation (AEC) is disturbed by the presence of ambient noise and/or a near-end speech signal. To limit the impact of such disturbances on the AEC, double-talk detectors (DTD) and/or noise-only detectors may be used.
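As an illustration of how an adaptive filter may produce a replica of the acoustic path and use it to estimate the echo, the following minimal sketch uses a normalized least-mean-squares (NLMS) update. NLMS is assumed here only as one common choice and is not named in the text above; the function name `nlms_echo_canceller`, the parameters `num_taps`, `step` and `eps`, and the toy echo path `true_path` are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Return (error_signal, echo_path_estimate) from an NLMS adaptive filter.

    far_end: loudspeaker (far-end) samples.
    mic:     microphone samples containing the echo of far_end.
    """
    weights = [0.0] * num_taps   # running estimate of the echo path
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Echo estimate: far-end history filtered by the echo path estimate.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Error signal: microphone signal minus the estimated echo.
        e = d - echo_estimate
        # Normalized update: step size scaled by the far-end signal energy.
        energy = sum(h * h for h in history) + eps
        gain = step * e / energy
        weights = [w + gain * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights


# Toy echo-only scenario (no near-end speech, no ambient noise).
rng = random.Random(0)
true_path = [0.6, -0.3, 0.1]  # hypothetical loudspeaker-to-microphone response
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(true_path[k] * far_end[n - k] for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(far_end))
]
errors, weights = nlms_echo_canceller(far_end, mic)
```

In this noise-free toy case the leading entries of `weights` approach `true_path` and the error signal decays toward zero. With ambient noise or near-end speech added to `mic`, the same update would be disturbed, which is the situation the detectors mentioned above are meant to handle.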
Double-talk detectors may typically be quite complex. Scenario classification algorithms may, for example, exploit speech presence probability and/or signal coherence. A typical use of a DTD consists of freezing the adaptation of the AEC during double-talk (DT) periods, that is, periods during which both the far-end and near-end speakers are active. Nevertheless, even with the use of a DTD, some residual echo typically remains at the output of the adaptive filter. A postfilter may be used to render the echo inaudible. Echo postfilters may consist of an attenuation gain applied to the error signal from the adaptive echo cancellation. For better double-talk performance, this attenuation can be computed in the subband or frequency domain. Nevertheless, the performance of single-channel echo cancellation is still limited, especially in a hands-free configuration, for which the near-end-to-echo ratio is low. This limited performance may result in high distortion of the processed near-end speech signal during double-talk periods and therefore in poor communication quality. There may be a trade-off to be made between echo suppression during echo-only periods and low distortion of near-end speech during DT periods. Approaches to improve the speech quality in the case of a low near-end-to-echo ratio may be based on the use of multiple microphones for echo processing.
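The attenuation gain applied by an echo postfilter in the subband or frequency domain can be sketched as follows. The Wiener-style gain rule, the gain floor, and the names `postfilter_gains` and `apply_postfilter` are assumptions made for illustration only; the text above does not specify how the gain is computed, and the per-subband power values are made up.

```python
def postfilter_gains(error_power, residual_echo_power, floor=0.1, eps=1e-12):
    """Per-subband attenuation gains in [floor, 1].

    Assumed rule: gain = 1 - (residual echo power / error signal power),
    clipped so that echo-dominated subbands are attenuated down to `floor`
    while subbands dominated by near-end speech are left almost untouched.
    """
    gains = []
    for p_err, p_echo in zip(error_power, residual_echo_power):
        gain = 1.0 - p_echo / (p_err + eps)
        gains.append(min(1.0, max(floor, gain)))
    return gains


def apply_postfilter(error_spectrum, gains):
    """Scale each subband of the (complex) error spectrum by its gain."""
    return [g * e for g, e in zip(gains, error_spectrum)]


# Three hypothetical subbands: echo-dominated, mixed, near-end-dominated.
error_power = [1.0, 1.0, 1.0]
residual_echo_power = [1.0, 0.5, 0.0]
gains = postfilter_gains(error_power, residual_echo_power)

error_spectrum = [1 + 1j, 2 + 0j, 0.5j]
output_spectrum = apply_postfilter(error_spectrum, gains)
```

The `floor` parameter makes the trade-off described above explicit: a lower floor suppresses more echo during echo-only periods, but also risks more distortion of near-end speech during DT periods when the residual echo power is overestimated.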
Further, multi-channel echo cancellation based on beamforming approaches may be used in order to improve the speech quality in the case of a low near-end-to-echo ratio.
Still, effective methods of echo postfiltering or echo suppression are desirable.