Echo significantly degrades the sound quality of telephony voice communications. In telephony voice communications between two parties, i.e., a far-end speaker and a near-end speaker, the far-end speaker perceives echo effects if his voice signals are transmitted back to him. Two types of echo can be distinguished according to where they are created: acoustic echo, due to coupling between the loudspeaker and the microphone on the near-end side (e.g., with a hands-free telephone set such as in a car), and electric (or line) echo, due to a line impedance mismatch during the 2-to-4 wire conversion at the switching station. Therefore, echo cancellation is important to preserve good communication quality. Furthermore, with the development of VoIP telephony, which increases end-to-end transmission delays, echo cancellation becomes mandatory (VoIP=Voice over Internet Protocol).
The task of an echo canceller is to mimic the echo signal, thus providing an estimate of the echo signal, and to remove this estimate from the signal that combines the near-end signals and the echo of the far-end signals, yielding a residual signal that ideally consists of only the near-end speaker signal. The estimated echo signal is also known as echo path estimate. The echo signal is most commonly modelled with an adaptive finite impulse response (=FIR) filter.
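The principle described above can be illustrated by the following minimal sketch. It assumes a normalized least-mean-squares (NLMS) update, which is one common way of adapting an FIR filter but is not named in this text; the function name, the parameter values and the simulated echo path are hypothetical and serve illustration only.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    far_end: samples played on the near-end loudspeaker (reference signal)
    mic:     samples picked up at the near end (near-end speech plus echo)
    Returns the residual signal and the final FIR coefficients.
    NLMS is an assumed adaptation rule, chosen for illustration.
    """
    weights = [0.0] * num_taps   # adaptive FIR coefficients
    history = [0.0] * num_taps   # latest far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Replica of the echo produced by the current filter.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalized update: step size scaled by the reference energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + step * e * h / energy for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

# Hypothetical test case: far-end noise through a short echo path,
# near-end speaker silent, so the residual should decay towards zero.
random.seed(0)
true_path = [0.5, -0.3, 0.1]
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    sum(true_path[k] * far[n - k] for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(far))
]
residual, weights = nlms_echo_canceller(far, echo)
```

In this single-talk case the first coefficients approach the simulated echo path and the residual becomes negligible; the same loop, left to adapt while near-end speech is present, would be disturbed by that speech.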
However, the performance of the echo canceller deteriorates drastically during double-talk periods, in which signals from both the near-end and the far-end speakers coexist. As the large near-end speech component distorts the output signal, the filter coefficients determined by the echo canceller deviate from their converged state. Consequently, the error between the real echo signal and its replica generated by the filter increases. The main purpose of a double-talk detection (=DTD) module is to detect phases in which the far-end speaker and the near-end speaker are talking at the same time, and to suspend the echo estimation during these phases in order to prevent filter divergence.
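The effect of suspending the estimation can be sketched as follows. The sketch assumes an NLMS-style coefficient update, which this text does not prescribe, and the function name, the flag `adapt` and the sample values are hypothetical.

```python
def nlms_step(weights, history, mic_sample, adapt, step=0.5, eps=1e-8):
    """Process one sample; update the coefficients only if `adapt` is True.

    weights:    current FIR coefficients
    history:    latest far-end samples, newest first (same length as weights)
    mic_sample: near-end microphone sample (near-end speech plus echo)
    adapt:      False while a double-talk detector reports double-talk
    """
    echo_estimate = sum(w * h for w, h in zip(weights, history))
    residual = mic_sample - echo_estimate
    if not adapt:
        # Double-talk: keep the converged coefficients untouched.
        return residual, list(weights)
    energy = sum(h * h for h in history) + eps
    new_weights = [w + step * residual * h / energy
                   for w, h in zip(weights, history)]
    return residual, new_weights

# Hypothetical converged filter; the echo alone would be 0.35, and the
# microphone sample additionally carries 0.8 of near-end speech.
weights = [0.5, -0.3]
history = [1.0, 0.5]
frozen_residual, frozen_weights = nlms_step(weights, history, 1.15, adapt=False)
_, adapted_weights = nlms_step(weights, history, 1.15, adapt=True)
```

With adaptation suspended, the residual carries the near-end speech and the coefficients stay at their converged values; with adaptation allowed, the near-end speech is treated as estimation error and pulls the coefficients away from the echo path.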
The existing solutions for DTD can be classified into three groups. The first group comprises energy-based algorithms, such as Geigel DTD or robust Geigel DTD, which require few MIPS but are not very efficient (MIPS=Million Instructions per Second). The second group comprises correlation-based algorithms, such as the cross-correlation or the coherence methods, which are quite efficient provided the echo-to-noise ratio is high, but which require extensive memory storage and involve a high computational complexity. The third group comprises methods based on controlling the evolution of the echo path estimate filter, which perform better than the Geigel DTD method but are nearly as computationally complex as correlation-based methods.
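As an illustration of the first group, a Geigel-type detector can be sketched as follows. The sketch follows the commonly described form of the Geigel test, in which double-talk is declared when the magnitude of the near-end sample exceeds a fraction of the largest recent far-end magnitude; the window length, the threshold of 0.5 and the hangover counter are assumed values, not parameters taken from this text.

```python
from collections import deque

def geigel_dtd(far_end, mic, window=8, threshold=0.5, hangover=0):
    """Return one boolean per sample: True where double-talk is declared.

    Double-talk is declared when |mic[n]| exceeds `threshold` times the
    largest far-end magnitude within the last `window` samples. After a
    detection, the flag is held for `hangover` further samples.
    """
    recent = deque([0.0] * window, maxlen=window)  # recent |far_end| values
    hold = 0
    flags = []
    for x, d in zip(far_end, mic):
        recent.append(abs(x))
        if abs(d) > threshold * max(recent):
            hold = hangover + 1
        flags.append(hold > 0)
        if hold > 0:
            hold -= 1
    return flags

# Hypothetical signals: constant far-end level, echo at 0.3, and two
# samples in which near-end speech raises the microphone level to 0.9.
far = [1.0] * 10
mic = [0.3] * 5 + [0.9] * 2 + [0.3] * 3
flags_plain = geigel_dtd(far, mic)
flags_held = geigel_dtd(far, mic, hangover=2)
```

The test needs only one comparison and a running maximum per sample, which is consistent with the low MIPS requirement noted above. Because it relies on a fixed level threshold, near-end speech that stays below the threshold goes undetected.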
Currently, there is no solution that provides a good trade-off between efficiency and complexity. Moreover, the present solutions make no provision for a detection miss, i.e., an actual double-talk situation that the algorithm fails to detect. In such a case, adaptation is wrongly allowed over the whole filter length, which leads to divergence of the entire filter.