Acoustic echo, which is the direct result of acoustic coupling between the microphone and speaker, is the main source of distortion in hands-free telephony systems.
To eliminate the echo while maintaining full-duplex communication, most echo cancelers use an adaptive filter to identify the acoustic path between the microphone and speaker; based on this identified path, an estimate of the acoustic echo is subtracted from the microphone signal. Note that, due to limited DSP engine resources (memory and MIPS), the adaptive filter is usually shorter than the actual acoustic echo path, so an exact estimate of the acoustic echo cannot be made. In real environments, noise, non-linearity in the echo path, and similar effects limit the performance of a linear adaptive echo canceler even further. As a result of all these effects, linear adaptive echo cancelers cannot cancel the echo completely, and there will always be some residual echo that can be heard by the far-end listener.
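The adaptive echo subtraction described above can be sketched as follows. This is a minimal illustration only, assuming a normalized LMS (NLMS) update, which the text does not specify; the function name `nlms_echo_canceller` and the parameters `num_taps`, `mu`, and `eps` are hypothetical choices made for this example.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    NLMS is an assumed choice of adaptive algorithm for this sketch.
    Returns the residual (echo-cancelled) signal and the final filter weights.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(num_taps)        # adaptive estimate of the echo path
    x_buf = np.zeros(num_taps)    # most recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_est = w @ x_buf          # estimated echo at the microphone
        err[n] = mic[n] - echo_est    # residual after subtraction
        # NLMS update: step size normalized by the far-end power in the buffer
        w += mu * err[n] * x_buf / (x_buf @ x_buf + eps)
    return err, w
```

If `num_taps` is smaller than the length of the true echo path, the taps beyond the end of the filter cannot be modeled, and their contribution stays in the residual. This is the limitation the paragraph above attributes to constrained memory and MIPS.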
To overcome this limitation, a common approach is to use a non-linear process (NLP) at the output of the adaptive filter to further suppress any remaining echo residual. Since the NLP can also suppress the near-end talker's voice, ideally the NLP should be active only when the far-end talker is active. During double talk periods, when both near-end and far-end talkers are speaking at the same time, the NLP should be turned off to prevent clipping the near-end talker's voice. Also, during double talk periods, adaptation of the adaptive filter needs to be frozen to prevent it from diverging.
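The control rules in the preceding paragraph can be summarized in a short sketch. It assumes, purely for illustration, that the NLP is a fixed attenuation applied to the residual and that the far-end activity and double talk flags are supplied by other parts of the system; `apply_nlp` and `attenuation_db` are hypothetical names.

```python
import numpy as np

def apply_nlp(residual, far_end_active, double_talk, attenuation_db=30.0):
    """Gate the NLP and the filter adaptation from the two activity flags.

    Modeling the NLP as a fixed attenuation is an assumption of this sketch.
    Returns the processed frame and whether adaptation is allowed.
    """
    residual = np.asarray(residual, dtype=float)
    # NLP runs only when the far end is talking alone
    nlp_on = far_end_active and not double_talk
    # Adaptation is frozen during double talk to avoid divergence
    adapt_on = not double_talk
    gain = 10.0 ** (-attenuation_db / 20.0) if nlp_on else 1.0
    return gain * residual, adapt_on
```

With these rules, a frame passes through unchanged during double talk or near-end-only speech, and is attenuated only when the far-end talker is active alone.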
For all of the above reasons, double talk detectors play an important role in acoustic echo cancelers. Ideally, a double talk detector should detect only the condition in which both near-end and far-end input signals are present at the same time. In practice, under certain conditions, a double talk detector may miss a double talk condition or may falsely detect double talk when there is none (for example, when only the far-end signal is present). Note that false double talk detection prevents the NLP from activating and the adaptive filter from tracking any path changes. Both of these effects result in a noticeable increase in echo residual.
Most common double talk detection schemes rely on power differences or correlations between the near-end and far-end signals to detect a double talk condition. Most of these methods also assume that the near-end signal has higher power than the returned echo. Although this may be true for some applications, when the microphone and speaker are strongly acoustically coupled and the speaker volume is high, the returned echo level can be much higher than the near-end signal. Under these conditions, most double talk detectors will either fail to detect double talk or falsely detect it.
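As an illustration of a level-comparison scheme and of the assumption it depends on, the sketch below follows the classic Geigel-style rule: double talk is declared when the microphone magnitude exceeds a fraction of the recent far-end peak. The text does not name this particular detector; it is used here only as a representative example, and `window` and `threshold` are illustrative values.

```python
import numpy as np

def geigel_double_talk(far_end, mic, window=128, threshold=0.5):
    """Flag samples where |mic| exceeds threshold * recent far-end peak.

    A threshold of 0.5 assumes the echo path attenuates the far-end
    signal by at least 6 dB, which is the assumption discussed above.
    """
    far_mag = np.abs(np.asarray(far_end, dtype=float))
    mic_mag = np.abs(np.asarray(mic, dtype=float))
    flags = np.zeros(len(mic_mag), dtype=bool)
    for n in range(len(mic_mag)):
        start = max(0, n - window + 1)
        recent_peak = far_mag[start:n + 1].max()
        flags[n] = mic_mag[n] > threshold * recent_peak
    return flags
```

When the echo path has gain rather than attenuation, as with a loud speaker close to the microphone, a far-end-only signal already exceeds the threshold and the detector reports double talk falsely.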
Some prior inventions try to resolve this problem in the frequency domain by detecting signal energy in the upper band of the near-end input spectrum. The main disadvantage of these methods is that their performance relies on the spectrum of the near-end speech signal, and they can fail if there is no signal energy in the higher frequency bands, which can happen for certain speech signals.
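The kind of upper-band measurement such methods depend on might look like the sketch below. It is not taken from any specific prior method; the function `upper_band_energy_ratio`, the Hann window, and the `cutoff_hz` parameter are assumptions made for illustration.

```python
import numpy as np

def upper_band_energy_ratio(frame, sample_rate, cutoff_hz):
    """Return the fraction of frame energy at or above cutoff_hz.

    Hypothetical helper: the windowing and the cutoff are illustrative.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent frame: no upper-band evidence
    return float(power[freqs >= cutoff_hz].sum() / total)
```

A near-end sound with most of its energy below the cutoff yields a ratio near zero, so a detector built on this measure alone would not register it. This is the failure mode described above.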