The present invention is directed to a method and apparatus for performing double-talk detection, and more particularly, to a method and apparatus for performing double-talk detection with adaptive decision thresholding.
Communications usually include at least two parties and associated hardware. With respect to one set of hardware, the speech from the party co-located with the hardware is termed near-end speech and the speech from the other party is termed far-end speech. Most conventional echo cancellers (which may be used with both sets of hardware) use an adaptive filter to estimate the echo path and synthesize an estimated echo signal that is subtracted from a signal Sin, in order to reduce the near-end echo. FIG. 1 illustrates a conventional echo canceller 10, including an adaptive FIR filter 12, which performs a normalized least mean square (NLMS) algorithm, a double-talk detector 14, which performs speech detection and comparison, and a hybrid 16. In order to correctly estimate the actual echo path from the input (Rout of the echo canceller 10, usually the same as the echo canceller 10 Rin signal) and output (Sin of the echo canceller 10) signals, the output of the echo path must originate solely from the input signal. The adaptive FIR filter 12 is easily adapted to estimate the echo path if the near-end and the far-end parties speak one at a time. The situation in which both parties speak simultaneously is termed "double-talk". During double-talk, the output signal contains not only the echo of the input signal, but the near-end speech signal as well.
When near-end speech is present, the adaptation of the filter 12 should be inhibited; otherwise an erroneous estimate of the echo path is obtained, which results in poor echo cancellation. The role of the double-talk detector 14 is to sense when the echo is corrupted by near-end speech and then inhibit the adaptation of the filter 12. Because the filter 12 tends to diverge during double-talk situations, the double-talk detector 14 has a large impact on the overall performance of the echo canceller 10.
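The interaction between the adaptive filter and the inhibit signal can be sketched as follows. This is a minimal illustration only, assuming a textbook NLMS update; the function name `nlms_echo_canceller`, the parameters `taps`, `mu`, and `eps`, and the per-sample `inhibit` list are hypothetical and are not taken from FIG. 1.

```python
import random


def nlms_echo_canceller(rout, sin, taps=4, mu=0.5, eps=1e-8, inhibit=None):
    """Textbook NLMS sketch. Returns (residual, weights).

    rout    -- far-end reference samples (input to the echo path)
    sin     -- samples observed at the echo path output
    inhibit -- optional list of booleans; True freezes adaptation at that
               sample, as a double-talk detector would (hypothetical interface)
    """
    w = [0.0] * taps      # current echo path estimate
    x = [0.0] * taps      # most recent far-end samples, newest first
    residual = []
    for n, (r, s) in enumerate(zip(rout, sin)):
        x = [r] + x[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))
        e = s - echo_hat                      # echo-cancelled output
        residual.append(e)
        if inhibit is None or not inhibit[n]:
            norm = sum(xi * xi for xi in x) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return residual, w


# Far-end speech only: the filter is free to adapt to the echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, 0.25]   # assumed two-tap echo path, for illustration only
echo = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
residual, weights = nlms_echo_canceller(far_end, echo)

# Adaptation inhibited throughout: the estimate never leaves its initial state.
_, frozen = nlms_echo_canceller(far_end, echo, inhibit=[True] * len(far_end))
```

In the first call the weights approach the assumed echo path because the output contains echo alone. In the second call the weights stay at zero, which is what an inhibit signal asserted for the whole call would produce.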
Numerous attempts have been made to perform double-talk detection which exploit the spectrum characteristic or the power level information derived from the near-end and far-end signals. For example, the conventional Geigel algorithm as described in D. L. Duttweiler, "A Twelve-Channel Digital Echo Canceller," IEEE Trans. Commun., Vol. COM-26, pp. 647-653, 1978, which follows the power comparison concept, makes the basic assumption that echo has a much lower power level than the far-end speech signal. Therefore, if the near-end signal power is lower than the far-end speech by a certain threshold (usually 6 dB), the near-end signal is considered echo and the echo canceller tries to cancel it. Otherwise, double-talk is declared and adaptation is prohibited. The Geigel algorithm is very efficient (simple and low computation cost) and fairly effective (adequate for most applications).
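A level comparison of this kind can be sketched as follows. The sketch assumes the commonly cited sample-magnitude form of the Geigel test, in which the near-end magnitude is compared with the largest recent far-end magnitude; the function name, the `window` length, and the amplitude factor 0.5 (roughly 6 dB) are illustrative assumptions rather than values taken from the cited paper.

```python
from collections import deque


def geigel_double_talk(rout, sin, window=64, threshold=0.5):
    """Return one boolean per sample; True means double-talk is declared.

    Double-talk is declared when |sin[n]| exceeds `threshold` times the
    largest |rout| seen over the last `window` samples.
    """
    recent = deque(maxlen=window)   # recent far-end magnitudes
    flags = []
    for r, s in zip(rout, sin):
        recent.append(abs(r))
        flags.append(abs(s) > threshold * max(recent))
    return flags


# Far-end at full level; the third near-end sample is too loud to be echo.
flags = geigel_double_talk([1.0, -1.0, 1.0, -1.0], [0.2, -0.2, 0.9, 0.1], window=2)
print(flags)  # [False, False, True, False]
```

The per-sample cost is one comparison plus a running maximum, which is why this family of detectors is considered computationally cheap.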
However, the basic assumption of the Geigel algorithm is not true in the following cases:
(1) the near-end speaker is speaking with lower volume or excessive loss is introduced in the near-end analog circuits; and
(2) a large volume echo may occur in a mobile or hands-free phone or in some hybrids with severe leakage.
In these cases, the echo canceller may mistake the low-level near-end speech for echo and try to cancel it, or mistake the strong echo for near-end speech and try to keep it.
Another class of double-talk algorithms is the cross-correlation or coherence-based algorithms (denoted here as "CORR-algorithms"), as described in, for example, J. Benesty et al., "A New Class of Double-Talk Detectors Based on Cross-Correlation," IEEE Trans. Signal Processing, Vol. 46, No. 6, June 1998 and T. Gansler et al., "A Double-Talk Detector Based on Coherence," IEEE Trans. Commun., Vol. 44, pp. 1421-1427, November 1996. These algorithms are based on the assumption that speech signals from different parties are independent throughout the call, and use a cross-correlation coefficient vector between the Rout and Sin signals for double-talk detection. Since echoes can usually be approximated as attenuated and delayed versions of their original signals, strong correlation between echoes and their originals should exist. This makes the cross-correlation coefficient vector an efficient measurement for double-talk detection. Compared to the Geigel algorithm, the CORR-algorithms introduce an extra decision delay of at least one speech frame (usually several hundred samples) in order to reliably estimate the cross-correlation functions. As a result of the delayed decision, adaptation also must be delayed in order to avoid severely canceling the initial part of the break-in near-end speech. The CORR-algorithms also are much more computationally complex, especially when estimating a coherence function in the spectrum domain.
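The correlation idea can be illustrated with a simplified frame-based sketch. It is not the detector of either cited paper: it merely computes a normalized cross-correlation between one frame of Rout and one frame of Sin over a small range of delays and declares double-talk when the peak is low. The function names, `max_lag`, and the 0.9 threshold are hypothetical, and a practical detector would also gate on far-end activity, since a silent frame yields a peak of zero.

```python
import math
import random


def peak_normalized_xcorr(rout_frame, sin_frame, max_lag):
    """Largest |normalized cross-correlation| over delays 0..max_lag."""
    n = len(sin_frame)
    best = 0.0
    for lag in range(max_lag + 1):
        x = rout_frame[: n - lag]   # far-end samples
        y = sin_frame[lag:]         # near-end samples, `lag` samples later
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        if den > 0.0:
            best = max(best, abs(num) / den)
    return best


def corr_double_talk(rout_frame, sin_frame, max_lag=8, threshold=0.9):
    """Declare double-talk when Sin is poorly explained by a delayed Rout."""
    return peak_normalized_xcorr(rout_frame, sin_frame, max_lag) < threshold


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(256)]
# Echo modeled as the far-end signal attenuated by 0.5 and delayed 3 samples.
echo_only = [0.5 * far[n - 3] if n >= 3 else 0.0 for n in range(256)]
near = [rng.uniform(-1.0, 1.0) for _ in range(256)]   # independent near-end talker
with_near = [e + v for e, v in zip(echo_only, near)]

single_talk_decision = corr_double_talk(far, echo_only)
double_talk_decision = corr_double_talk(far, with_near)
```

A whole frame must be buffered before a decision is available, which is the extra decision delay noted above, and the cost grows with both the frame length and the number of delays searched.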
Other attempts to resolve the double-talk problem can be found in K. Ochiai et al., "Echo Canceller with Two Echo Path Models," IEEE Trans. Commun., Vol. COM-25, pp. 589-595, June 1977, which uses an echo canceller with two echo path models, or in C. Carlemalm et al., "On Detection of Double-Talk and Changes in the Echo Path Using a Markov Modulated Channel Model," Proc. Intl. Conf. ASSP, Munich, Germany, Apr. 20-24, 1997, Vol. V, pp. 3869-3872, which uses a Markov modulated channel model.
Each of the above-described detection techniques has at least one common feature; namely, a suitable decision threshold is critical, due to the time-varying properties of the speech levels, the background noise, and the attenuation of the echo path.
This suggests that a fixed decision threshold is not appropriate and should be replaced by an adaptive decision threshold which is capable of continuously tracking variations during the calls. Furthermore, the parameter estimation and double-talk detection algorithms must be fast in order to prevent the synthesizing filter in the echo canceller from diverging.
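The general notion of a threshold that tracks call conditions can be illustrated with a purely hypothetical sketch. It is not the method of the present invention: the class name `AdaptiveThreshold`, the exponential smoothing constant `alpha`, and the `margin` factor are all assumptions chosen for illustration. The sketch smooths the observed ratio of near-end to far-end level during frames judged to contain echo only, and places the decision threshold a fixed margin above that ratio.

```python
class AdaptiveThreshold:
    """Hypothetical level-ratio detector whose threshold follows the echo level."""

    def __init__(self, initial=0.5, alpha=0.1, margin=2.0):
        self.alpha = alpha              # smoothing constant (assumed)
        self.margin = margin            # headroom above the tracked echo ratio
        self.ratio = initial / margin   # smoothed near/far level ratio

    @property
    def threshold(self):
        return self.margin * self.ratio

    def update(self, far_level, near_level):
        """Process one frame; return True if double-talk is declared."""
        double_talk = near_level > self.threshold * far_level
        # Track the echo ratio only when the frame looks like echo alone.
        if not double_talk and far_level > 0.0:
            observed = near_level / far_level
            self.ratio += self.alpha * (observed - self.ratio)
        return double_talk


detector = AdaptiveThreshold()
# 100 echo-only frames in which the echo arrives at one tenth of the far-end level.
for _ in range(100):
    detector.update(far_level=1.0, near_level=0.1)
tracked_threshold = detector.threshold

# A quiet near-end talker at 0.3 is below a fixed 0.5 threshold but above the tracked one.
caught_by_adaptive = detector.update(far_level=1.0, near_level=0.3)
caught_by_fixed = 0.3 > 0.5 * 1.0
```

This simple tracker has an obvious limitation: if the echo path suddenly becomes louder, every frame is classified as double-talk and the threshold stops updating, so a practical scheme would need some additional mechanism to recover.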
The present invention solves the problems with conventional double-talk detectors and echo cancellers by providing a double-talk detector and a method of performing double-talk detection, as well as an echo canceller and a method of performing echo cancellation, which utilize an adaptive threshold. The adaptive threshold is capable of continuously tracking variations during a telephone call, and permits the double-talk detector, echo canceller, and methods of the present application to adjust to the time-varying properties of speech levels, background noise and/or the attenuation of the echo path.
In another preferred embodiment, the present invention permits the use of two or more complementary double-talk detection algorithms. For example, one of the double-talk detection algorithms could be a detection algorithm, such as the Geigel algorithm, which is simple, has low computational cost, and is fairly effective, and the other could be a cross-correlation or coherence-based algorithm, which may be more accurate, but also more computationally complex.
In another embodiment of the present invention, the double-talk detector, echo canceller, and methods of the present application include processing elements which are frame-based, sample-based, or a combination of both.