In audio communication, echo is a known problem. Echo is particularly problematic when speakerphone functionality is used because voice data from both ends of a communication path is incident on a microphone at either end. To solve the echo problem, digital signal processing is used to subtract what the processing perceives to be echo-related noise. To this end, converging processes have been designed to converge over time on echoless or near-echoless communication. As new processes have been designed, both the time to converge and the quality of the resulting echoless communication have greatly improved.
Distinguishing echo signals from desired signals is often difficult due to double-talk situations. Double-talk occurs when two people, one on the receiving side of the network and one on the transmitting side of the network, speak simultaneously. The simultaneous transmission and reception of signals disrupts echo canceller adaptation, and as a result, the echo canceller often performs poorly when a robust double-talk detector is not employed. When no double-talk exists, the echo canceller properly adapts its model of the echo signal path since the echo canceller only receives a signal that contains the echo. But during double-talk, the receiving side also transmits signals along a return signal path. The echo canceller therefore receives return echo signals and signals transmitted from the receiving end simultaneously, and may adapt improperly. A problem thus arises if the echo canceller erroneously adapts its model of the estimated echo signal according to a transmitted signal rather than according to an echo signal: the echo canceller may begin to distort transmitted signals.
Reliable double-talk detection is also important for echo cancellers that employ Non-Linear Processors (NLP) to improve echo canceller performance. In such architectures, a non-linear impediment is introduced in the transmit path when only the receive direction carries speech. This impediment is designed to remove residual echo (echo not cancelled by the adaptive filter), while passing the background noise from the transmitting end, or a reasonable-sounding imitation thereof. In many NLP implementations, it is necessary to turn off the NLP during double-talk, in order to allow transmitted (near-end) speech to pass unimpeded.
Reliable double-talk detection is required for various reasons, including control of residual echo and control of the adaptive algorithms used to estimate the echo. One existing technique for detecting double-talk uses changes in echo return loss enhancement (ERLE′) to distinguish between when a near-end signal is residual echo and when a near-end signal comprises near-end speech. Because the echo canceller filter serves to reduce the echo, that is, far-end speech received at the near-end microphone after being affected by the room impulse response, a value E1 is determinable such that
E1 = SIN/SOUT  (1)
where SIN is the envelope (short-term average power) of an input signal provided from the microphone and SOUT is the envelope of an output signal provided from the echo canceller circuit. In theory, provided the echo canceller filter is reasonably well converged, then E1 > 1.0 (linear scale).
In the absence of near-end speech, echo cancellation is easily evaluated. When the echo canceller has converged and there is no near-end speech, E1 >> 1.0, approaching infinity as noise approaches zero and the echo canceller filter approaches the room impulse response. This is a natural result of SOUT being in the denominator since, when echo cancellation has converged without near-end speech, SOUT should approach zero. As seen so far, E1 should be quite large when there is no double-talk and the echo canceller has converged.
As a result, double-talk may be determined to exist when the far-end is known to be active, as determined by a voice activity detector, and hence:
E1 < E1Thresh  (2)
where E1Thresh is a statistical value, either constant or pseudoconstant, which is established with respect to echo return loss enhancement (ERLE) by the relationship below in Equation 3.
ERLE′ > E1Thresh > 1.0  (3)
where ERLE is an estimate of average ERLE′ when there is no near-end speech, thereby making ERLE easy to evaluate. In actuality, ERLE′ varies with the signal, i.e. the far-end speech. In order to address these changes in ERLE′, one of a long-term average, a minimum, or a recent estimate based on previous far-end speech without double-talk is often used to establish ERLE′. Many techniques for addressing changes in ERLE′ are proposed in the prior art.
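By way of illustration only, the test of Equations 1 and 2 may be sketched as follows. The sketch assumes that the envelope is obtained by one-pole smoothing of the squared samples and that far-end activity is supplied as a per-sample flag from a voice activity detector. The function names, the smoothing constant alpha, and the guard value eps are hypothetical and are not taken from any particular prior art system.

```python
def envelope(samples, alpha=0.05):
    """Short-term average power: one-pole smoothing of squared samples.

    The smoothing constant alpha is an illustrative assumption.
    """
    power = 0.0
    out = []
    for s in samples:
        power = (1.0 - alpha) * power + alpha * s * s
        out.append(power)
    return out


def detect_double_talk(s_in, s_out, far_end_active, e1_thresh, eps=1e-12):
    """Flag double-talk where the far end is active and E1 = SIN/SOUT < E1Thresh.

    s_in and s_out are envelopes of the microphone signal and of the
    echo canceller output; eps only guards against division by zero.
    """
    flags = []
    for p_in, p_out, active in zip(s_in, s_out, far_end_active):
        e1 = p_in / (p_out + eps)
        flags.append(bool(active) and e1 < e1_thresh)
    return flags


# Illustrative signals: constant-amplitude microphone input, with either a
# well-cancelled output (echo only) or an output that still carries speech.
mic = envelope([1.0] * 200)
echo_only_out = envelope([0.01] * 200)   # E1 is about 10,000
with_speech_out = envelope([0.8] * 200)  # E1 is about 1.56
active = [True] * 200

print(any(detect_double_talk(mic, echo_only_out, active, e1_thresh=4.0)))
print(all(detect_double_talk(mic, with_speech_out, active, e1_thresh=4.0)))
```

In this sketch, no double-talk is reported while the canceller output is small relative to the microphone input, and double-talk is reported on every sample once the output retains most of the input power.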
Thus, in such a system the selection of E1Thresh is essential to ensure that double-talk is correctly identified. For very large values of ERLE, there are large ranges of potential E1Thresh values. However, as ERLE decreases, then E1Thresh is restricted to a much narrower range which can be problematic. In practice, this situation arises when the uncancelled echo return loss (ERL) is high—since most adaptive algorithms only cancel echo to a degree that is limited by the near-end noise floor, and hence the achieved ERLE depends on how much echo there is to cancel in the first place. High echo return losses (without cancellation) are characteristic of systems in which speakers and microphone are spatially separated from each other.
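The narrowing of the usable range for E1Thresh can be illustrated with simple level arithmetic. The sketch below assumes a deliberately simplified model in which the canceller reduces the echo exactly down to the near-end noise floor, so that the achieved ERLE in dB is the echo level at the microphone minus the noise floor. The function name and the numerical levels are hypothetical and chosen only for illustration.

```python
def achievable_erle_db(far_end_level_db, erl_db, noise_floor_db):
    """Approximate achieved ERLE (dB) when cancellation stops at the noise floor.

    Simplifying assumption: residual echo cannot be driven below the
    near-end noise floor, so ERLE = (echo level at microphone) - (noise floor),
    and never less than 0 dB.
    """
    echo_level_db = far_end_level_db - erl_db
    return max(0.0, echo_level_db - noise_floor_db)


# Low ERL (loudspeaker close to microphone): a wide range remains for E1Thresh.
low_erl = achievable_erle_db(far_end_level_db=-20.0, erl_db=6.0, noise_floor_db=-60.0)
# High ERL (loudspeaker and microphone well separated): the range collapses.
high_erl = achievable_erle_db(far_end_level_db=-20.0, erl_db=35.0, noise_floor_db=-60.0)

print(low_erl)   # 34.0 dB, so E1Thresh may lie anywhere between 0 dB and 34 dB
print(high_erl)  # 5.0 dB, so E1Thresh is confined between 0 dB and 5 dB
```

Under these assumed levels, raising the ERL from 6 dB to 35 dB shrinks the interval available to E1Thresh from 34 dB to 5 dB, which leaves little margin for the signal-dependent variation in ERLE′.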
One such prior art approach is the so-called “Geigel algorithm” presented by Duttweiler (see D. L. Duttweiler, “A Twelve-Channel Digital Echo Canceller,” IEEE Trans. on Communications, Vol. COM-26, No. 5, pp. 647-653, May 1978). The Geigel algorithm provides a means to detect double-talk by comparing the magnitude of the current sample of reference echo with a current value of the input signal. If the magnitude of the reference echo is at least −6 dB higher than the input signal, then double-talk is determined to be present. The Geigel algorithm is simple and fast. However, when the magnitude of the reference echo is lower than −6 dB higher than the input signal during double-talk, the Geigel algorithm fails to detect the double-talk. The Geigel algorithm is also sensitive to near-end noise interference.
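For illustration, a Geigel-style test may be sketched in the form in which it is commonly stated in the literature, which may differ in wording from the summary above: double-talk is declared when the magnitude of the near-end input sample exceeds a fixed fraction (0.5, corresponding to −6 dB) of the largest recent far-end reference magnitude. The function name, the window length, and the use of a strict inequality (to avoid flagging mutual silence) are assumptions made for this sketch.

```python
from collections import deque


def geigel_double_talk(far_end, near_end, window=4, threshold=0.5):
    """Per-sample Geigel-style decision.

    Declares double-talk when |near_end[n]| > threshold * max of the last
    `window` far-end magnitudes. threshold=0.5 corresponds to -6 dB.
    """
    recent = deque(maxlen=window)  # sliding window of far-end magnitudes
    flags = []
    for x, y in zip(far_end, near_end):
        recent.append(abs(x))
        flags.append(abs(y) > threshold * max(recent))
    return flags


far = [1.0, -1.0, 1.0, -1.0]
echo_only = [0.25, -0.25, 0.25, -0.25]    # echo 12 dB below the reference
strong_talker = [0.25, 0.9, 0.25, -0.25]  # near-end speech at sample 1
weak_talker = [0.25, 0.4, 0.25, -0.25]    # near-end speech below the -6 dB line

print(geigel_double_talk(far, echo_only))      # [False, False, False, False]
print(geigel_double_talk(far, strong_talker))  # [False, True, False, False]
print(geigel_double_talk(far, weak_talker))    # [False, False, False, False]
```

The last case shows the limitation noted above: near-end speech that stays below the fixed −6 dB line relative to the reference goes undetected.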
Yet another method for detecting double-talk is outlined in U.S. Pat. No. 6,944,288 (Seibert, “Double-talk and Path Change Detection using a Matrix of Correlation Coefficients”, hereinafter Seibert). Seibert teaches a process of generating matrix coefficients using zero-lag auto-correlations and cross-correlations from signals commonly found within echo cancellers. From these coefficients, double-talk and path changes are then detected using matrix operations such as determinants, eigendecompositions, or singular value decompositions. In a preferred embodiment, the determinant of the correlation-based matrix is compared against predetermined threshold values. While Seibert improves on the Geigel algorithm, it is a processor-intensive approach that relies on matrix calculations.
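The general idea of a determinant test on zero-lag correlations can be illustrated with a deliberately reduced example. The sketch below is not the matrix, signal set, or thresholds that Seibert specifies; it assumes only two signals (a hypothetical echo estimate and the microphone signal) and a normalized 2×2 correlation matrix [[1, ρ], [ρ, 1]], whose determinant is 1 − ρ². All names are hypothetical.

```python
def zero_lag_correlation(a, b):
    """Zero-lag correlation (inner product) of two equal-length blocks."""
    return sum(x * y for x, y in zip(a, b))


def correlation_determinant(echo_estimate, mic):
    """Determinant of the normalized 2x2 correlation matrix [[1, rho], [rho, 1]].

    Returns 1 - rho**2: near 0 when the microphone signal is essentially a
    scaled copy of the echo estimate, larger when other content is present.
    Silent blocks return 0.0 by convention in this sketch.
    """
    r_ee = zero_lag_correlation(echo_estimate, echo_estimate)
    r_mm = zero_lag_correlation(mic, mic)
    r_em = zero_lag_correlation(echo_estimate, mic)
    if r_ee == 0.0 or r_mm == 0.0:
        return 0.0
    return 1.0 - (r_em * r_em) / (r_ee * r_mm)


estimate = [1.0, -1.0, 1.0, -1.0]
mic_echo_only = [0.5, -0.5, 0.5, -0.5]    # scaled copy of the estimate
mic_with_speech = [1.5, 0.5, -0.5, -1.5]  # echo plus uncorrelated speech

print(correlation_determinant(estimate, mic_echo_only))    # 0.0
print(correlation_determinant(estimate, mic_with_speech))  # approximately 0.8
```

A detector built on this quantity would compare the determinant against a predetermined threshold; with more signals the matrix grows and the cost of the determinant or decomposition grows with it, which is the processing burden noted above.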
It would be advantageous to provide a straightforward and effective method of detecting double-talk.
It would be advantageous to provide a voice communication system that provides robust and effective double-talk detection even when ERLE and/or ERLE′ has a low value, for example in systems in which the ERL is high.