The invention relates to echo cancellation and more particularly to an echo cancellation method and apparatus with robust double talk detection and recovery for use with an automatic speech recognition (ASR) system.
In a purely digital communication network there is no echo. Invariably, though, the network between an ASR system and an end user contains some digital-to-analog conversion points (also known as hybrids) that prevent the communication network from being purely digital. These conversion points are discontinuities and are sources of echoes. Echo cancellers are used to suppress such echoes, as described in U.S. Pat. No. 5,664,011 to Crochiere et al. Such echo cancellers, however, have difficulty in the presence of double talk. Double talk occurs when electric signals corresponding to speech or talk are input to more than one station of a multiple station call. For example, double talk occurs if there are two parties to a call, each located at a different end of a communication network, and both parties talk at the same time. The difficulty with echo cancellers such as those described in Crochiere et al. is that the coefficients of the adaptive filter used in the echo canceller tend to diverge rapidly in the presence of double talk, thereby causing distortion and introducing artifacts or spurious modulation frequencies. Such artifacts can negatively affect the intelligibility of the call, especially if one of the parties is an automatic speech recognition (ASR) system or similar system. If both parties are human, the typical response is to stop, wait for the double talk and divergence to end, and then repeat unclear or garbled parts of the conversation.
Some attempts to prevent coefficient divergence consisted of freezing the coefficients of the adaptive filter of the echo canceller when double talk is detected. One reason this does not solve the problem of diverging coefficients is that detecting the start of double talk takes time, so the deleterious divergence of the coefficients may already have taken place before the start of double talk is detected.
For an interactive ASR system, an audible system prompt is fed to the end user and he or she, in turn, speaks back to the ASR system. The user input is corrupted by an additive echo that results from the reflection of the system prompt at one or more hybrids in the network. These echoes need to be cancelled before automatic speech recognition is performed. Without echo cancellation prior to ASR, the system prompt echo would very likely falsely trigger the recognition system. Classic echo cancellation, however, causes spurious artifacts which can and do cause speech recognition errors.
Thus, there is a need in the art for an echo canceller that is adaptive yet well behaved in the presence of double talk.
There is also a need in the echo cancellation art for non-diverging echo cancellers, especially for use with ASR systems.
Briefly stated, the aforementioned shortcomings of the echo canceling art are addressed and an advance in the art achieved by providing a robust method to detect and operate an echo canceling system in the presence of double talk. This robust method operates even under conditions when the strength of the echo of the audible sound at a first input is high and comparable in magnitude to a user's input speech at a second input.
In accordance with one embodiment of the invention, the aforementioned shortcomings are addressed and an advance in the art achieved by providing a system that guards against echo canceling adaptive filter coefficient divergence upon detection of double talk by substituting a previous set of coefficients from storage for a set of echo canceling adaptive filter coefficients that were adapted in the presence of double talk.
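The coefficient substitution described in this embodiment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes a normalized LMS (NLMS) adaptive filter, which the text does not specify, and the class `RollbackEchoCanceller`, the snapshot interval of 100 samples, the history depth of 4, and the echo path used in `simulate` are all hypothetical. Double talk detection itself is not modelled; the simulation simply assumes the detector reports 150 samples after double talk begins.

```python
import random
from collections import deque


class RollbackEchoCanceller:
    """NLMS echo canceller with stored coefficient snapshots (illustrative only)."""

    def __init__(self, taps=8, mu=0.5, eps=1e-6, history=4):
        self.w = [0.0] * taps                    # adaptive filter coefficients
        self.x = [0.0] * taps                    # recent far-end (prompt) samples
        self.mu = mu                             # NLMS step size
        self.eps = eps                           # guards against division by zero
        self.snapshots = deque(maxlen=history)   # index 0 holds the oldest set

    def process(self, far, near):
        """Adapt on one sample pair and return the echo-cancelled residual."""
        self.x = [far] + self.x[:-1]
        echo_estimate = sum(w * x for w, x in zip(self.w, self.x))
        residual = near - echo_estimate
        power = sum(x * x for x in self.x) + self.eps
        step = self.mu * residual / power
        self.w = [w + step * x for w, x in zip(self.w, self.x)]
        return residual

    def save_snapshot(self):
        """Store a copy of the current coefficients."""
        self.snapshots.append(list(self.w))

    def restore_snapshot(self):
        """Substitute the oldest stored set and drop newer, possibly corrupted ones."""
        if self.snapshots:
            self.w = list(self.snapshots[0])
            self.snapshots.clear()


def misalignment(w, h):
    """Euclidean distance between the filter and the true echo path."""
    h = list(h) + [0.0] * (len(w) - len(h))
    return sum((a - b) ** 2 for a, b in zip(w, h)) ** 0.5


def simulate(seed=0):
    """Converge, diverge under late-detected double talk, then roll back."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.1]                 # hypothetical echo path
    hist = [0.0] * len(h)
    ec = RollbackEchoCanceller()
    for n in range(2150):
        far = rng.uniform(-1.0, 1.0)     # stand-in for the system prompt
        hist = [far] + hist[:-1]
        echo = sum(a * b for a, b in zip(h, hist))
        talker = rng.uniform(-1.0, 1.0) if n >= 2000 else 0.0   # double talk
        ec.process(far, echo + talker)
        if n % 100 == 99:
            ec.save_snapshot()
    diverged = misalignment(ec.w, h)     # detection arrives 150 samples late
    ec.restore_snapshot()                # substitute the stored coefficients
    return diverged, misalignment(ec.w, h)
```

Calling `simulate()` returns the coefficient error just before and just after the substitution. Restoring the oldest snapshot rather than the newest is a deliberate choice in this sketch: because detection lags the onset of double talk, the most recent snapshot may already have been adapted on corrupted input.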
In accordance with another embodiment of the invention, the aforementioned shortcomings are addressed and an advance in the art achieved by providing a system that guards against coefficient divergence upon the earliest detection of double talk by substituting for an adapted set of echo canceling adaptive filter coefficients, a stable set of echo canceling adaptive filter coefficients that has provided a best Echo Return Loss Enhancement (ERLE).
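ERLE is conventionally expressed in decibels as ten times the base-10 logarithm of the ratio between the power of the echo-bearing signal before cancellation and the power of the residual after cancellation. The sketch below shows one way the "best ERLE" coefficient set might be tracked. It is an assumption-laden illustration rather than the claimed method: the block-wise power estimate, the `floor` term, and the names `erle_db` and `BestErleStore` are hypothetical, and the ERLE values offered to the store are assumed to have been measured while only the echo was present.

```python
import math


def erle_db(near_block, residual_block, floor=1e-12):
    """ERLE in dB for one block: power before cancellation over power after."""
    p_near = sum(s * s for s in near_block) / len(near_block)
    p_residual = sum(s * s for s in residual_block) / len(residual_block)
    # The floor keeps the ratio finite when a block is silent.
    return 10.0 * math.log10((p_near + floor) / (p_residual + floor))


class BestErleStore:
    """Keeps the coefficient set that has achieved the highest ERLE so far."""

    def __init__(self):
        self.best_erle = -math.inf
        self.best_coeffs = None

    def offer(self, coeffs, erle):
        """Store coeffs if their ERLE beats the best seen; report whether stored."""
        if erle > self.best_erle:
            self.best_erle = erle
            self.best_coeffs = list(coeffs)
            return True
        return False

    def recall(self, current):
        """Return the best stored set, or the current set if none is stored."""
        if self.best_coeffs is None:
            return list(current)
        return list(self.best_coeffs)
```

In use, each adapted coefficient set would be passed to `offer` together with its measured ERLE, and `recall` would supply the replacement set once double talk is detected.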