The present invention relates to telephony and voice applications in digital networks, and specifically to techniques for mitigating the effects of echo in such applications. More specifically, the present invention relates to techniques for speech detection and echo suppression.
In telephony applications, acoustic coupling between the speaker and microphone at the far end can result in reception of an “echo” at the near end which is annoying to the near end user and makes it difficult to communicate coherently, thus significantly undermining the efficacy of such applications. In digital network telephony, this problem is exacerbated by the relatively long delays in the transmission paths, and the typically poor acoustic isolation of the transducers used by such applications.
There are two solutions to this problem, commonly referred to as echo cancellation and echo suppression, either of which may be used alone or in combination with the other. Echo cancellation is typically implemented as an adaptive filtering algorithm in the far-end equipment and can be highly effective. Basically, echo cancellation algorithms model the process by which the echo at the far end is generated, generate an estimated echo signal, and subtract the estimated echo signal from the signal to be transmitted to the near end.
However, there are some issues which limit the universal applicability of conventional echo cancellation techniques. For example, because changes in the acoustic attenuation of various echo paths cannot be compensated for immediately, some of the echo leaks through. In addition, in the presence of large amounts of acoustic noise, the adaptive algorithm may not converge. Also, large amounts of computational resources are required for such algorithms. Finally, in order for a near-end user to derive the benefit of echo cancellation algorithms in far-end telephony equipment, the equipment at both ends must be provided by the same or cooperative vendors, an obvious limitation on the effective deployment of such techniques.
By contrast, echo suppression, which may be used instead of or in conjunction with echo cancellation, is typically implemented as an algorithm running entirely in the near-end equipment. The fundamental idea is to detect when the near-end user is speaking and, allowing for the round-trip delay of the echo signal, to significantly reduce the gain of the near-end speaker, a technique often referred to as “ducking.” Any echo that might otherwise be heard is reduced to the point where it does not interfere with the near-end user's current attempts at communicating.
Unfortunately, many currently available echo suppression techniques are relatively primitive. That is, such techniques typically detect when a near-end user is speaking and turn down the near-end speaker gain at some fixed delay from when the speech is detected. The fixed delay is typically relatively short, e.g., 200 ms, to ensure that the suppression of the near-end speaker occurs before any echo is received. In addition, the suppression typically continues well after the detected speech has ended to ensure that all of the corresponding echo has been suppressed.
The problem with such a brute force approach to echo suppression is that much more information is suppressed than is necessary, including speech from the far-end user which occurs simultaneously with the near-end speech, i.e., the so-called double talk condition. It is therefore desirable to provide echo suppression techniques that more intelligently suppress echo as well as avoid the undesirable suppression of far-end speech.