1. Field of the Invention
The present invention relates generally to echo cancellation and control in communication networks. More particularly, the present invention relates to methods and systems for delay estimation, double talk detection and echo path change detection for echo cancellation and control.
2. Background Art
Subscribers use speech quality as the benchmark for assessing the overall quality of a telephone network. A key technology for providing high quality speech is echo cancellation and control. Echo canceller performance in a telephone network, whether a TDM or a packet telephony network, has a substantial impact on the overall voice quality. Effective removal of the hybrid and acoustic echo inherent in telephone networks is key to maintaining and improving perceived voice quality during a call.
Echoes occur in telephone networks due to impedance mismatches of network elements and acoustical coupling within telephone handsets. Hybrid echo is the primary source of echo generated from the public-switched telephone network (PSTN). As shown in FIG. 1, hybrid echo 110 is created by a hybrid, which converts a four-wire physical interface into a two-wire physical interface. The hybrid reflects electrical energy back to the speaker from the four-wire physical interface. Acoustic echo, on the other hand, is generated by analog and digital telephones, with the degree of echo related to the type and quality of such telephones. As shown in FIG. 1, acoustic echo 120 is created by acoustic coupling between the earpiece and the microphone of the telephone, where sound from the earpiece is picked up by the microphone, for example, after bouncing off walls, windows, and the like. The result of this reflection is multi-path echo, which would be heard by the speaker unless eliminated.
As shown in FIG. 1, in modern telephone networks, echo canceller 140 is typically positioned between hybrid 130 and network 150. Generally speaking, the echo cancellation process involves two steps. First, as the call is set up, echo canceller 140 employs a digital adaptive filter to adapt to the far-end signal and create a model based on the far-end signal before it passes through hybrid 130. After the near-end signal, including the echo signal, passes through hybrid 130, echo canceller 140 subtracts the far-end model from the near-end signal to cancel the hybrid echo and generate an error signal. Although this echo cancellation process removes a substantial amount of the echo, non-linear components of the echo may still remain. To cancel these non-linear components, the second step of the echo cancellation process utilizes a non-linear processor (NLP) to eliminate the remaining or residual echo by attenuating the signal below the noise floor.
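The two steps described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update for the adaptive filter and a simple threshold gate for the NLP; neither choice is specified in the text, and the function names, the echo path coefficients, the step size, and the gate level are all hypothetical values chosen for illustration.

```python
import random

def simulate_echo(rin, path):
    """Convolve the far-end signal with a hypothetical echo path impulse response."""
    return [sum(h * rin[n - k] for k, h in enumerate(path) if n - k >= 0)
            for n in range(len(rin))]

def nlms_cancel(rin, sin, taps=8, mu=0.5, eps=1e-8):
    """Step 1: adapt a far-end model and subtract it from the near-end signal.

    Returns the error signal and the final filter coefficients.
    NLMS is an assumed adaptation rule, not one named in the text.
    """
    w = [0.0] * taps   # adaptive filter coefficients (echo path model)
    x = [0.0] * taps   # recent far-end samples, newest first
    err = []
    for r, s in zip(rin, sin):
        x = [r] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = s - echo_est                       # error signal after cancellation
        norm = sum(xi * xi for xi in x) + eps  # input power, regularized
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        err.append(e)
    return err, w

def nlp(err, floor=1e-3):
    """Step 2: crude non-linear processor that mutes residual below a floor."""
    return [0.0 if abs(e) < floor else e for e in err]

if __name__ == "__main__":
    random.seed(1)
    rin = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    sin = simulate_echo(rin, [0.0, 0.5, -0.3, 0.1])  # echo only, no near-end talker
    err, w = nlms_cancel(rin, sin)
    print("largest residual in last 100 samples:", max(abs(e) for e in err[-100:]))
    print("samples muted by NLP:", sum(1 for e in nlp(err) if e == 0.0))
```

A practical NLP would typically replace the muted residual with comfort noise rather than hard zeros; the gate here only shows where that stage sits in the chain.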
SPARSE echo cancellers employ adaptive filter algorithms with a dynamically positioned window to cover a desired echo tail length, such as a sliding window, e.g. a 24 ms window, positioned within an echo path delay, e.g. a 128 ms delay. To properly cancel the echo, the echo canceller must determine a pure delay or a bulk delay, which is indicative of the location of the echo signal segment or window within the 128 ms echo path delay. If the bulk delay is not determined accurately, the echo signal is not properly cancelled and, in addition, the echo canceller further distorts the signal by performing the echo cancellation at the wrong place. Therefore, it is crucial that the bulk delay be determined accurately.
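To make the role of the bulk delay concrete, the following sketch locates a 24 ms window inside a 128 ms echo path span. It assumes an 8 kHz sampling rate and uses a plain cross-correlation peak search as a generic stand-in for delay estimation; it is not the method of the present invention, and the function names and signal values are hypothetical.

```python
import random

FS_HZ = 8000                       # assumed narrowband telephony sampling rate
MAX_DELAY = 128 * FS_HZ // 1000    # 128 ms echo path span -> 1024 samples
WINDOW = 24 * FS_HZ // 1000        # 24 ms sliding window  -> 192 samples

def estimate_bulk_delay(rin, sin, max_delay=MAX_DELAY):
    """Return the lag (in samples) where |cross-correlation| of Sin and Rin peaks."""
    best_lag, best_score = 0, 0.0
    for lag in range(min(max_delay, len(sin))):
        score = abs(sum(sin[n] * rin[n - lag] for n in range(lag, len(sin))))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def window_span(bulk_delay, window=WINDOW, max_delay=MAX_DELAY):
    """Place the window at the bulk delay, clamped to stay inside the span."""
    start = max(0, min(bulk_delay, max_delay - window))
    return start, start + window

if __name__ == "__main__":
    random.seed(2)
    rin = [random.uniform(-1.0, 1.0) for _ in range(1200)]
    true_delay = 400               # 50 ms at 8 kHz, hypothetical
    sin = [0.4 * rin[n - true_delay] if n >= true_delay else 0.0
           for n in range(len(rin))]
    delay = estimate_bulk_delay(rin, sin)
    print("estimated bulk delay (samples):", delay)
    print("window covers taps:", window_span(delay))
```

If the estimate were off by more than the window length, the 192 adaptive taps would sit over a region of the echo path that carries no echo energy, which is the failure described above.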
Because the echo canceller is utilized to cancel an echo of Rin signal 141 from Sin signal 132, the presence of a speech signal from the near end would cause the adaptive filter to converge on a combination of the near end speech signal and Rin signal 141, which leads to an inaccurate echo path model, i.e. incorrect adaptive filter coefficients. Therefore, in order to cancel the echo signal, the adaptive filter should not train in the presence of the near end speech signal. To this end, conventional echo cancellers analyze Sin signal 132 and determine whether it contains the speech of a near end talker. By convention, if two people are talking over a communication network or system, one person is referred to as the “near talker,” while the other person is referred to as the “far talker.” The combination of speech signals from the near end talker and the far end talker is referred to as “double talk.” To determine whether Sin signal 132 contains double talk, a double talk detector estimates and compares the characteristics of Rin signal 141 and Sin signal 132. A primary purpose of the double talk detector is to prevent the adaptive filter from adapting when double talk is detected.
If the double talk detector fails to detect an existing double talk condition, the adaptive filter improperly trains on a signal that includes a near end signal, and the adaptive filter will not accurately model the echo signal. Conversely, if the double talk detector reports a double talk condition when none exists, the adaptive filter does not train on Rin signal 141, and the adaptive filter again will not accurately model the echo signal.
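Both failure modes can be reproduced in a short simulation. The sketch below assumes an NLMS adaptive filter whose update is gated by a per-sample freeze flag standing in for the double talk detector output; the flag sequences are idealized and supplied by hand, and all names and signal values are hypothetical.

```python
import random

def simulate_echo(rin, path):
    """Convolve the far-end signal with a hypothetical echo path impulse response."""
    return [sum(h * rin[n - k] for k, h in enumerate(path) if n - k >= 0)
            for n in range(len(rin))]

def nlms_gated(rin, sin, freeze, taps=4, mu=0.5, eps=1e-8):
    """NLMS (an assumed adaptation rule) that skips updates while freeze[n] is True."""
    w = [0.0] * taps
    x = [0.0] * taps
    for r, s, hold in zip(rin, sin, freeze):
        x = [r] + x[:-1]
        e = s - sum(wi * xi for wi, xi in zip(w, x))
        if not hold:  # adapt only when the detector reports no double talk
            norm = sum(xi * xi for xi in x) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return w

def misalignment(w, path):
    """Euclidean distance between the filter coefficients and the true echo path."""
    return sum((a - b) ** 2 for a, b in zip(w, path)) ** 0.5

def run_cases(seed=3, n=2000):
    random.seed(seed)
    path = [0.5, -0.3, 0.1, 0.0]
    rin = [random.uniform(-1.0, 1.0) for _ in range(n)]
    # Near end talker is silent for the first half, active for the second half.
    near = [0.0 if i < n // 2 else random.uniform(-1.0, 1.0) for i in range(n)]
    sin = [e + v for e, v in zip(simulate_echo(rin, path), near)]
    flags = {
        "ideal detector": [i >= n // 2 for i in range(n)],
        "missed double talk": [False] * n,   # filter trains on near end speech
        "false double talk": [True] * n,     # filter never trains
    }
    return {name: misalignment(nlms_gated(rin, sin, f), path)
            for name, f in flags.items()}

if __name__ == "__main__":
    for name, value in run_cases().items():
        print(f"{name}: misalignment = {value:.6f}")
```

With the ideal flags the coefficients stay at the converged echo path; with missed detections they are pulled away by the near end speech; with constant false detections they never leave their initial zero values.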
Conventional methods for determining the bulk delay and detecting the double talk condition suffer from many disadvantages. For example, the Geigel algorithm, which is performed in the time domain, computes the correlation between Rin signal 141 and Sin signal 132. The Geigel algorithm estimates the bulk delay when the correlation between Rin signal 141 and Sin signal 132 is high, and determines that a double talk condition exists when that correlation is low. The Geigel algorithm, however, performs poorly in noisy conditions, because it is based merely on energy detection. Further, the Geigel algorithm fails to detect double talk when the embedded near end signal has a small amplitude, and it also falsely detects double talk when none exists.
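The small-amplitude weakness can be seen in a toy example. In its commonly cited textbook form, the Geigel test is a level comparison that declares double talk when the magnitude of the current Sin sample exceeds a fixed fraction of the largest recent Rin magnitude. The sketch below follows that form, which may differ in detail from the variant described above; the history length, the 0.5 threshold (corresponding to an assumed 6 dB minimum hybrid loss), and the sample values are illustrative assumptions.

```python
def geigel(rin, sin, history=4, threshold=0.5):
    """Textbook-style Geigel test.

    Flags sample n as double talk when
        |sin[n]| > threshold * max(|rin[n]|, ..., |rin[n - history + 1]|).
    """
    flags = []
    for n, s in enumerate(sin):
        recent = rin[max(0, n - history + 1): n + 1]
        peak = max(abs(r) for r in recent)
        flags.append(abs(s) > threshold * peak)
    return flags

if __name__ == "__main__":
    rin = [1.0, -0.8, 0.6, -1.0, 0.9, -0.7]
    # Hypothetical echo: far-end signal delayed by one sample, attenuated to 0.3.
    echo = [0.0] + [0.3 * r for r in rin[:-1]]

    loud_near = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]    # strong near end talker at n = 3
    quiet_near = [0.0, 0.0, 0.0, 0.1, 0.0, 0.0]   # weak near end talker at n = 3

    sin_loud = [e + v for e, v in zip(echo, loud_near)]
    sin_quiet = [e + v for e, v in zip(echo, quiet_near)]

    print("loud near end :", geigel(rin, sin_loud))   # detected at n = 3
    print("quiet near end:", geigel(rin, sin_quiet))  # missed entirely
```

Because the decision depends only on instantaneous levels, a near end signal that stays below the threshold goes undetected, while a noise burst on Sin that exceeds it would be flagged even with no near end talker present.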
Other conventional methods for determining the bulk delay and detecting the double talk condition use a spectral approach to perform full-band or sub-band matching based on the Fast Fourier Transform (FFT). These conventional approaches also suffer from several drawbacks, such as sensitivity to the echo path and poor performance in the presence of noise.
Accordingly, there is a need in the art for more accurate determination of the bulk delay and detection of the double talk condition in echo cancellation and control systems.