This invention relates to echo cancellers used in the public switched telephone network (PSTN).
In the public switched telephone network, echo signals arise from impedance mismatches. Despite care in impedance matching, central office hybrids (which convert between 2-wire subscriber lines and the 4-wire circuits used for inter-office transmission) give rise to residual echo even when the end-to-end circuit delay is only moderate. In order to achieve more complete echo suppression, echo cancellers are employed at each end of the 4-wire transmission path between remote hybrids. An echo canceller has one of its legs positioned to receive the signal incoming from the distant end of the 4-wire path and to pass it through unchanged to its adjacent hybrid's receive leg.
To overcome the echo, the echo canceller adaptively adjusts the coefficients of a finite impulse response (FIR) filter to model the echo path so that an estimate of the echo can be subtracted from the signal being returned to the distant end. The echo canceller adapts its filter based upon a comparison of the far end signal and the echo of the far end signal injected into the transmit leg by its adjacent hybrid. The adaptation process can be a stochastic gradient step method which uses a rough (noisy) estimate of the gradient, g(n)=e(n)x(n), to make an incremental step toward minimizing the energy of the echo signal in the transmitted signal. This is the classic LMS process.
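The stochastic gradient update described above can be sketched as follows. This is only a minimal illustration of plain LMS, not the implementation of any particular echo canceller; the function name `lms_echo_cancel`, the step size `mu`, the tap count, and the synthetic echo path `true_path` are all assumptions introduced for the example.

```python
import random

def lms_echo_cancel(far, near, num_taps=8, mu=0.05):
    """Plain LMS: adapt FIR taps so that the filtered far end signal
    approximates the echo present in the near end signal."""
    w = [0.0] * num_taps          # FIR coefficients (model of the echo path)
    history = [0.0] * num_taps    # most recent far end samples, newest first
    residual = []
    for x, d in zip(far, near):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate     # error: near end signal minus estimated echo
        # Noisy gradient estimate e(n) * x(n), applied one term per tap
        w = [wi + mu * e * xi for wi, xi in zip(w, history)]
        residual.append(e)
    return w, residual

# Hypothetical echo path, used only for this demonstration
true_path = [0.0, 0.5, -0.3, 0.1]
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
# Near end signal contains only the echo (no near end speech, no noise)
near = [sum(h * far[n - k] for k, h in enumerate(true_path) if n - k >= 0)
        for n in range(len(far))]

w, residual = lms_echo_cancel(far, near)
```

With no near end speech present, the leading taps of `w` should approach `true_path` and the residual should shrink toward zero. Adding near end speech to `near` would disturb the taps, which is the double-talk problem discussed next.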
Once the adaptation process has adjusted the FIR filter's coefficients sufficiently to provide a reasonable approximation of the impulse response of the echo path, the system is said to have “converged.” However, when the transmit path contains speech intended to be transmitted to the far end, such speech will tend to interfere with the convergence of the filter adaptation program. Accordingly, the filter should not be adjusted from samples taken when both parties are talking, a condition referred to as “double-talk”, since such samples do not accurately represent the echo path and will cause the filter adaptation program to diverge from a correct solution. Prior art patents, such as U.S. Pat. Nos. 5,953,420; 5,606,550; 5,390,250; 5,193,112 and 4,894,820, have recognized the importance of blocking filter adaptation when double talk is detected. However, because it takes a finite time to detect the double talk condition, some samples containing near-end speech may already have been used to update the FIR coefficients. Use of such samples will tend to prevent the adaptation program from properly converging. It would be extremely advantageous to rapidly detect the double talk condition and, during the interval that it takes to do so, to furnish the filter adaptation program with appropriate coefficient values for the FIR filter.
Detection of the double-talk condition is further complicated when the echo path has significant delay. Under circumstances of significant delay, the echo of the far signal, as perceived at the near end input to the echo canceller, may arrive after the far signal has disappeared, for example at the end of a syllable. When this happens, the echo appearing at the near end input after the far signal has already disappeared from the far end input will be mistaken for a double-talk condition and will halt the adaptation of the FIR filter coefficients.
Besides double-talk, there is another condition that will adversely affect adaptation. The tone signals employed in the PSTN, as well as the tones emitted by modems, can cause an echo canceller to fail to converge properly. It would be advantageous to detect such tones, as well as any stationary signals that present themselves, such as periodic background noise caused by a motor, fan, or engine, and to prevent the adaptation program from being adversely affected by them.
The typical impulse response of the echo path is of a diffuse nature whose value decays with time over a period termed the “echo tail”. To make matters more complex, it is possible that multiple echo sources can be present in the network whose echo tails may change with time. A good echo canceller should adapt to an echo path and cancel the echoes from all the echo sources in the network within an appropriate convergence time. This requires that the number of independent echo tails be determined so that the echo path can be properly modeled for each such tail. While at first glance it would seem appropriate to sample the echoes to find the largest amplitude signals, it turns out that the largest amplitude signals that are found may not belong to independent echoes. Accordingly, it would be advantageous to be able to determine which amplitude samples belong to which echoes so that the echo path can be properly modeled.
In accordance with the principles of the invention, a “fast attack” method rapidly detects the onset of a double talk condition by monitoring the rate of change of near end signal amplitude and by changing the time constant used to compute its average power. Filter coefficients that may have been modified during the time it takes for the fast attack method to change the time constant are discarded, and filter coefficient values that were obtained during a previous, better converged state of the filter are substituted.
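The fast attack idea and the coefficient substitution can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the claimed method: the class names, the smoothing constants, the `jump_ratio` threshold, the power floor, and the policy of saving a snapshot every 32 samples are all hypothetical, and a fixed drift stands in for the real filter adaptation.

```python
class FastAttackPower:
    """Average near end power with a time constant that shortens when the
    instantaneous power rises abruptly above the running average."""

    def __init__(self, slow_alpha=0.01, fast_alpha=0.5,
                 jump_ratio=4.0, floor=1e-3):
        self.slow_alpha = slow_alpha   # long time constant (assumed value)
        self.fast_alpha = fast_alpha   # short time constant (assumed value)
        self.jump_ratio = jump_ratio   # assumed threshold for a sudden rise
        self.floor = floor             # avoids false triggers at start-up
        self.power = 0.0

    def update(self, sample):
        inst = sample * sample
        attack = inst > self.jump_ratio * max(self.power, self.floor)
        alpha = self.fast_alpha if attack else self.slow_alpha
        self.power += alpha * (inst - self.power)
        return attack


class CoefficientGuard:
    """Keeps a snapshot of coefficients taken in an earlier state."""

    def __init__(self, coeffs):
        self.saved = list(coeffs)

    def checkpoint(self, coeffs):
        self.saved = list(coeffs)

    def restore(self):
        return list(self.saved)


detector = FastAttackPower()
coeffs = [0.5, -0.3]
guard = CoefficientGuard(coeffs)
near = [0.01, -0.01] * 50 + [0.8, -0.7, 0.9]   # quiet line, then near end speech
onsets = []
double_talk = False
for n, s in enumerate(near):
    if detector.update(s):
        double_talk = True
        onsets.append(n)
        coeffs = guard.restore()        # discard the most recent updates
    if not double_talk:
        coeffs = [c + 0.001 for c in coeffs]   # stand-in for a filter update
        if (n + 1) % 32 == 0:
            guard.checkpoint(coeffs)    # assumed snapshot policy
```

In this run the onset is flagged at the first loud sample, and the four stand-in updates made since the last snapshot are thrown away. How a "better converged" state is actually recognized is not modeled here.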
Further in accordance with the invention, not only tonal signals, but any signals which have a high degree of auto-correlation are prevented from disrupting convergence of the filter adaptation program by autocorrelating successive far end samples to obtain three autocorrelation coefficients (CORRi for i = 0 ... 2). Then, an LPC analysis is performed on the three autocorrelation coefficients to obtain two reflection coefficients RC0 and RC1, where RC0 = −CORR1/CORR0 and RC1 = (CORR2*CORR0 − CORR1²)/(CORR0² − CORR1²). A highly correlated signal tends to have a lower value of RC1 while a stationary signal should have little variation in both RC0 and RC1. The mean, MRC, and the approximate variance, VRC, are monitored to detect signals above a preset threshold.
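The two reflection coefficient formulas can be checked numerically. The sketch below computes CORR0 through CORR2 for a block of samples and applies the expressions for RC0 and RC1 as written above; the function names, the block lengths, and the two test signals (a pure tone and white noise) are assumptions made for illustration, and the mean and variance tracking (MRC, VRC) is not modeled.

```python
import math
import random

def autocorr3(samples):
    """Return [CORR0, CORR1, CORR2] for a block of far end samples."""
    return [sum(samples[n] * samples[n - lag]
                for n in range(lag, len(samples)))
            for lag in range(3)]

def reflection_coeffs(corr):
    """RC0 and RC1 from three autocorrelation coefficients."""
    c0, c1, c2 = corr
    rc0 = -c1 / c0
    rc1 = (c2 * c0 - c1 * c1) / (c0 * c0 - c1 * c1)
    return rc0, rc1

# A pure tone is highly correlated from sample to sample
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
tone_rc0, tone_rc1 = reflection_coeffs(autocorr3(tone))

# White noise has almost no correlation between successive samples
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noise_rc0, noise_rc1 = reflection_coeffs(autocorr3(noise))
```

For the tone, RC1 comes out close to −1, while for the noise both coefficients stay near zero, which is consistent with the statement that a highly correlated signal tends to have a lower value of RC1.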
Further in accordance with the invention, the echo tails from multiple time variant echo sources are adaptively suppressed by determining which echo amplitudes correspond to echoes from independent sources. An array CC[0 ... (T−1)] is formed by cross-correlating the far signal with the normalized near signal. The local maximum amplitudes are found in the CC array for every group of 16 samples in the array. A resulting Peak array (which has 1/16 as many samples as the CC array) is formed. The decimated (peak) array is searched under the assumption that the peak amplitude in an independent tail will be found close to the beginning of the tail. Once a peak has been identified in the decimated array, all of the elements associated with that peak are flagged so that they will not again be searched. The result of this procedure is a set of flags that indicate the areas which will be adapted by the FIR filter adaptation program.
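The decimate, search, and flag procedure can be sketched as below. This is a simplified illustration only: the function names, the fixed tail length `tail_groups`, the number of tails searched, and the synthetic CC array are assumptions, and each tail is assumed to start in the group that holds its peak.

```python
GROUP = 16  # samples of the CC array per entry of the Peak array

def decimate_peaks(cc, group=GROUP):
    """Largest magnitude in each consecutive group of CC samples."""
    return [max(abs(v) for v in cc[i:i + group])
            for i in range(0, len(cc), group)]

def flag_tails(peaks, num_tails=2, tail_groups=3):
    """Repeatedly pick the largest unflagged peak, then flag the groups
    assumed to belong to its tail so they are not searched again."""
    flags = [False] * len(peaks)
    for _ in range(num_tails):
        candidates = [i for i in range(len(peaks)) if not flags[i]]
        if not candidates:
            break
        start = max(candidates, key=lambda i: peaks[i])
        for i in range(start, min(start + tail_groups, len(peaks))):
            flags[i] = True
    return flags

# Synthetic CC array: a strong tail starting at lag 20, a weak one at lag 90
cc = [0.0] * 128
for k in range(30):
    cc[20 + k] += 0.9 * 0.9 ** k
    cc[90 + k] += 0.2 * 0.9 ** k

peaks = decimate_peaks(cc)
flags = flag_tails(peaks)
```

In this example the second largest entry of the Peak array (group 2) belongs to the first tail, not to an independent echo. Because the groups of the first tail are flagged before the second search, the weaker echo at group 5 is selected instead.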
Further in accordance with the invention, the amount of delay affects the computation of the step size used in normalization of the LMS adaptation of filter coefficients and for updating cross-correlations. When the echo delay is short, a measure of the most recent value of the inverse of the far end power, (IABSY)², is used. When there is significant delay, the square of the most recent value of IABSY is no longer a fair measure of the far end power that corresponds to the echo caused by a past component of the far end signal. Accordingly, instead of using (IABSY)² to normalize the step size, the product of the most recent value of IABSY and a previous value of IABSY that corresponds to the echo delay is taken from the history array (IABSYH) and used to form the product IABSY*IABSYH for computing the step size.
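A small numeric sketch may help show the difference between the two normalizations. The function name `normalized_step`, the base step `mu`, the `short_delay_limit` threshold, the indexing of the history list by delay, and the sample values are all assumptions chosen for the example.

```python
def normalized_step(mu, iabsy_history, delay, short_delay_limit=4):
    """iabsy_history[0] is the most recent IABSY value; iabsy_history[k]
    is the value from k updates ago (standing in for the IABSYH array)."""
    iabsy = iabsy_history[0]
    if delay <= short_delay_limit:
        return mu * iabsy * iabsy        # short delay: (IABSY)^2
    iabsyh = iabsy_history[delay]        # value aligned with the echo delay
    return mu * iabsy * iabsyh           # long delay: IABSY * IABSYH

# The far end was loud a few updates ago (small inverse values) and has
# since become quiet (large inverse values).
history = [4.0, 4.0, 2.0, 1.0, 1.0, 0.5, 0.5, 0.5]

short_step = normalized_step(0.5, history, delay=1)   # 0.5 * 4.0 * 4.0
long_step = normalized_step(0.5, history, delay=6)    # 0.5 * 4.0 * 0.5
```

Here the echo arriving now was produced while the far end was still loud, so normalizing by the square of the current value would give a step of 8.0, while the delayed product gives 1.0.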