The present invention is directed to an echo canceller for improving the quality and network capacity of fixed-wireless networks using packetized low-rate coded voice and digital speech interpolation.
In a telephone network, four-wire and two-wire segments are joined at opposite ends of the network by hybrid circuits. However, unmatched impedance at the connection points between a two-wire segment and the four-wire segment causes a portion of an input signal on the receiving side to leak to the transmitting side via the hybrid circuit 3. The signal portion that leaks to the transmitting side is generally referred to as an echo.
Echoes are one of the primary factors affecting the perceived quality of voice connections. In telephony networks using low-rate packetized speech coding and digital speech interpolation (DSI), network delays are large; echoes are therefore easily perceptible and annoying, and need to be suppressed. In addition, packet voice networks use DSI to improve the networks' traffic carrying capacity. In such networks, a transmission channel is allocated to a user only when that user's speech is active. Thus, DSI enables a larger number of users to share the network than the number of available transmission channels. The channel allocation is controlled by a voice activity detection (VAD) device, which determines the presence of voice on the channel. However, a VAD device does not distinguish between speech in the form of voice from a near-end speaker and speech in the form of an echo from a far-end speaker. Hence, the presence of echoes will trigger a VAD device to allocate transmission channels needlessly. Thus, the presence of echoes in DSI networks is undesirable not only because they are annoying, but also because they waste a network's traffic carrying capacity.
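The way an echo can needlessly trigger channel allocation may be illustrated with a minimal sketch. It assumes a purely energy-based VAD; the function names, the threshold value and the sample frames are hypothetical and are not taken from any particular VAD device.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def vad_active(frame, threshold=1e-3):
    """Energy-only VAD (assumed): 'active' whenever frame power exceeds the threshold."""
    return frame_power(frame) > threshold


# Hypothetical frames on the transmit side of the hybrid.
near_end_speech = [0.5, -0.4, 0.3, -0.5]   # genuine near-end voice
echo_only = [0.1, -0.08, 0.06, -0.1]       # attenuated far-end speech leaking back
silence = [0.001, -0.001, 0.0, 0.001]      # idle channel

# The VAD sees only signal energy, so the echo frame is treated like speech
# and a transmission channel would be allocated for it.
for name, frame in [("near-end", near_end_speech), ("echo", echo_only), ("silence", silence)]:
    print(name, vad_active(frame))
```

In this sketch the echo frame is classified as active even though no near-end speaker is talking, which is the wasted allocation described above.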
Network echo cancellers are conventionally used to eliminate the echoes caused by impedance mismatches in the echo path formed by hybrid circuits. Generally, such echo cancellers utilize an adaptive transversal filter that monitors an incoming speech signal from a far-end speaker and models a linear impulse response using a coefficient adaptation algorithm to replicate the parameters of the actual echo expected in the outgoing signal. The replicated signal is subtracted from the outgoing signal to cancel the actual undesired echo portion of the outgoing signal. The remaining signal is fed back to the adaptive filter and used to update the replicated signal. This feedback loop allows the adaptive filter to converge to a close approximation of the echo parameters. However, when both incoming (speech from a far-end speaker) and outgoing (speech from a near-end speaker) signals are present at the same time, the adaptive filter is no longer able to effectively cancel the undesired echo signals. This is because the echo signal is included with the near-end speech signal in the outgoing signal, which causes the remaining signal fed back to the adaptive filter to increase and disturb the updating process.
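The feedback loop described above can be sketched as follows. This is an illustration only: it assumes a normalized LMS update (one common coefficient adaptation algorithm, not necessarily the one used in any given canceller), and the function name, tap count, step size and echo-path coefficients are hypothetical.

```python
import random


def nlms_echo_canceller(far_end, outgoing, num_taps=8, step=0.5, eps=1e-8):
    """Subtract a replicated echo from the outgoing signal, sample by sample.

    Returns the remaining (residual) signal and the final filter coefficients.
    """
    weights = [0.0] * num_taps   # current estimate of the echo-path impulse response
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, outgoing):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate    # remaining signal, fed back to drive the update
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Hypothetical linear echo path and a noise-like far-end signal (no near-end speech).
echo_path = [0.0, 0.4, -0.2, 0.1]
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

residual, weights = nlms_echo_canceller(far_end, echo)
print([round(w, 3) for w in weights])
```

With only far-end speech present, the coefficients approach the assumed echo path. If near-end speech were added to `outgoing`, the residual `e` would no longer reflect the echo estimation error alone, which is the disturbance of the updating process noted above.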
When both near-end and far-end speakers are talking, the condition is termed "double-talk." Accordingly, it is current practice to provide a double-talk detector to detect double-talk and to terminate the updating process of the adaptive filter to prevent the echo cancellation from being undesirably lessened. A conventional double-talk detector assumes that the echo-return loss (ERL), defined as the ratio of the power of the reflected signal to that of the incident signal, is known (approximately 1/2 or 6 dB). Such a detector declares the presence of a double-talk condition when the reflected signal power is more than 1/2 of the incident power. However, the problem with this scheme is that the ERL of a hybrid circuit 3 can vary widely, from as little as 3 dB to as much as 25 dB or more, depending on the telephone set or the number of sets used in tandem.
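The fixed-threshold decision rule described above can be sketched as follows. The function names and sample values are hypothetical, and the example assumes a hybrid whose actual loss is much larger than the value the detector assumes.

```python
def mean_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def double_talk_fixed_erl(incident, reflected, assumed_ratio=0.5):
    """Declare double-talk when reflected power exceeds assumed_ratio * incident power."""
    return mean_power(reflected) > assumed_ratio * mean_power(incident)


# Hypothetical block: far-end (incident) signal and a hybrid that returns
# only one tenth of its amplitude, i.e. far more loss than the detector assumes.
incident = [1.0, -1.0, 1.0, -1.0]
echo = [0.1 * s for s in incident]

quiet_near_end = [0.4, 0.4, -0.4, -0.4]   # near-end speaker talking softly
loud_near_end = [1.0, 1.0, -1.0, -1.0]    # near-end speaker talking loudly

reflected_quiet = [e + v for e, v in zip(echo, quiet_near_end)]
reflected_loud = [e + v for e, v in zip(echo, loud_near_end)]

print(double_talk_fixed_erl(incident, reflected_quiet))  # soft double-talk goes undetected
print(double_talk_fixed_erl(incident, reflected_loud))   # loud double-talk is detected
```

In the first case near-end speech is present, but the reflected power stays below the fixed threshold, so adaptation would continue during double-talk. This is the kind of failure that a fixed, assumed loss value invites when the actual loss of the hybrid differs from it.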
Ideally, an echo canceller that has perfectly converged should be able to remove all of the echo from the incoming or incident signal. However, conventional practical echo cancellers are typically able to remove only 36-40 dB of the echo signal. Although 36-40 dB of attenuation is substantial, it is not enough to make the echo imperceptible. As a result, a residual echo remains in the outgoing signal. This limitation of practical echo cancellers may be due to their inability to model the non-linear distortions in the echo path, or to the fact that a speech signal has periodicities, and adaptive algorithms, such as the least-mean-square (LMS) algorithm used in echo cancellers, perform poorly for signals with periodicities. As a result of the limitations of the echo estimation process, a non-linear processor is conventionally employed to further remove residual echoes. Conventional non-linear processors have a center-clipper transfer function in which digital samples of speech lower than a certain value are squelched to zero.
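A center-clipper transfer function of the kind described above can be sketched in a few lines. The function name, the threshold and the sample values are hypothetical, and the comparison is assumed to be made on sample magnitude.

```python
def center_clip(samples, threshold):
    """Squelch to zero every sample whose magnitude is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]


# Hypothetical outgoing samples: two speech peaks surrounded by low-level residual echo.
outgoing = [0.5, 0.01, -0.02, -0.6, 0.03]
print(center_clip(outgoing, threshold=0.05))  # [0.5, 0.0, 0.0, -0.6, 0.0]
```

Samples above the threshold pass unchanged while everything below it is forced to exactly zero, so the output switches abruptly between signal and silence.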
However, in telephony networks using low-rate packetized speech coding and digital speech interpolation, a center-clipper non-linear processor is not desirable because low-rate compression algorithms are sensitive to any non-linearity in the outgoing signal. The non-linear processing causes an on-off effect, which degrades the quality of speech encoded by low-bit-rate speech encoders.
A discussion of echo cancellers using adaptive filtering techniques can be found in D. Messerschmitt et al., "Digital Voice Echo Canceller with TMS32020," in Digital Signal Processing Applications with the TMS320 Family: Theory, Algorithms and Implementations, Vol. 1, 1989.
It would be advantageous to provide an echo canceller that terminates its adaptive updating process in a double-talk condition, wherein the double-talk condition is detected by accurately estimating the echo-return loss (ERL) rather than assuming a value for it. It would also be advantageous to avoid the use of a center-clipper non-linear processor to remove residual echo signals.