1. Field of the Invention
The present invention relates to signal processing, and, more specifically but not exclusively, to techniques for controlling acoustic echo in telephone communication networks.
2. Description of the Related Art
As used herein, the term “acoustic signal” refers to audible sound, while the term “audio signal” refers to electronic signals, such as the electronic signals generated by a microphone receiving an acoustic signal and the electronic signals converted by a loudspeaker into an acoustic signal. If the term “signal” is used without a qualifying adjective, it should be assumed to refer to an audio signal, not an acoustic signal.
In a telephone network, two types of echo may be introduced into the audio signals transmitted between user equipment at opposite ends of a telephone call: hybrid echo and acoustic echo. Hybrid echo, which is introduced by a network device known as a hybrid, occurs when there is an impedance mismatch in the hybrid or when another kind of hybrid imbalance, e.g., signal pickup, takes place. The hybrid imbalance allows a portion of the incoming audio signal received from the far end of the network to be reflected back to the far end. The reflected portion of the incoming audio signal is mixed with the audio signal generated by the microphone at the near end of the network, and the resulting outgoing audio signal is transmitted from the near end to the far end. Hybrid echo typically has a relatively low delay and a short echo path. Further, hybrid echo is relatively stable in terms of echo path change and echo return loss.
Acoustic echo occurs when acoustic signals, generated by the loudspeaker of the near-end user equipment based on the incoming audio signal transmitted from the far end, are picked up by the microphone of the near-end user equipment along with other acoustic signals at the near end. The resulting audio signal is transmitted to the far end. Although acoustic echo originates from acoustic signals, the term “acoustic echo” refers to the portion of the electrical audio signal corresponding to those acoustic signals, not to the acoustic signals themselves.
Thus, when both hybrid echo and acoustic echo are present at the near end, the outgoing audio signal transmitted to the far end will contain contributions from both echo sources, which tend to distort the acoustic signals generated by the loudspeaker at the far end. Analogously, hybrid and/or acoustic echo present at the far end will tend to distort the acoustic signals generated by the loudspeaker at the near end.
In a typical wireless telephone network, an outgoing audio signal is compressed with a low-bitrate codec such as an adaptive multi-rate (AMR) codec. As a result, the acoustic echo in the outgoing audio signal does not have a sample-by-sample linear correlation with the source acoustic signal. In typical cases, the acoustic echo is delayed by 150 to 200 ms in jitter buffers, the network signal encoder and decoder, and at other network points. Unlike hybrid echo, which is relatively stable, acoustic echo dynamically changes when, for example, the user equipment moves toward or away from reflective objects, or the user begins or stops speaking.
The problem of controlling hybrid echo has been adequately addressed for some time now. The problem of controlling acoustic echo, on the other hand, has not been adequately addressed, particularly in the area of wireless communications systems. Individual mobile phone manufacturers are responsible for designing phones that control acoustic echo. Nevertheless, many low-cost mobile phones are available that either do not adequately control acoustic echo or do not control acoustic echo at all. Mobile phones whose acoustic echo control is inadequate often introduce unwanted side effects such as noise clipping and background noise discontinuity. These unwanted side effects are often more detrimental to speech quality than the acoustic echo itself.
Generally, there are two approaches that may be used to control echo: automatic echo cancellation and echo suppression. Automatic echo cancellation approaches commonly use linear filtering-based algorithms, such as the normalized least mean square (NLMS) algorithm, the proportionate NLMS (PNLMS) algorithm, the recursive least squares (RLS) algorithm, the affine projection algorithm (APA), or the fast affine projection (FAP) algorithm, to estimate the echo and remove the echo by subtracting the estimate from the received signal. Some automatic echo cancellation approaches are discussed in U.S. Pat. No. 5,631,899; U.S. Pat. No. 5,146,470; D. L. Duttweiler, “Proportionate Normalized Least Mean Square Adaptation in Echo Cancellers,” I.E.E.E. Transactions on Speech and Audio Processing, Vol. 8, September 2000, pgs. 508-518; K. Ozeki and T. Umeda, “An Adaptive Filtering Algorithm Using an Orthogonal Projection to an Affine Subspace and Its Properties,” Electronics and Communications in Japan, Vol. 67-A, No. 5, 1984; S. J. Orfanidis, “Optimum Signal Processing, An Introduction,” MacMillan, New York, 1985; and S. L. Gay and S. Tavathia, “The Fast Affine Projection Algorithm,” ICASSP-95, 1995, the teachings of all of which are incorporated herein by reference in their entirety.
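By way of illustration only, the estimate-and-subtract principle described above can be sketched with a textbook NLMS update. The sketch below is not taken from any of the cited references; the function names, the tap count, the step size, and the five-tap echo path used in the demonstration are all hypothetical values chosen for the example, and the echo is assumed to be purely linear and noise-free.

```python
import random


def nlms_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    far_end: samples sent to the loudspeaker (reference signal)
    mic:     samples picked up by the microphone (contains the echo)
    Returns the residual (echo-cancelled) samples and the final filter weights.
    """
    weights = [0.0] * num_taps   # current estimate of the echo path
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                      # cancelled output sample
        power = sum(h * h for h in history) + eps      # normalization term
        gain = step * error / power
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def demo():
    """Run the canceller on a synthetic, purely linear echo (assumed path)."""
    rng = random.Random(0)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    echo_path = [0.0, 0.0, 0.6, -0.3, 0.1]   # hypothetical echo path
    mic = [
        sum(echo_path[k] * far_end[n - k]
            for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far_end))
    ]
    residual, weights = nlms_cancel(far_end, mic)
    return mic, residual, weights
```

In this idealized setting the filter weights approach the assumed echo path and the residual decays toward zero. That behavior depends on the echo being a linear, slowly varying function of the reference signal.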
In applications where acoustic echo is corrupted by non-linear codecs, the above-mentioned echo cancellation approaches are not adequate for at least three different reasons. First, implementation of these approaches is relatively complex because the echo path length is relatively large for acoustic echo (e.g., 200 ms or longer). Second, the above-mentioned linear algorithms do not adequately control non-linear echo. Third, the acoustic echo dynamically changes, and the above-mentioned linear algorithms do not converge quickly enough to account for the changes.
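The complexity concern in the first reason can be made concrete with a rough calculation. The sketch below assumes an 8 kHz sampling rate and assumes that an NLMS-type filter needs about two multiply-accumulate operations per tap per sample (one for filtering, one for the weight update). Both figures are illustrative assumptions, and the function names are hypothetical.

```python
def taps_for_echo_path(echo_path_ms, sample_rate_hz):
    """Number of filter taps needed to span an echo path of the given length."""
    return echo_path_ms * sample_rate_hz // 1000


def nlms_macs_per_second(num_taps, sample_rate_hz):
    """Approximate multiply-accumulates per second, assuming ~2 per tap per sample."""
    return 2 * num_taps * sample_rate_hz


# A 200 ms echo path at an assumed 8 kHz sampling rate.
taps = taps_for_echo_path(200, 8000)
macs = nlms_macs_per_second(taps, 8000)
```

Under these assumptions, a 200 ms echo path corresponds to 1600 taps and roughly 25.6 million multiply-accumulate operations per second for a single channel.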
Generally, echo suppression approaches attenuate the level of echo in a received signal, without estimating the echo and subtracting the estimated echo from the received signal. Typical prior-art echo suppression approaches suppress echo by manipulating the amplitude of the frequency representation of the received signal, while keeping the phase unchanged. These approaches adequately control acoustic echo. However, implementation of these approaches is typically relatively complex because they require two FFT operations (a forward FFT and an inverse FFT) for each frame of the signal. In addition, FFT-based approaches generally have a relatively significant delay due to FFT-related buffering and signal overlapping techniques that are applied to the output of the final inverse FFT operation.
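By way of illustration only, the per-frame processing described above (transform the frame, scale the amplitude of each frequency bin, keep the phase, and transform back) can be sketched as follows. The sketch is a simplified stand-in rather than any particular prior-art method. It uses a direct O(N^2) discrete Fourier transform in place of an FFT to remain self-contained, it omits framing, windowing, and overlap, and the function names and per-bin gains are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Forward transform of one frame (direct DFT standing in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; returns the real part, assuming a real-valued frame."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


def suppress_frame(frame, gains):
    """Scale each bin's magnitude by its gain while leaving the phase unchanged."""
    spectrum = dft(frame)                              # first transform
    shaped = [
        cmath.rect(abs(c) * g, cmath.phase(c))         # new magnitude, same phase
        for c, g in zip(spectrum, gains)
    ]
    return idft(shaped)                                # second transform


# Example: remove the component in bin 5 (and its mirror bin 11) of a 16-sample frame.
N = 16
keep = [math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
drop = [0.5 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
frame = [a + b for a, b in zip(keep, drop)]
gains = [0.0 if k in (5, N - 5) else 1.0 for k in range(N)]
output = suppress_frame(frame, gains)
```

The two transforms per frame are visible in `suppress_frame`. A practical system would also need to buffer a full frame before the forward transform and overlap successive output frames, which is the source of the delay noted above.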