1. Field of the Invention
The present invention relates generally to two-way communications systems and, specifically, to the suppression and cancellation of echoes in two-way communications systems.
2. Related Art
In a two-way communications system, such as a telephony system, echoes are distracting and undesirable. Two common types of echoes are hybrid echoes, which are typically caused by impedance mismatches in hybrid circuits deployed in central offices, and acoustic echoes, in which a sound produced by a speaker is picked up by a microphone.
FIG. 1 illustrates echoes in a typical telephonic system, depicted here as either landline based or mobile. During a call between telephone 102 and telephone 112, telephone 102 transmits signal 104 and receives signal 106. In addition, microphone 116 of telephone 112 picks up signal 104 produced by speaker 114 and adds it to signal 106. This is shown as acoustic echo 118. Speaker 114 and microphone 116 can alternatively be part of another type of communications device such as an external speaker and microphone attached to a personal computer providing voice over Internet protocol (VoIP) communications.
Echo cancellation is the process of removing the echo from a system by producing an estimate of the echo and subtracting it from the received signal. FIG. 2 illustrates a model for an acoustic echo that may be used to perform echo cancellation. From the perspective of telephone 112, signal 104 is a far-end signal and signal 106 is a near-end signal. Speaker 114 transmits far-end signal 104 to the user, which is echoed back to microphone 116 as an acoustic echo. Echo canceller 300 comprises echo approximator 304 and subtractor 302. Echo approximator 304 receives the far-end signal 104 and approximates the echo 118 added to signal 106. Approximate echo 306 is subtracted from near-end signal 106 to produce near-end signal 308. If echo approximator 304 accurately approximates the echo 118, then subtracting approximate echo 306 from near-end signal 106 produces a signal 308 that approximates acoustic signal 202 without the echo.
The echo cancellation of FIG. 2 is described in further detail in the signal processing diagram shown in FIG. 3. The signals illustrated are digital signals. Echo e(n) is modeled as the result of echo transfer function 306, with impulse response s(k), applied to far-end signal x(n); adder 308 adds the echo to near-end signal d(n), resulting in a composite signal y(n). Generally, the index n is a discrete time index and the index k is the sample index of an impulse response. Adaptive filter 304, having impulse response h(k), approximates the echo of far-end signal x(n), shown as approximate echo ê(n). Subtractor 302 subtracts approximate echo ê(n) from signal y(n) to produce signal z(n), which is an approximation of near-end signal d(n).
Mathematically, the total received signal at the microphone is

$$y(n) = e(n) + d(n), \qquad (1)$$

where the echo is modeled by

$$e(n) = \sum_{k=0}^{L-1} s^{*}(k)\, x(n-k), \qquad (2)$$

where s(k) is a finite impulse response of order L−1. The output of the echo canceller, which is the signal transmitted by the telephone to the far-end, is given by

$$z(n) = y(n) - \hat{e}(n), \qquad (3)$$

where ê(n) is the estimated echo and is approximated using an adaptive linear filter of order L−1 given by

$$\hat{e}(n) = \sum_{k=0}^{L-1} h^{*}(n,k)\, x(n-k), \qquad (4)$$

where h(n,k) is the impulse response of the adaptive filter at time sample n. It should be noted that since the filter is adaptive it changes over time, so the impulse response is also a function of time as shown.
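Equations (1) through (4) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not an implementation taken from this disclosure: the signals are assumed real-valued (so the conjugates in equations (2) and (4) have no effect), samples before time zero are assumed to be zero, and the function names `fir_output` and `canceller_output` as well as all numeric values are hypothetical.

```python
def fir_output(coeffs, x, n):
    """Return sum_{k=0}^{L-1} coeffs[k] * x[n-k], as in equations (2) and (4).

    Assumes real-valued signals and x[m] = 0 for m < 0.
    """
    total = 0.0
    for k, c in enumerate(coeffs):
        if n - k >= 0:
            total += c * x[n - k]
    return total


def canceller_output(h, x, y):
    """Return z(n) = y(n) - e_hat(n) for a fixed filter h, as in equation (3)."""
    return [y[n] - fir_output(h, x, n) for n in range(len(y))]


# Hypothetical example values.
s = [0.5, -0.25, 0.125]            # echo path impulse response s(k), L = 3
x = [1.0, 0.0, -1.0, 2.0, 0.5]     # far-end signal x(n)
d = [0.1, 0.2, 0.0, -0.1, 0.3]     # near-end signal d(n)

e = [fir_output(s, x, n) for n in range(len(x))]   # echo, equation (2)
y = [e[n] + d[n] for n in range(len(x))]           # microphone signal, equation (1)

# With a perfect echo estimate (h equal to s), z(n) recovers d(n).
z = canceller_output(s, x, y)
```

In this sketch the filter is held fixed at the true echo path, so the output equals the near-end signal up to rounding. In practice the echo path is unknown and the filter must be adapted.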
The output of the echo canceller can also be expressed as

$$z(n) = d(n) + \varepsilon(n), \qquad (5)$$

where ε(n)=e(n)−ê(n) is the residual echo. Because the desired output signal z(n) is the near-end signal d(n), the objective of the echo canceller is to reduce the residual echo as much as possible.
One approach to adaptation of the adaptive filter is to minimize the mean square error of the residual echo ε(n). An adaptation approach known as least mean squares (LMS) yields the following adaptation equation

$$h(n+1,k) = h(n,k) + \mu(n)\, z^{*}(n)\, x(n-k), \qquad (6)$$

where μ(n) is a non-negative number called the adaptation coefficient and 0 ≤ k < L. While LMS typically achieves a minimum, the rate of adaptation defined by the adaptation coefficient is left unspecified. Appropriate adaptation rate control can yield a fast convergence of the echo approximator to the echo.
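The LMS update of equation (6) can be sketched as follows. This is an illustrative sketch only, assuming real-valued signals, a constant adaptation coefficient, zero samples before time zero, and no near-end signal; the function name `lms_echo_canceller`, the echo path `s`, and the value 0.05 are hypothetical choices made for the example.

```python
import random


def lms_echo_canceller(x, y, L, mu):
    """Adapt an L-tap filter with the LMS update of equation (6).

    Assumes real-valued signals, a constant adaptation coefficient mu,
    and x[m] = 0 for m < 0. Returns the output z(n) and the final taps h.
    """
    h = [0.0] * L
    z = []
    for n in range(len(x)):
        # Most recent L far-end samples: window[k] = x(n - k).
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(L)]
        e_hat = sum(h[k] * window[k] for k in range(L))   # equation (4)
        z_n = y[n] - e_hat                                # equation (3)
        for k in range(L):
            h[k] += mu * z_n * window[k]                  # equation (6)
        z.append(z_n)
    return z, h


# Hypothetical test signal: white far-end input, no near-end speech.
rng = random.Random(0)
s = [0.5, -0.25, 0.125, 0.0625]
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
y = [sum(s[k] * x[n - k] for k in range(len(s)) if n - k >= 0)
     for n in range(len(x))]

z, h = lms_echo_canceller(x, y, L=len(s), mu=0.05)
```

With this noise-free input the taps `h` approach the echo path `s`. A fixed coefficient that works for one input level may be too large or too small for another, which is the motivation for normalizing the step size.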
If the adaptation coefficient varies over time, the adaptive filter algorithm is referred to as a variable step size least mean squares (LMS) adaptive filtering algorithm. Prominent among these is the normalized LMS (NLMS) algorithm, which uses the adaptation coefficient:
$$\mu(n) = \frac{\mu}{L P_{xx}^{L}(n)}, \qquad (7)$$

where $L P_{xx}^{L}(n)$ is the short-term energy of far-end signal x(n) over a window of L samples, where L is the length of the adaptive filter. For convenience, the short-term energy is expressed in terms of the average energy over the window. The arithmetic average energy is equal to

$$P_{xx}^{L}(n) = \frac{1}{L} \sum_{l=0}^{L-1} \left| x(n-l) \right|^{2} \qquad (8)$$

and where μ is a constant between 0 and 2. The NLMS adaptive filtering algorithm is insensitive to the scaling of x(n), which makes it easier to control its adaptation rate by an appropriate choice of the adaptation coefficient. However, the NLMS adaptive filtering algorithm performs poorly when there is background noise and double talk in the received signal.
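The normalization of equations (7) and (8), and the resulting insensitivity to the scaling of x(n), can be sketched as follows. The sketch assumes real-valued, noise-free signals with zero samples before time zero. Returning a zero step size when the window contains no energy is a safeguard added for the example, not part of the equations, and the function names and numeric values are hypothetical.

```python
import random


def nlms_step_size(x, n, L, mu):
    """Return mu(n) = mu / (L * P_xx^L(n)), equations (7) and (8).

    Assumes real-valued x with x[m] = 0 for m < 0. Returning 0.0 when the
    window is silent is an added safeguard, not part of the equations.
    """
    energy = sum(x[n - l] ** 2 for l in range(L) if n - l >= 0)  # L * P_xx^L(n)
    return mu / energy if energy > 0.0 else 0.0


def nlms_echo_canceller(x, y, L, mu):
    """LMS update of equation (6) with the step size of equation (7)."""
    h = [0.0] * L
    for n in range(len(x)):
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(L)]
        z_n = y[n] - sum(h[k] * window[k] for k in range(L))
        step = nlms_step_size(x, n, L, mu)
        for k in range(L):
            h[k] += step * z_n * window[k]
    return h


def echo(s, x):
    """Noise-free echo of x through the path s, equation (2)."""
    return [sum(s[k] * x[n - k] for k in range(len(s)) if n - k >= 0)
            for n in range(len(x))]


rng = random.Random(1)
s = [0.5, -0.25, 0.125, 0.0625]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
x_loud = [100.0 * v for v in x]      # same signal, 40 dB louder

h_quiet = nlms_echo_canceller(x, echo(s, x), L=len(s), mu=0.5)
h_loud = nlms_echo_canceller(x_loud, echo(s, x_loud), L=len(s), mu=0.5)
```

Both runs use the same constant μ = 0.5 and arrive at essentially the same taps, because scaling x(n) scales the update term and the normalizing energy by the same factor.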
Hansler, et al. (“Signal Channel Acoustic Echo Cancellation”, Chapter 3, Adaptive Signal Processing, Springer, 2003) approximate an optimal adaptation coefficient by

$$\mu(n) = \frac{E\{z(n)\,\varepsilon(n)\}}{L P_{xx}^{L}(n)\, E\{|z(n)|^{2}\}}, \qquad (9)$$

where E{x} is the expected value of x. Hansler discloses an “NLMS” adaptation coefficient without the $P_{xx}^{L}(n)$ term. The $P_{xx}^{L}(n)$ term is added for consistency in this disclosure. Expected values require knowledge of the statistics of the signal and are not suited for a changing environment. For example, if the echo path changes, the expected values can change. In order to use equation (9), the adaptive filter would need to be aware of the changing statistics.
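To see what equation (9) implies, the expectations can be expanded under a simplifying assumption that is not stated in the text above: the near-end signal d(n) and the residual echo ε(n) are real-valued, zero-mean and uncorrelated. Substituting z(n) = d(n) + ε(n) from equation (5) then gives the following sketch:

```latex
\begin{align*}
E\{z(n)\,\varepsilon(n)\} &= E\{d(n)\,\varepsilon(n)\} + E\{\varepsilon^{2}(n)\} = E\{\varepsilon^{2}(n)\}, \\
E\{|z(n)|^{2}\} &= E\{d^{2}(n)\} + E\{\varepsilon^{2}(n)\}, \\
\mu(n) &= \frac{1}{L P_{xx}^{L}(n)} \cdot \frac{E\{\varepsilon^{2}(n)\}}{E\{d^{2}(n)\} + E\{\varepsilon^{2}(n)\}}.
\end{align*}
```

Under this assumption the coefficient reduces to the NLMS coefficient of equation (7) with μ = 1 when no near-end signal is present, and it shrinks toward zero as the near-end energy (speech or background noise) grows relative to the residual echo.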
Several conditions can make adaptation more difficult, including double talk, echo path change and background noise. Double talk is a condition in which both parties are speaking, so there is substantial energy in both the far-end and near-end signals. A change in echo path can occur when the phone is moved into another environment. This amounts to a change in echo transfer function 204. To address background noise, typical echo cancellers estimate the background noise and adjust the adaptation rate depending on the amount of noise present (e.g., faster adaptation when the noise is low and slower adaptation when the noise is high relative to the signal level). For double talk, typical echo cancellers detect the double talk periods and freeze adaptation during those periods.
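Freezing adaptation during double talk can be sketched as follows. The text above does not name a particular detector, so this example substitutes the classic Geigel test, which was designed for network echo and assumes that the echo path attenuates the far-end signal by at least the chosen threshold. The function names, the threshold of 0.5, the window length and the sample values are all assumptions made for illustration.

```python
def geigel_double_talk(x, y, n, window, threshold=0.5):
    """Return True if the Geigel test flags double talk at time n.

    Double talk is declared when |y(n)| exceeds threshold times the largest
    recent far-end magnitude max(|x(n)|, ..., |x(n-window+1)|). The default
    threshold of 0.5 is a conventional choice, assumed here for illustration.
    """
    start = max(0, n - window + 1)
    far_end_peak = max(abs(v) for v in x[start:n + 1])
    return abs(y[n]) > threshold * far_end_peak


def gated_step_size(step, x, y, n, window):
    """Freeze adaptation (step size 0) while double talk is detected."""
    return 0.0 if geigel_double_talk(x, y, n, window) else step


# Hypothetical samples: far-end speech throughout, near-end speech from n = 3.
x = [1.0, -0.8, 0.9, -1.0, 0.7, 0.6]
y = [0.2, -0.1, 0.2, 0.9, -1.1, 0.8]     # echo only, then echo plus near-end
steps = [gated_step_size(0.1, x, y, n, window=4) for n in range(len(x))]
```

In this example the step size stays at 0.1 for the first three samples and drops to zero once the microphone signal becomes large relative to the recent far-end peak.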
Linear adaptive echo cancellation has been applied successfully to address echoes in the electronic environment. Acoustic echoes have additional, often significant, artifacts introduced by the background, such as noise, and dynamic echo paths causing the residual error. Non-linear processing (NLP) can be used in addition to or in place of the linear adaptive filtering. In a traditional system, NLP removes or suppresses the residual echo during single talk periods and it may insert comfort noise during that period. Generally, NLP does not do anything during double talk periods. Because the residual echo can still be significant during single and double talk periods, NLP is needed to add echo suppression to the linear echo cancellation.
An NLP system can remove the residual echo while maintaining the near-end signal quality for the listener. One technique used in an NLP system is a central clipping approach to remove the low volume signals, including the residual echo, at or below the central clipping threshold. A disadvantage of central clipping is that the near-end signal at or below the threshold is also removed, and that residual echo higher than the threshold of the central clipping may still be present. Another approach is comfort noise insertion, which removes or attenuates the linear echo canceller output and optionally inserts comfort noise when the far-end signals are higher than the output by a predetermined threshold and/or when the output is below the near-end signals by a different predetermined threshold. The performance under this approach is good when the residual echo is small and the linear echo canceller has converged well. However, in most acoustic cases, the residual echo is not small, even with a good linear echo canceller having good double talk detectors. Another known approach, switched loss, reduces the far-end signal volume when both the near-end and far-end signals are high. By doing so, the echo is effectively reduced, as is the possibility of howling. The primary failing of traditional NLP is that it fails to suppress residual echo while maintaining full-duplex communications.
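The central clipping approach and its two disadvantages can be illustrated with a minimal sketch. It assumes real-valued samples that are zeroed when their magnitude is at or below the threshold and passed unchanged otherwise. The function name `center_clip`, the threshold of 0.05 and the sample values are hypothetical.

```python
def center_clip(z, threshold):
    """Zero every sample whose magnitude is at or below threshold.

    Samples above the threshold pass through unchanged.
    """
    return [0.0 if abs(v) <= threshold else v for v in z]


# Hypothetical canceller output: small residual echo plus louder near-end speech.
z = [0.01, -0.02, 0.30, -0.04, -0.50, 0.08]
clipped = center_clip(z, threshold=0.05)
# Samples at or below 0.05 are zeroed; 0.08 survives even if it is residual echo.
```

Quiet near-end content at or below the threshold is removed along with the residual echo, while any residual echo above the threshold passes through, matching the disadvantages noted above.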
There is a need in the industry for an improved echo cancellation and/or suppression system which performs well in the presence of background noise, during periods of double talk and during changes in echo path, while maintaining full-duplex communications.