1. Field of the Invention
The present invention relates generally to echo canceller systems in communication networks. More particularly, the present invention relates to methods and systems for masking the residual echo in echo canceller systems.
2. Background Art
Subscribers use speech quality as the benchmark for assessing the overall quality of a telephone network. A key technology for providing high-quality speech is echo cancellation. Echo canceller performance in a telephone network, whether a TDM or a packet telephony network, has a substantial impact on the overall voice quality. Effective removal of the hybrid and acoustic echo inherent in telephone networks is key to maintaining and improving perceived voice quality during a call.
Echoes occur in telephone networks due to impedance mismatches of network elements and acoustical coupling within telephone handsets. Hybrid echo is the primary source of echo generated from the public-switched telephone network (PSTN). As shown in FIG. 1, hybrid echo 110 is created by a hybrid, which converts a four-wire physical interface into a two-wire physical interface. The hybrid reflects electrical energy back to the speaker from the four-wire physical interface. Acoustic echo, on the other hand, is generated by analog and digital telephones, with the degree of echo related to the type and quality of such telephones. As shown in FIG. 1, acoustic echo 120 is created by voice coupling between the earpiece and the microphone in the telephones, where sound from the speaker is picked up by the microphone, for example, after bouncing off walls, windows, and the like. The result of this reflection is multi-path echo, which would be heard by the speaker unless eliminated.
As shown in FIG. 1, in modern telephone networks, echo canceller 140 is typically positioned between hybrid 130 and network 150. Generally speaking, the echo cancellation process involves two steps. First, as the call is set up, echo canceller 140 employs a digital adaptive filter to adapt to the far-end signal and create a model based on the far-end signal before it passes through hybrid 130. After the local-end signal, including the near-end signal and/or the echo signal, passes through hybrid 130, echo canceller 140 subtracts the far-end model from the local-end signal to cancel the hybrid echo and generate an error signal. Although this echo cancellation process removes a substantial amount of the echo, non-linear components of the echo, which are also known as residual echo, may still remain.
To cancel the non-linear components of the echo, the second step of the echo cancellation process utilizes a non-linear processor (NLP) to eliminate the remaining or residual echo. As known in the art, in stationary conditions, the residual echo is similar to a white noise signal after convergence of the adaptive filter. When the near-end talker is not active, the residual echo is eliminated by applying the NLP, such that the NLP removes the original filtered signal and replaces it with a synthetic signal that mimics the spectral characteristics of the background noise by using a Comfort Noise Generator (CNG). On the other hand, when the near-end talker is active, conventional echo cancellers generally assume that the attenuation introduced by the adaptive filter is strong enough to bring the residual echo below the auditory capability of the far-end listener. However, this assumption is not always true, and when it is false it can cause undesirable results.
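The NLP behavior described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the function `nlp_with_comfort_noise`, the `near_end_active` flag (assumed to come from some external talker detector), and the `noise_rms` background-noise estimate. The sketch also simplifies the comfort noise to white Gaussian noise that matches only the level of the background noise, not its spectral shape.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def nlp_with_comfort_noise(residual, noise_rms, near_end_active, rng=None):
    """Hypothetical NLP stage for one block of echo-cancelled samples.

    residual        -- adaptive-filter output for this block
    noise_rms       -- assumed estimate of the background-noise level
    near_end_active -- assumed flag supplied by an external talker detector
    """
    if near_end_active:
        # Conventional behaviour described in the text: pass the block
        # through and rely on the adaptive filter's attenuation alone.
        return list(residual)
    if rng is None:
        rng = random.Random()
    # Near-end talker silent: discard the block and substitute white
    # Gaussian noise whose standard deviation equals the noise level.
    return [rng.gauss(0.0, noise_rms) for _ in residual]
```

In this sketch any residual echo present while `near_end_active` is true passes through unchanged, which is the case the conventional approach leaves unhandled.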
In conventional echo cancellers, if we define the estimated echo as:
$$e(n) = \sum_{k=0}^{L-1} \mathrm{Rxin}(n-k) \cdot \alpha(k)$$
where α(k) is the k-th EC filter coefficient and L is the length of the filter, then the EC output signal will be:
$$\mathrm{Txout}(n) = \mathrm{Txin}(n) - e(n)$$
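As a worked illustration of these two definitions, the following Python sketch computes the estimated echo and the EC output for a toy signal. The function names, the toy data, and the convention that Rxin is zero before the first sample are assumptions made for the example, not part of the described system.

```python
def estimate_echo(rxin, alpha, n):
    """e(n) = sum over k = 0 .. L-1 of Rxin(n - k) * alpha(k)."""
    total = 0.0
    for k, coeff in enumerate(alpha):
        if n - k >= 0:  # assumption: Rxin is zero before the first sample
            total += rxin[n - k] * coeff
    return total


def cancel_echo(rxin, txin, alpha):
    """Txout(n) = Txin(n) - e(n) for every sample n."""
    return [txin[n] - estimate_echo(rxin, alpha, n) for n in range(len(txin))]


# Toy data: the echo path delays the far-end signal by one sample and halves it.
rxin = [1.0, 0.0, -1.0, 2.0]
txin = [0.0, 0.5, 0.0, -0.5]  # pure echo, no near-end speech
alpha = [0.0, 0.5]            # coefficients that match the toy echo path
txout = cancel_echo(rxin, txin, alpha)
```

With coefficients that match the toy echo path exactly, every sample of `txout` is zero, so the echo is fully removed.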
With these definitions, the energy of the output signal over N samples will be:
$$E = \sum_{n=0}^{N-1} \left(\mathrm{Txout}(n)\right)^2 = \sum_{n=0}^{N-1} \left(\mathrm{Txin}(n) - e(n)\right)^2 = \sum_{n=0}^{N-1} \left(\mathrm{Txin}(n) - \sum_{k=0}^{L-1} \mathrm{Rxin}(n-k) \cdot \alpha(k)\right)^2$$
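The quantity E can be evaluated directly from its definition. The Python sketch below does so for a toy signal; the function name `output_energy`, the data, and the zero-history convention for Rxin are illustrative assumptions.

```python
def output_energy(rxin, txin, alpha):
    """E = sum over n of (Txin(n) - sum over k of Rxin(n - k) * alpha(k))^2."""
    energy = 0.0
    for n in range(len(txin)):
        # Estimated echo e(n); Rxin is assumed zero before the first sample.
        e_n = sum(rxin[n - k] * alpha[k] for k in range(len(alpha)) if n - k >= 0)
        energy += (txin[n] - e_n) ** 2
    return energy


# Toy data: Txin is Rxin delayed by one sample and halved (echo only).
rxin = [1.0, 0.0, -1.0, 2.0]
txin = [0.0, 0.5, 0.0, -0.5]
energy_matched = output_energy(rxin, txin, [0.0, 0.5])    # filter matches the echo path
energy_no_filter = output_energy(rxin, txin, [0.0, 0.0])  # all-zero coefficients
```

Here `energy_matched` is 0.0 and `energy_no_filter` is 0.5, which shows why driving E toward its minimum corresponds to removing the echo when no near-end speech is present.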
The α(k) coefficients are defined by finding the set of coefficients that minimizes the energy E, using the gradient descent algorithm on the equation:
$$\frac{\partial E}{\partial \alpha(k)} = -2 \sum_{n=0}^{N-1} \mathrm{Rxin}(n-k) \cdot \left(\mathrm{Txin}(n) - e(n)\right)$$
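A minimal Python sketch of this minimization follows, written as a plain batch gradient descent over a fixed block of samples. The step size `mu`, the iteration count, the all-zero starting coefficients, and the function names are assumptions chosen for the toy data; any constant factor in the gradient is folded into `mu`. Practical echo cancellers typically update the coefficients sample by sample, which this sketch does not attempt to show.

```python
def gradient(rxin, txin, alpha):
    """Gradient direction of E with respect to each alpha(k).

    Constant factors are folded into the step size used by adapt().
    """
    n_samples = len(txin)
    taps = len(alpha)
    errors = []
    for n in range(n_samples):
        # e(n) with Rxin assumed zero before the first sample.
        e_n = sum(rxin[n - k] * alpha[k] for k in range(taps) if n - k >= 0)
        errors.append(txin[n] - e_n)
    return [
        -sum(rxin[n - k] * errors[n] for n in range(n_samples) if n - k >= 0)
        for k in range(taps)
    ]


def adapt(rxin, txin, taps, mu=0.05, iterations=500):
    """Step repeatedly against the gradient, starting from zero coefficients."""
    alpha = [0.0] * taps
    for _ in range(iterations):
        grad = gradient(rxin, txin, alpha)
        alpha = [a - mu * g for a, g in zip(alpha, grad)]
    return alpha


# Toy data: Txin is Rxin delayed by one sample and halved (echo only).
rxin = [1.0, 0.0, -1.0, 2.0, 0.5, -1.5]
txin = [0.0, 0.5, 0.0, -0.5, 1.0, 0.25]
alpha = adapt(rxin, txin, taps=2)
```

On this toy data the coefficients converge toward 0 and 0.5, the delay-and-halve echo path used to build `txin`. Because the minimization acts on all of Txin, near-end speech, when present, would also contribute to the error term.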
Therefore, as shown, conventional echo cancellers fail to take near-end talker activity into account when processing the residual echo. Instead, they assume that, when the near-end talker is active, the attenuation introduced by the adaptive filter is strong enough to bring the residual echo below the auditory capability of the far-end listener, and they rely on that attenuation alone.
Accordingly, there is a need in the art for echo canceller systems that can overcome this shortcoming of conventional echo cancellers and process the residual echo properly even when the near-end talker is active.