Public telephone networks often use a two-wire subscriber line or loop to connect a subscriber's telephone to the core network. The subscriber loop generally carries analog signals and is coupled to the subscriber line interface of the core network via a device referred to as a hybrid circuit. The subscriber line interface is often a four-wire interface, and as such, the hybrid circuit provides a two-wire-to-four-wire interface between the two-wire subscriber loop and the four-wire subscriber line interface. Most hybrid circuits are provided at or in association with a switching office, private branch exchange (PBX), or the like.
The hybrid circuit is a major source of echoes in public telephone networks. The echo results from an electrical mismatch between the subscriber loop and the hybrid circuit at the subscriber line interface. When a far-end user is talking, the far-end user's speech signals are delivered across the public telephone network, through the hybrid circuit serving the near-end user, and to the near-end user via the subscriber loop. Given the electrical mismatch at the hybrid circuit, the far-end user's speech signals bleed into the signals being provided from the near-end user back to the far-end user. The far-end user's speech signals that bleed into the near-end user's signals cause the echo. If such echo is not cancelled, the far-end user will hear an echo of their own voice when talking to the near-end user.
To deal with the undesired echo, echo cancellers are employed to effectively cancel the far-end user's echo from the near-end user's signals that are being delivered to the far-end user. A primary component in an echo canceller is an adaptive filter, which primarily functions to estimate the echo, replicate the echo, and subtract the estimated echo from the near-end user's signals. Unfortunately, it is difficult to remove all of the far-end user's echo from the near-end user's signals. The remaining echo is referred to as residual echo, and is considered to be caused by the non-linear nature of the overall communication system. Accordingly, a non-linear processor (NLP) is employed to remove the residual echo when possible.
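The estimate-and-subtract role of the adaptive filter can be illustrated with a minimal sketch. The text above does not name an adaptation algorithm; normalized least mean squares (NLMS) is assumed here purely for illustration, and the function name, tap count, step size, and the simulated echo path are all hypothetical.

```python
import random

def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from near_end.

    NLMS is an assumed choice of adaptation rule, not one named in the text.
    Returns the signal that would be sent back toward the far end.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # near-end signal minus estimated echo
        norm = sum(h * h for h in history) + eps   # input power used for normalization
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

# Hypothetical, purely linear echo path standing in for the hybrid mismatch.
random.seed(0)
echo_path = [0.5, -0.3, 0.1]
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    sum(echo_path[k] * far[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far))
]
out = nlms_echo_cancel(far, echo)  # near end is silent, so its signal is echo only
```

Because the simulated echo path is linear and noise-free, the filter removes it almost entirely once it has converged. In a real system the non-linear components described above would remain as residual echo.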
When the near-end user is not talking, the NLP is activated to remove or significantly attenuate the near-end user's signals, which include the residual echo of the far-end user. As such, the far-end user will not hear their echo as they talk. However, removing the near-end user's signals also removes the background noise of the near-end user, which is what those signals otherwise represent when the near-end user is not talking. The resulting quietness perceived by the far-end user is disconcerting, and the far-end user often mistakes it for the connection being dropped or lost.
To avoid distracting changes in the background noise perceived by the far-end user, the NLP may employ a comfort noise generator to provide comfort noise when the NLP is activated to remove the residual echo of the far-end user. The comfort noise is preferably generated to have the same spectral quality and power level as the actual background noise in the near-end user's signal. As such, the comfort noise provided to the far-end user when the near-end user's signals are removed by the NLP sounds the same to the far-end user as the near-end user's background noise that is provided to the far-end user when the NLP is not activated.
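As a rough illustration of the NLP and comfort noise behavior described above, the following sketch replaces a frame with noise scaled to a measured background level whenever the near-end user is not talking. It matches power level only; matching the spectral quality of the background noise is not attempted. All names, the 160-sample frame length, and the assumption that a near-end talk decision is already available are hypothetical.

```python
import math
import random

def rms(frame):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def comfort_noise(level, length, rng):
    """White Gaussian noise scaled so that its RMS equals `level`.

    White noise is an assumption; no spectral shaping is applied.
    """
    raw = [rng.gauss(0.0, 1.0) for _ in range(length)]
    scale = level / rms(raw)
    return [s * scale for s in raw]

def nlp_frame(frame, near_end_talking, noise_level, rng):
    """Pass the frame while the near end talks; otherwise substitute comfort noise."""
    if near_end_talking:
        return list(frame)
    return comfort_noise(noise_level, len(frame), rng)

rng = random.Random(1)
# Hypothetical near-end background noise frame used to measure the level.
background = [rng.gauss(0.0, 0.01) for _ in range(160)]
noise_level = rms(background)
suppressed = nlp_frame(background, False, noise_level, rng)
passed = nlp_frame(background, True, noise_level, rng)
```

The substituted frame carries none of the original samples, so any residual echo in them is removed, while the far-end user continues to hear noise at the measured level.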
Unfortunately, estimating and generating the comfort noise is computationally intensive and adds complexity to the echo cancellation processing. Further, the output of the echo canceller is often delivered to a downstream speech encoder that is used to efficiently encode the near-end user's signals. In general, the speech encoder attempts to identify active speech and inactive speech in the near-end user's signals. For instance, in CDMA systems, the active speech portions of the near-end user's signals are encoded at a higher rate, while the inactive speech portions, which represent periods when the near-end user is not talking, are encoded at much lower rates. In GSM/UMTS systems, a Discontinuous Transmission (DTX) mechanism is used so that the inactive speech portions are not transmitted.
The systematic transitions between the actual near-end user signals and the comfort noise provided by the comfort noise generator during periods of silence are often difficult for the speech encoder to handle efficiently. The comfort noise may be mistaken for the onset of an active speech spurt and encoded at a higher rate than needed. Further, excessive switching between low-rate encoding and high-rate encoding often occurs, resulting in unpleasant clicking sounds being heard by the far-end user. Accordingly, there is a need for an echo cancellation technique that provides effective echo cancellation and supports efficient encoding by a downstream speech encoder.