During some telephone conversations, a talker can hear a delayed copy of the talker's own voice emanating from the telephone receiver. This phenomenon is known as talker echo. Talker echo is caused by signal reflections in the telephone network and by acoustic coupling. Echo becomes increasingly annoying to the talker as it increases in either volume or delay relative to the talker's speech.
When an electrical wave travels down a wire, the electrical energy can be reflected back if there is a change in impedance at any point in the transmission path. In the analog portion of a telephone network, this impedance mismatch occurs most significantly at the hybrid that performs the 4-wire to 2-wire conversion. If the impedances are well matched, very little signal is reflected. However, when there is a large impedance mismatch, most of the signal can be reflected. This reflection at the hybrid is referred to as “Hybrid Echo”.
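The dependence of the reflection on the impedance mismatch can be illustrated with the standard transmission-line reflection coefficient, (Z_load - Z_line) / (Z_load + Z_line), whose squared magnitude gives the fraction of power reflected. The following is a minimal sketch only; the function names and the 600-ohm and 1800-ohm values are illustrative assumptions and do not describe any particular hybrid.

```python
def reflection_coefficient(z_load: complex, z_line: complex) -> complex:
    """Voltage reflection coefficient at a junction between a line and a load."""
    return (z_load - z_line) / (z_load + z_line)


def reflected_power_fraction(z_load: complex, z_line: complex) -> float:
    """Fraction of the incident power that is reflected back at the junction."""
    return abs(reflection_coefficient(z_load, z_line)) ** 2


# Illustrative values (assumptions): a 600-ohm line into a matched
# and into a badly mismatched termination.
matched = reflected_power_fraction(600, 600)
mismatched = reflected_power_fraction(1800, 600)
```

With matched impedances no power is reflected, while the mismatched case in this sketch reflects a quarter of the incident power.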
Echo cancellers are used in the telephone network to remove (cancel) these reflections (echo) to as great a degree as possible. Much of this cancellation requires the echo canceller to compare an outgoing signal, which may contain a talker's speech signals and/or noise signals, to an incoming signal. The incoming signal may at times contain noise signals, an echo of the “outgoing” talker's speech signal and/or a speech signal from a second “incoming” talker. The echo canceller is designed to eliminate or reduce the echo by synthesizing a replica of the echo which is subtracted from the actual circuit echo.
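The replica-and-subtract principle can be sketched with an adaptive filter. The example below uses a normalized least-mean-squares (NLMS) update, which is a common choice but is an assumption here, not a method stated above. The echo path, tap count, step size and all function names are hypothetical, and the incoming signal contains echo only (no near-end speech or noise).

```python
import random


def simulate_echo(outgoing, echo_path):
    """Convolve the outgoing signal with a hypothetical echo-path impulse response."""
    return [
        sum(h * outgoing[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(outgoing))
    ]


def nlms_echo_canceller(outgoing, incoming, num_taps=8, step=0.5, eps=1e-8):
    """Subtract a synthesized echo replica from the incoming signal (NLMS sketch)."""
    weights = [0.0] * num_taps   # current estimate of the echo path
    history = [0.0] * num_taps   # recent outgoing samples, newest first
    residual = []
    for x, d in zip(outgoing, incoming):
        history = [x] + history[:-1]
        replica = sum(w * h for w, h in zip(weights, history))
        error = d - replica      # what remains after subtracting the replica
        energy = sum(h * h for h in history) + eps
        weights = [w + step * error * h / energy
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


rng = random.Random(0)
outgoing = [rng.gauss(0.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.5, -0.2]                    # illustrative values only
incoming = simulate_echo(outgoing, echo_path)   # echo only
residual, weights = nlms_echo_canceller(outgoing, incoming)

echo_power = mean_power(incoming[-200:])
residual_power = mean_power(residual[-200:])
```

In this idealized, noise-free setting the residual power at the end of the run is a very small fraction of the echo power, and the leading filter weights approach the assumed echo path. Real circuits with noise and near-end speech will not converge as cleanly.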
In traditional circuit-switched networks, the echo cancellers can be placed in the 4-wire portion of a circuit. In a modern Voice over IP (VoIP) network, the echo canceller is also an important apparatus that can be either placed independently between a VoIP gateway and hybrids, or, more commonly, integrated with the VoIP gateway internally.
For an echo canceller to function properly, it needs to be able to determine if the incoming signal contains speech from a local source (near-end speech) or the echo of an outgoing signal. This determination is often referred to as “double-talk” detection. A common method for double-talk detection is to compare the outgoing signal to the incoming signal. The louder the echo is relative to the outgoing signal, the more difficult this determination becomes.
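One well-known comparison of this kind is the Geigel detector, which flags near-end speech when the incoming sample exceeds a fixed fraction of the recent peak of the outgoing signal. It is shown here as an illustrative sketch, not as the method of any particular echo canceller. The window length, threshold, signal values and function name are assumptions.

```python
def geigel_double_talk(outgoing, incoming, window=4, threshold=0.5):
    """Flag samples where the incoming level exceeds threshold * recent outgoing peak."""
    flags = []
    for n, y in enumerate(incoming):
        start = max(0, n - window + 1)
        recent_peak = max(abs(x) for x in outgoing[start:n + 1])
        flags.append(abs(y) > threshold * recent_peak)
    return flags


outgoing = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

# Quiet echo (gain 0.25, one-sample delay) plus near-end speech at samples 3 and 4.
quiet_echo = [0.0] + [0.25 * x for x in outgoing[:-1]]
near_end = [0.0, 0.0, 0.0, 0.8, 0.8, 0.0]
incoming = [e + s for e, s in zip(quiet_echo, near_end)]
flags = geigel_double_talk(outgoing, incoming)

# Loud echo (gain 0.6) with no near-end speech at all.
loud_echo = [0.0] + [0.6 * x for x in outgoing[:-1]]
false_alarms = geigel_double_talk(outgoing, loud_echo)
```

With the quiet echo, only the samples that carry near-end speech are flagged. With the loud echo, the echo alone crosses the threshold and is mistaken for near-end speech, which illustrates why a louder echo makes the determination harder.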
The ratio of the power of the outgoing signal to the power of its echo, usually expressed in decibels, is referred to as the echo return loss (ERL). The ERL is thus the amount of power that the transmitted signal loses by the time it is echoed back. ERL is said to be “lower” when the returning echo is close to the same volume as the outgoing signal, and “higher” when the echo returns at a more reduced volume. Without echo cancellation in the telephone network, telephone calls with low ERLs have more audible echo than calls with high ERLs.
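ERL is conventionally quoted in decibels as outgoing power over echo power, so a larger number means a quieter echo. A minimal sketch of that calculation follows; the function names and sample values are illustrative assumptions.

```python
import math


def mean_power(samples):
    """Average power of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def echo_return_loss_db(outgoing, echo):
    """ERL in dB: how far the echo power is below the outgoing power."""
    return 10.0 * math.log10(mean_power(outgoing) / mean_power(echo))


outgoing = [1.0, -1.0, 1.0, -1.0]
loud_echo = [0.5 * x for x in outgoing]    # half the amplitude
quiet_echo = [0.1 * x for x in outgoing]   # one tenth of the amplitude

low_erl = echo_return_loss_db(outgoing, loud_echo)     # about 6 dB
high_erl = echo_return_loss_db(outgoing, quiet_echo)   # about 20 dB
```

The echo at half amplitude gives a low ERL of about 6 dB, while the echo at one tenth of the amplitude gives a higher ERL of about 20 dB.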
There are two broad classes of speech quality metrics: subjective and objective. Subjective measurements involve humans listening to a live or recorded conversation and assigning a rating to it. One of the most widely used and recognized subjective measures is the mean opinion score (MOS). ITU-T Recommendation P.831 describes in detail how to conduct a subjective evaluation of network echo cancellers.
ITU-T Recommendations P.861 and P.862 describe two objective speech quality measurement methods known, respectively, as PSQM (Perceptual Speech Quality Measure) and PESQ (Perceptual Evaluation of Speech Quality). They can measure and score the effects of one-way speech distortion and noise on speech quality. However, they cannot reflect other impairments related to two-way interaction, such as echo.
Section 6.2 of ITU-T Recommendation G.168 describes a series of test methods for evaluating echo canceller performance. These tests can evaluate the performance of some of the major echo canceller functions, such as convergence depth and speed, by simulating simple single-talk and double-talk conversation scenarios using Composite Source Signal (CSS) bursts.