As packet-based voice technologies have matured, service providers have begun deploying packet-based voice services in order to reduce operational expenses. During a voice call, a party to the call may hear his or her own voice due to echoes at the far end of the call. The likelihood of such echoes increases when parties to the call use hands-free communications capabilities, such as speakerphones. The most common approach for detecting and suppressing such echoes is acoustic echo cancellation (AEC). While acoustic echo cancellation is well developed in networks which directly transmit speech waveform data, such as Time Division Multiplexing (TDM) networks, it is inherently more difficult to perform echo cancellation in packet-based networks, such as Voice over Internet Protocol (VoIP) networks, which encode the waveform data with voice coders prior to transmission. Furthermore, packet networks have exacerbated the problem of acoustic echo, both because network delays can vary widely from packet to packet and because typical propagation latency in packet networks is significantly higher than in, for example, TDM networks.
Network-based echo suppressors in packet-based networks have conventionally operated as follows. First, network equipment (e.g., a mobile switching center in a wireless network) regenerates the speech waveforms by decoding the speech bitstream (i.e., the encoded packet data), which was originally encoded at the transmitting side of the network, back into waveform data. After analysis and possible enhancement (e.g., the removal of echo) of the decoded data, the speech coding system re-encodes the waveforms into a packet bitstream for transmission to the receiving side. This tandem coding process (i.e., “transcoding”) generally degrades total end-to-end speech quality, especially for the low bit rate coders used in modern wireless networks, and it also introduces additional delay.
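The conventional decode, enhance, and re-encode sequence described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the 8-bit mu-law style companding stands in for a real low bit rate speech coder, the fixed attenuation stands in for actual echo analysis and removal, and the function names (encode, decode, suppress_echo, tandem_echo_suppressor) are hypothetical rather than taken from any particular network equipment.

```python
import math

MU = 255.0  # companding constant for the toy coder (assumption)


def encode(samples):
    """Toy coder: compand each sample in [-1, 1] and quantize it to an 8-bit code."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the coder's input range
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        codes.append(int(round((y + 1.0) / 2.0 * 255)))
    return codes


def decode(codes):
    """Toy decoder: map 8-bit codes back to approximate waveform samples."""
    samples = []
    for c in codes:
        y = c / 255.0 * 2.0 - 1.0
        x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
        samples.append(x)
    return samples


def suppress_echo(waveform, gain=0.5):
    """Placeholder enhancement: a fixed attenuation, not a real echo canceller."""
    return [gain * s for s in waveform]


def tandem_echo_suppressor(bitstream, gain=0.5):
    """Conventional network-side processing: decode, enhance, then re-encode."""
    waveform = decode(bitstream)              # step 1: regenerate the waveform
    enhanced = suppress_echo(waveform, gain)  # step 2: analyze and enhance
    return encode(enhanced)                   # step 3: re-encode (second quantization)
```

In this sketch the signal is quantized twice, once at the transmitting side and once more inside tandem_echo_suppressor, which is the source of the quality loss attributed to transcoding above; a real speech coder would also add its own algorithmic delay at each coding stage.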