In conventional communication systems, an encoder generates a stream of information bits representing voice or data traffic. This stream of bits is subdivided and grouped, concatenated with various control bits, and packed into a suitable format for transmission. Voice and data traffic may be transmitted in various formats according to the appropriate communication mechanism, such as, for example, frames, packets, subpackets, etc. For the sake of clarity, the term “transmission frame” will be used herein to describe the transmission format in which traffic is actually transmitted. The term “packet” will be used herein to describe the output of a speech coder. Speech coders are also referred to as voice coders, or “vocoders,” and the terms will be used interchangeably herein.
A vocoder extracts parameters relating to a model of voice information (such as human speech) generation and uses the extracted parameters to compress the voice information for transmission. Vocoders typically comprise an encoder and a decoder. A vocoder segments incoming voice information (e.g., an analog voice signal) into blocks, analyzes the incoming speech block to extract certain relevant parameters, and quantizes the parameters into binary or bit representation. The bit representation is packed into a packet, the packets are formatted into transmission frames and the transmission frames are transmitted over a communication channel to a receiver with a decoder. At the receiver, the packets are extracted from the transmission frames, and the decoder unquantizes the bit representations carried in the packets to produce a set of coding parameters. The decoder then re-synthesizes the voice segments, and subsequently, the original voice information using the unquantized parameters.
Different types of vocoders are deployed in various existing wireless and wireline communication systems, often using various compression techniques. Moreover, transmission frame formats and processing defined by one particular standard may be rather significantly different from those of other standards. For example, CDMA standards support the use of variable-rate vocoder frames in a spread spectrum environment while GSM standards support the use of fixed-rate vocoder frames and multi-rate vocoder frames. Similarly, Universal Mobile Telecommunications Systems (UMTS) standards also support fixed-rate and multi-rate vocoders, but not variable-rate vocoders. For compatibility and interoperability between these communication systems, it may be desirable to enable the support of variable-rate vocoder frames within GSM and UMTS systems, and the support of non-variable rate vocoder frames within CDMA systems. One common occurrence throughout all communications systems is the occurrence of echo. Acoustic echo and electrical echo are example types of echo.
Acoustic echo is produced by poor voice coupling between an earpiece and a microphone in handsets and/or hands-free devices. Electrical echo results from 4-to-2 wire coupling within PSTN networks. Voice-compressing vocoders process voice including echo within the handsets and in wireless networks, which results in returned echo signals with highly variable properties. The echoed signals degrade voice call quality.
In one example of acoustic echo, sound from a loudspeaker is heard by a listener at a near end, as intended. However, this same sound at the near end is also picked up by the microphone, both directly and indirectly, after being reflected. The result of this reflection is the creation of echo, which, unless eliminated, is transmitted back to the far end and heard by the talker at the far end as echo.
FIG. 1 illustrates a voice over packet network diagram including a conventional echo canceller/suppressor used to cancel echoed signals.
If the conventional echo canceller/suppressor 100 is used in a packet switched network, the conventional echo canceller must completely decode the vocoder packets associated with voice signals transmitted in both directions to obtain echo cancellation parameters because all conventional echo cancellation operations work with linear uncompressed speech. That is, the conventional echo canceller/suppressor 100 must extract packet from the transmission frames, unquantize the bit representations carried in the packets to produce a set of coding parameters, and re-synthesize the voice segments before canceling echo. The conventional echo canceller/suppressor then cancels echo using the re-synthesized voice segments.
Because transmitted voice information is encoded into parameters (e.g., in the parametric domain) before transmission and conventional echo suppressors/cancellers operate in the linear speech domain, conventional echo cancellation/suppression in a packet switched network becomes relatively difficult, complex, may add encoding and/or decoding delay and/or degrade voice quality because of, for example, the additional tandeming coding involved.