Speech compression is a basic operation in many telecommunications networks, including wireless and Voice over Internet Protocol (VoIP) networks. This compression is typically based on a source model, such as Code Excited Linear Prediction (CELP). Speech is compressed at a transmitter according to the source model and encoded so as to minimize the valuable channel bandwidth required for transmission. In many newer-generation networks, such as Third Generation (3G) wireless networks, the speech remains in a Coded Domain (CD) (i.e., compressed) even in the core network, and is decompressed and converted back to a Linear Domain (LD) only at a receiver. This differs from cases where the core network has to decompress the speech in order to perform its switching and transmission. Such intermediate decompression degrades speech quality. Therefore, newer-generation networks try to avoid decompression in the core network whenever both sides of the call are capable of compressing and decompressing the speech.
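A CELP coder is built around a short-term linear prediction (LP) source model: each speech sample is predicted from a weighted sum of previous samples, and the excitation codebook is used to approximate what remains. The sketch below illustrates only that LP analysis step, assuming the common autocorrelation method solved with the Levinson-Durbin recursion. The function names are hypothetical, and a real CELP coder adds windowing, coefficient quantization, pitch prediction, and codebook search, none of which is shown here.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite (zero-padded) frame."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[0..order-1] such that
    x[n] is approximated by sum_k a[k] * x[n - 1 - k].
    Returns (coefficients, minimum prediction-error energy)."""
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] unused; a[j] weights lag j
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err


def lp_residual(x, a):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n-1-k] for n >= order."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]


# Usage: a pure tone is almost perfectly predictable by a 2nd-order model,
# so the residual carries only a tiny fraction of the signal energy.
frame = [math.sin(0.3 * n) for n in range(400)]
coeffs, min_err = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, coeffs)
```

The residual is far smaller than the input, which is why a coder that transmits LP coefficients plus a compact description of the excitation needs much less bandwidth than one that transmits the waveform itself.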
In many networks, especially wireless networks, a network operator (i.e., service provider) is motivated to offer a differentiating service that not only attracts new customers but also retains existing ones. A major differentiating feature is voice quality. Network operators are therefore motivated to deploy Voice Quality Enhancement (VQE) in their networks. VQE includes acoustic echo suppression, noise reduction, adaptive level control, and adaptive gain control.
Echo cancellation, for example, is an important network VQE function. While wireless networks do not suffer from electronic (or hybrid) echoes, they do suffer from acoustic echoes caused by acoustic coupling between the earpiece and the microphone of an end-user terminal. Acoustic echo suppression in the network is therefore useful.
In older-generation networks, where the core network decompresses a signal into the linear domain and then converts it into a Pulse Code Modulation (PCM) format, such as A-law or μ-law, in order to perform switching and transmission, network-based VQE has access to the decompressed signals and can readily operate in the linear domain. (Note that A-law and μ-law are also forms of compression (i.e., encoding), but they fall into the category of waveform encoders. Relevant to VQE in the coded domain is source-model encoding, which is the basis of most low-bit-rate speech coding.) However, when voice quality enhancement is performed in a network where the signals are compressed, there are basically two choices: a) decompress (i.e., decode) the signal, perform voice quality enhancement in the linear domain, and re-compress (i.e., re-encode) the output of the voice quality enhancement, or b) operate directly on the bit stream representing the compressed signal and modify it so as to effectively perform voice quality enhancement. The advantages of choice (b) over choice (a) are threefold:
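To make the distinction between waveform encoding and source-model encoding concrete, the sketch below shows μ-law companding, a waveform encoder: it reshapes the sample amplitudes themselves rather than describing how the speech was produced. It assumes the continuous μ-law characteristic with μ = 255 followed by a uniform 8-bit quantizer; ITU-T G.711 actually uses a segmented, piecewise-linear approximation of this curve, so the codes below are illustrative and not bit-exact G.711. All function names are hypothetical.

```python
import math

MU = 255.0  # mu value of the continuous mu-law characteristic (assumed)


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map a normalized sample in [-1, 1] onto [-1, 1] logarithmically."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must be normalized to [-1, 1]")
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Exact inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def quantize(y: float, bits: int = 8) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a signed integer code."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    return round(y * levels)


def dequantize(code: int, bits: int = 8) -> float:
    """Map an integer code back to a companded value in [-1, 1]."""
    return code / (2 ** (bits - 1) - 1)


# Usage: quiet and loud samples both survive 8-bit coding with a small
# *relative* error, because the logarithmic curve spends more codes on
# small amplitudes.
for x in (0.01, 0.5):
    code = quantize(mu_law_compress(x))
    x_hat = mu_law_expand(dequantize(code))
    print(x, code, x_hat)
```

Because such a coder operates sample by sample on the waveform, a linear-domain VQE function can simply expand, process, and re-compress each sample. A source-model bit stream offers no such direct access to the waveform, which is what makes coded-domain VQE harder.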
First, the signal does not have to go through an intermediate decode/re-encode cycle, which can degrade overall speech quality. Second, because encoding is computationally expensive, avoiding another encoding step significantly reduces the computational resources needed. Third, because encoding adds significant delay, avoiding an additional encoding step keeps the overall delay of the system to a minimum.
Performing VQE functions or combinations thereof in the compressed (or coded) domain, however, represents a more challenging task than VQE in the decompressed (or linear) domain.