Digital networks such as packet-based IP (Internet Protocol) networks or TDM (Time Division Multiplex) based networks are employed to transmit not only signalling traffic but also digitised analogue signals, in particular audio signals such as speech, and video.
Before an analogue signal can be transmitted by the digital network, an analogue-to-digital conversion of the signal has to be carried out. Further, the signal is usually compressed, e.g. with a ratio of 8:1 or 4:1, to allow low-bit-rate access to the core network and to save capacity within the core network itself.
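To make the compression ratios concrete, the following minimal sketch computes the bit rate after compression. The uncompressed rate of 64 kbit/s (8 kHz sampling at 8 bits per sample) is an assumption chosen for illustration and is not stated in the text above; the function name is likewise hypothetical.

```python
# Assumed uncompressed telephony rate: 8000 samples/s * 8 bits/sample.
# This figure is an illustrative assumption, not taken from the text.
ASSUMED_SOURCE_BPS = 8000 * 8


def compressed_bit_rate(source_bps: float, ratio: float) -> float:
    """Return the bit rate after compressing by `ratio` (e.g. 8 for 8:1)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return source_bps / ratio


if __name__ == "__main__":
    for ratio in (8, 4):
        rate = compressed_bit_rate(ASSUMED_SOURCE_BPS, ratio)
        print(f"{ratio}:1 -> {rate / 1000:.0f} kbit/s")
```

Under this assumption, a ratio of 8:1 yields 8 kbit/s and a ratio of 4:1 yields 16 kbit/s per speech channel.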
When transferring voice between two IP terminals, for example, the speech is converted and compressed by an encoder in the source terminal to form parameterised coded digitised analogue signals and decompressed and reconverted by a decoder in the destination terminal and vice versa.
The quality of the speech presented to an end user at the respective destination terminal depends on a variety of factors.
A first group of factors is network related and comprises delay, lost packets etc. on the transmission route.
A second group of factors is terminal related and comprises the quality of the microphone, the loudspeakers, the A/D converter, the automatic level control, the echo canceller, the noise suppressor etc. A further terminal related factor is the surroundings of the terminal, such as environmental noise. Besides the differing quality of the speech enhancement features or services employed, some terminals might even completely lack certain speech enhancement features or services which would be useful to increase the satisfaction of the end user.
A third group of factors appears when several networks are involved in one transmission, e.g. when an IP terminal inter-works with the PSTN (Public Switched Telephone Network) or a mobile access network. In such a case, additional degradations may result from echo from PSTN hybrids or from acoustic noise from mobile terminals etc. IP-PSTN gateways are utilised to enable the inter-working between the IP network and the PSTN or the mobile access network. These gateways may include features for enhancing the quality of the speech they transmit.
However, some gateways lack important speech enhancement features.
In digital networks, usually nothing is done to compensate for the terminal or the network transition specific factors on the network side.
For GSM (Global System for Mobile Communications) networks, the ETSI (European Telecommunications Standards Institute) TFO (Tandem Free Operation) specification defines how multiple encoding and decoding, especially at gateways and switches, can be avoided. When complying with the TFO model, a transmitted TFO stream includes parameterised coded speech that goes end-to-end in the speech parameter domain. The end-points may be two mobiles or a mobile and an IP terminal via a gateway. Two IP terminals interconnected only by an IP network operate tandem-free by nature. The same principles are valid for GPRS (General Packet Radio Service) and third generation networks, where the speech may stay in the packet-based network all the way. Exemplary routes of the latter are: MS-BS-RNC-SGSN-GGSN-IP terminal or MS-BS-PCU-SGSN-GGSN-IP terminal (MS: Mobile Station; BS: Base Station; RNC: Radio Network Controller; SGSN: Serving GPRS Support Node; GGSN: Gateway GPRS Support Node; PCU: Packet Control Unit). However, until end-to-end TFO connections are realised in all networks, the transition factors influencing the quality of transmitted digitised analogue signals still have to be considered, and the terminal specific factors are not affected by the TFO approach in any case.
On the whole, it would be beneficial if digital networks provided means for enhancing the quality of digitised analogue signals. Multiple encoding and decoding, however, should be avoided for quality reasons.
For packet-based networks, ITU-T specification H.323 (07/2000) introduces a multipoint processor (MP) used for conference calls. The multipoint processor prepares N audio outputs from M audio inputs by switching and/or mixing. For mixing, the input audio signals are decoded to linear signals on which a linear combination is performed. The resulting signal is encoded again to the appropriate audio format. It is further proposed that the multipoint processor eliminate or attenuate some of the input signals in order to reduce noise and other unwanted signals.
This means, however, that an additional decoding and encoding step is introduced, which should be avoided, as mentioned above, for the sake of the quality of the audio signal and in order to keep the processing delay small.
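The mixing path of such a multipoint processor, including the additional decoding and encoding step, can be sketched as follows. This is only an illustration under stated assumptions: the codec is a simple 8-bit companding curve of the mu-law type used as a stand-in (it is not a bit-exact G.711 implementation and is not prescribed by H.323), and the names `encode`, `decode` and `mix` are hypothetical.

```python
import math

MU = 255.0  # companding constant of the assumed stand-in codec


def encode(sample: float) -> int:
    """Compress a linear sample in [-1, 1] to an 8-bit code (0..255)."""
    sample = max(-1.0, min(1.0, sample))  # clip to the valid range
    y = math.copysign(math.log1p(MU * abs(sample)) / math.log1p(MU), sample)
    return round((y + 1.0) / 2.0 * 255)


def decode(code: int) -> float:
    """Expand an 8-bit code back to a linear sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mix(coded_inputs, weights):
    """Decode M coded frames, combine them linearly, and re-encode.

    A weight of 0.0 eliminates an input; a weight below 1.0 attenuates it.
    """
    linear = [[decode(c) for c in frame] for frame in coded_inputs]
    mixed = [
        sum(w * s for w, s in zip(weights, samples))
        for samples in zip(*linear)
    ]
    return [encode(s) for s in mixed]  # the additional encoding step


if __name__ == "__main__":
    talker = [encode(0.5 * math.sin(2 * math.pi * n / 16)) for n in range(16)]
    noisy = [encode(0.05) for _ in range(16)]
    print(mix([talker, noisy], [1.0, 0.25]))
```

Every sample passes through `decode` and `encode` once more than it would on a direct terminal-to-terminal path, which is the extra tandem coding step criticised above.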