In a conventional PSTN (Public Switched Telephone Network) the digitized speech is presented in a format that requires 64 kbps for transmission. In cellular networks efficient speech coding methods are used to compress the digitized speech before sending it over the radio access network. Decoding is used to obtain a data flow that is similar to the original digitized speech flow, for example, before transmitting the speech further to a PSTN. The coding methods used in cellular networks compress the speech to a data flow that can be transmitted using less than 16 kbps in the fixed part of the cellular network. In the radio access network part of the cellular network channel coding is also needed, and the coded speech is therefore presented in a different format than in the fixed part of the cellular network.
If both the caller and the callee use mobile stations then, in the absence of any precautions, the speech is coded and decoded twice, because it needs to be transmitted over a radio access network twice. This double coding may deteriorate the quality of the speech. It is possible to code the speech only once, if the coded speech is carried as such through the cellular networks and any PSTN between the cellular networks. This kind of operation is called tandem free operation (TFO).
FIG. 1 presents an example of the tandem free operation in a GSM (Global System for Mobile Communications) network. A one-way connection is presented in FIG. 1 for the sake of clarity. Usually connections are bidirectional, and the same functionality is performed in both directions. The mobile station MS1 101 communicates over a radio access network with a base station BS1 102. The digitized speech is coded in the mobile station, using codec C as presented in FIG. 1. Between a mobile station and a base station, the coded speech is presented in a format that is typical for the air interface. From the base station onwards, the coded speech is carried to a transcoder and rate adaptation unit (TRAU) in a certain format called TRAU frames. In FIG. 1 the base station BS1 transmits the coded speech to TRAU1 103. Base station controllers are not involved in the speech coding and are, therefore, not presented in FIG. 1.
The transcoder and rate adaptation unit usually decodes the speech and sends it further as a pulse code modulation (PCM) signal that carries data at a rate of 64 kbps. The speech is sent to a Mobile Services Switching Center (MSC) which relays it either to another MSC or to the public telephony network. In FIG. 1 the TRAU1 decodes the coded speech (decodec D) and transmits the decoded speech to MSC1 104, and from there the speech is relayed via the PSTN 105 to another cellular network. The MSC2 106 relays the decoded speech further to TRAU2 107, where the speech is coded (codec C′) and inserted into TRAU frames. The base station BS2 108 converts the TRAU frames into a radio access network format, and transmits the data over the air interface to the mobile station MS2 109. In this mobile station the coded speech is decoded (decodec D′).
The different arrows 110, 111 and 112 in FIG. 1 indicate the data presentation format and the signal carrying the data. Dashed arrows 110 refer to coded speech and the air interface. Solid arrows 111 refer to TRAU frames that require either an 8 kbps or a 16 kbps transmission channel, and thick arrows 112 refer to decoded speech that requires a 64 kbps transmission channel and a PCM signal.
If both mobile stations and TRAUs involved in a call have a common codec-decodec pair, it is possible to encode the speech only once. In the situation presented in FIG. 1, in tandem free operation the speech is coded in MS1 and decoded in MS2. TRAU1 relays the TRAU frames as TFO TRAU frames within the decoded speech (arrow 113 in FIG. 1). TRAU1 also performs decoding and the decoded speech is transmitted to TRAU2, but it is used only if TRAU2 cannot extract the TFO TRAU frames from the data it receives. If TRAU2 detects the TFO TRAU frames, it relays the coded speech carried by the TFO TRAU frames to BS2 in TRAU frames.
Tandem free operation thus requires special functionality, i.e. TFO capability, from the TRAUs. In practice the TFO capability means the following three things. First, the TRAUs can negotiate which codec is used. Second, they can transmit TFO TRAU frames to each other as part of the PCM signal. Third, they can extract the TFO TRAU frames from the incoming PCM signal. In GSM the TFO TRAU frames are carried over the PCM so that the one or two least significant bits in each 8 bit long speech sample are replaced by TFO TRAU frame information. The TFO TRAU frame information is thus carried in an 8 kbps or 16 kbps subflow of the 64 kbps PCM flow. The destination TRAU can then ignore the rest of the PCM signal, and relay the TFO TRAU frames as TRAU frames towards the destination mobile station.
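The bit replacement described above can be illustrated with a minimal sketch. It assumes 8000 PCM samples per second (which follows from 64 kbps and 8 bit samples); the function names and the order in which payload bits are packed into a sample are hypothetical choices made for illustration, not the actual TFO frame mapping.

```python
SAMPLE_RATE_HZ = 8000  # 64 kbps PCM flow / 8 bits per sample


def embed_bits(samples, payload_bits, n_lsb):
    """Replace the n_lsb least significant bits of each 8-bit sample.

    The payload is packed most significant bit first within each sample;
    this ordering is an assumption of the sketch.
    """
    if n_lsb not in (1, 2):
        raise ValueError("n_lsb must be 1 or 2")
    if len(payload_bits) > len(samples) * n_lsb:
        raise ValueError("payload does not fit into the samples")
    mask = (1 << n_lsb) - 1
    out = list(samples)
    for i in range(0, len(payload_bits), n_lsb):
        chunk = payload_bits[i:i + n_lsb]
        chunk = chunk + [0] * (n_lsb - len(chunk))  # pad the last chunk
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        idx = i // n_lsb
        out[idx] = (out[idx] & ~mask & 0xFF) | value
    return out


def extract_bits(samples, n_bits, n_lsb):
    """Read back n_bits payload bits from the low bits of the samples."""
    bits = []
    for sample in samples:
        for shift in range(n_lsb - 1, -1, -1):
            bits.append((sample >> shift) & 1)
    return bits[:n_bits]


def subflow_rate_bps(n_lsb):
    """Capacity of the subflow formed by n_lsb stolen bits per sample."""
    return SAMPLE_RATE_HZ * n_lsb
```

With one stolen bit per sample the subflow capacity is 8000 bit/s, and with two bits it is 16000 bit/s, which matches the 8 kbps and 16 kbps figures given above. The upper six or seven bits of every sample are left untouched, so the PCM signal still carries the decoded speech at slightly reduced precision.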
The transcoder and rate adaptation units involved in a call negotiate the speech codec using TFO inband signalling. This signalling is performed by modifying certain bits of the TRAU frame structure. The data carried in TRAU frames and TFO TRAU frames is essentially the same except for the TFO signalling bits. At the beginning of a call each TRAU may select the codec it uses, but if both TRAUs support tandem free operation, a common codec may be negotiated. The decoded speech is usually also transmitted in the PCM signal even after a common codec has been agreed on. This is because after a handover, for example, it may no longer be the case that both TRAUs involved in the call support tandem free operation.
Current tandem free operation works between two cellular networks, or when the cellular networks are connected via a PSTN. In recent years, however, there has been an explosive growth in real-time data applications that use packet networks like the Internet as a transport medium. These real-time applications can support voice calls and video calls. It is possible to use the Internet or other packet networks, instead of a PSTN, as the transmission medium between cellular networks. Especially with third generation networks, which are at least partly packet based, the use of packet networks between the cellular networks is a natural choice.
The H.323 specification has been created by the International Telecommunication Union (ITU) for the purpose of defining a standard framework for audio, video and data communications over networks that do not provide a guaranteed quality of service (QoS). Packet networks, for example, may be such networks. The aim of the H.323 specification is to allow multimedia products and applications from different manufacturers to interoperate.
FIG. 2 presents a situation where two GSM networks are connected with an IP (Internet Protocol) network. Each of the GSM networks 201, 202 is connected to the IP network 203 with a Voice over IP (VoIP) gateway. These VoIP gateways 204, 205 are connected to the MSCs 104, 106. From the cellular network they receive 64 kbps decoded speech as a PCM signal (arrows 112 in FIG. 2), and they compress this data flow. The compressed data flow is then transferred over the IP network to another VoIP gateway (arrows 210 in FIG. 2). Usually the compressed data flow requires either 8 kbps or 16 kbps of transmission capacity. The H.323 specification, for example, defines certain codecs that can be used for compressing data in H.323 networks. It is also possible to construct proprietary codecs and gateways. The term gateway here refers neither to any specific packet network technology nor to any specific standard on telephony over packet networks. It is used as a general term for a network element that connects a cellular network and a packet network and relays calls and other connections to and from the cellular network.
The problem in using compression in gateways when transmitting calls between cellular networks is that in the worst case the speech (or other data) is coded and decoded three times: first in the originating cellular network, then when transmitted between the cellular networks, and finally in the destination cellular network. This may reduce the quality of the speech drastically.
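The number of coding stages in the different arrangements can be counted with a toy model. The sketch below is an illustration only: it describes a call path as the sequence of speech formats on successive links and counts each transition from uncompressed to compressed speech as one encoding. The labels "pcm" and "coded" and the three example paths are hypothetical simplifications of FIG. 1 and FIG. 2.

```python
def count_encodings(link_formats):
    """Count encodings along a path described by per-link formats.

    Each entry is "pcm" (uncompressed speech) or "coded" (compressed
    speech). The speech starts uncompressed in the sending mobile
    station, so a "coded" first link implies one encoding there.
    """
    encodings = 0
    previous = "pcm"
    for fmt in link_formats:
        if fmt not in ("pcm", "coded"):
            raise ValueError("unknown format: " + fmt)
        if previous == "pcm" and fmt == "coded":
            encodings += 1
        previous = fmt
    return encodings


# MS1 -> TRAU1 coded, TRAU1 -> TRAU2 PCM via PSTN, TRAU2 -> MS2 coded
via_pstn = ["coded", "pcm", "coded"]

# Tandem free operation: the coded speech survives end to end
tandem_free = ["coded", "coded", "coded"]

# Compressing gateways between the networks add a third coding stage
via_gateways = ["coded", "pcm", "coded", "pcm", "coded"]
```

Under these assumptions `count_encodings` gives 2 for the PSTN path without TFO, 1 for tandem free operation, and 3 for the path through compressing gateways.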
A further problem is that even in a case where both TRAUs involved in a call are TFO capable, it is possible that this feature cannot be utilized. This is because the TFO TRAU frames, which carry information about the speech codecs and TFO capabilities of the TRAUs and which are possibly included in the PCM signal, do not necessarily stay unmodified in the compression and decompression in the gateways. Especially the TFO signalling, which is carried in certain bits of the TFO TRAU frame, is sensitive to change due to compression.
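The sensitivity of the embedded bits can be shown with a small sketch. The lossy step below is a deliberately crude stand-in (coarser requantization), not a real speech codec and not the behaviour of any particular gateway; all function names are hypothetical. It only illustrates that processing which keeps the waveform approximately intact need not preserve the least significant bits.

```python
def embed_lsb(samples, bits):
    """Put one payload bit into the lowest bit of each 8-bit sample."""
    return [(s & 0xFE) | b for s, b in zip(samples, bits)]


def extract_lsb(samples):
    """Read the lowest bit of each sample."""
    return [s & 1 for s in samples]


def toy_lossy_roundtrip(samples, step=4):
    """Stand-in for compression and decompression in a gateway.

    Rounds each sample down to a multiple of `step`, so the sample
    value changes by less than `step` but its low bits are lost.
    """
    return [s // step * step for s in samples]


pcm = [10, 200, 33, 97, 128, 64, 5, 250]
payload = [1, 0, 1, 1, 0, 1, 0, 1]

carried = embed_lsb(pcm, payload)
after_gateway = toy_lossy_roundtrip(carried)

intact = extract_lsb(carried) == payload
survived = extract_lsb(after_gateway) == payload
```

In this sketch `intact` is true and `survived` is false, even though no sample value changes by more than 3 in the lossy step. A receiving TRAU in this situation would find no valid TFO TRAU frames and would have to use the decoded speech instead.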