As wireless communications networks become more pervasive and the number of subscribers continues to increase, wireless bandwidth becomes increasingly scarce. To mitigate this problem, advanced voice compression techniques are used to reduce the bandwidth needed by each voice call. For example, standard voice coding at 8 bits per sample and 8000 samples per second, i.e., 64 kbits/s, may be reduced to 8 kbits/s or less via coder/decoders (codecs) such as the GSM (Global System for Mobile communication) AMR (Adaptive MultiRate) and EFR (Enhanced Full Rate) codecs and the CDMA (Code Division Multiple Access) EVRC (Enhanced Variable Rate Codec). Codecs typically operate on a collection of samples, which are compressed and sent as a frame of data. Some codecs, for example, divide a voice call into 20 ms time slices, sending a frame of data every 20 ms.
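The bit-rate arithmetic in the preceding paragraph can be checked with a short sketch. The function names are illustrative only and are not taken from any codec specification; the 12.2 kbit/s figure is the EFR rate mentioned later in this background.

```python
def bit_rate_bps(bits_per_sample: int, samples_per_second: int) -> int:
    """Raw bit rate of an uncompressed sample stream."""
    return bits_per_sample * samples_per_second


def bits_per_frame(rate_bps: int, frame_ms: int) -> int:
    """Number of bits carried by one frame of the given duration."""
    return rate_bps * frame_ms // 1000


pcm_rate = bit_rate_bps(8, 8000)           # 8 bits x 8000 samples/s
print(pcm_rate)                            # 64000 bit/s, i.e. 64 kbit/s
print(bits_per_frame(pcm_rate, 20))        # bits in 20 ms of uncompressed voice
print(bits_per_frame(12200, 20))           # bits in one 20 ms frame at 12.2 kbit/s
```

At 12.2 kbit/s a 20 ms frame carries 244 bits, compared with 1280 bits for the same 20 ms of 64 kbit/s voice.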
Some voice codecs define not only a speech compression algorithm but also a silence compression algorithm. It has been estimated that fifty percent or more of a typical telephone conversation is silence—i.e., the part of the conversation during which neither party is speaking. During these periods of silence, transmitting the background noise detected by the cell phone's microphone would be an unnecessary use of network bandwidth, since the silence (e.g., the background noise) has no information content. However, sending no information during periods of silence has the undesirable side-effect of causing the receiving party to wonder, due to the lack of any sound coming from the sender's phone, whether the sender has hung up or terminated the call.
Therefore, many codecs detect the background noise present at the near-end device and characterize it, such as by determining its pitch and volume, and transmit the characterization parameters to the far-end device. At the far-end device, the noise parameters are used to generate a slight background noise, such as soft white noise, to recreate the background noise at the near-end device and thus convey the continued presence of the other party on the line. GSM_EFR codecs send what is called a silence insertion descriptor (SID) to the far-end codec. The far-end codec generates natural background noise for the call based on parameters within the received SID frame. Example parameters within the SID frame include line spectral frequency (LSF) and energy gain. With these two pieces of information, roughly equivalent to the pitch and volume of the background noise, respectively, the receiving end is able to recreate the background sound. These SID frames are sent relatively infrequently compared to speech frames. In some codecs, the SID frames are sent at call initiation and again only when the character of the near-end background noise changes significantly.
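The SID update policy described above (send at call initiation, then only on a significant change) may be sketched as follows. This is a simplified illustration, not the behavior of any particular codec: the NoiseParams type, the single scalar standing in for the LSF data, and both tolerance values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NoiseParams:
    lsf: float   # hypothetical scalar stand-in for the LSF data ("pitch")
    gain: float  # energy gain ("volume")


def should_send_sid(last_sent: Optional[NoiseParams],
                    current: NoiseParams,
                    lsf_tol: float = 0.1,      # assumed threshold
                    gain_tol: float = 3.0) -> bool:  # assumed threshold
    """Return True when a new SID frame should be transmitted."""
    if last_sent is None:
        # Call initiation: the far end has no noise description yet.
        return True
    # Afterwards, send only when the noise character changes significantly.
    return (abs(current.lsf - last_sent.lsf) > lsf_tol
            or abs(current.gain - last_sent.gain) > gain_tol)
```

With these assumed tolerances, a small drift in the measured noise produces no new SID frame, while a large jump in gain does.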
While the clear advantage to using voice compression is that it uses less bandwidth per call, the disadvantage of using voice compression is that it introduces signal distortion. Whenever a signal is transcoded, or converted from one format to another, there is a potential for introduction of signal distortion. Transcoding refers not only to compression but also to compression/expansion (“companding”) operations, such as A-law and mu-law encoding/decoding. As shown below, in a typical mobile-to-mobile call there may be many transcoding steps, each of which has the potential to degrade the voice quality of the call.
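To illustrate why a companding step can distort a signal, the following sketch applies the continuous mu-law curve (mu = 255) followed by 8-bit quantization and then reverses both steps. It is a simplified model: G.711 itself uses a segmented approximation of this curve, and the uniform quantizer and function names here are assumptions chosen for the example.

```python
import math

MU = 255.0  # mu-law parameter


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a value in [-1, 1] to a code in 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))


def dequantize_8bit(q: int) -> float:
    """Map a code in 0..255 back to the midpoint of its interval."""
    return (q + 0.5) / 256 * 2.0 - 1.0


def round_trip(x: float) -> float:
    """Compress, quantize to 8 bits, then decode back to a sample."""
    return mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(x))))


sample = 0.01
print(sample, round_trip(sample))  # the decoded value differs slightly
```

The curve alone is invertible; the loss comes from quantizing to 8 bits, and each further encode/decode stage along the call path can add error of this kind.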
FIG. 1 is a block diagram illustrating a conventional mobile-to-mobile call. Caller's cell phone 100 is connected via a radio frequency (RF) interface to the nearest cell phone tower and associated base station subsystem (BSS1 102). Caller's cell phone 100 typically uses a voice encoder to compress caller's voice from 64 to 128 kbit/s down to 12.2 kbit/s, for example, before transmitting the compressed speech data to BSS1 102. BSS1 102 provides the interface between the RF network and the wireline network. BSS1 102 may send the speech data to a transcoder and rate adaptation unit (TRAU1 104), which may decode the compressed speech data into uncompressed audio data at 8 to 16 bits per sample and 8000 samples per second. TRAU1 104 may transmit the uncompressed data across the network, as shown in FIG. 1, but typically it will re-encode the uncompressed voice data using a compression/expansion algorithm, such as A-law or mu-law, to boost the signal-to-noise ratio of the signal being transmitted, creating a 64 kbits/s PCM G.711 data stream. In other words, TRAU1 104 may transcode the voice data from one encoding format to another, such as from 3G_GSM_AMR to G.711. TRAU1 104 may forward the speech data across the phone network to TRAU2 106. TRAU2 106 may transcode the speech data into the compressed format used by the destination network. For example, TRAU2 106 may convert the speech data from G.711 to 2G_GSM_EFR. TRAU2 106 may send the transcoded speech data to the destination network's base station subsystem, BSS2 108. BSS2 108 may transmit the re-encoded speech data to callee's cell phone 110.
In summary, the voice data may be encoded (and decoded) several times along the path between caller's cell phone 100 and callee's cell phone 110: encoding using the source codec by caller's cell phone 100, encoding using the intermediate codec by TRAU1 104, and encoding using the destination codec by TRAU2 106. Since both TRAU1 104 and TRAU2 106 must agree on an intermediate format, which may be 64 kbit/s mu-law PCM data, for example, TRAU1 104 and TRAU2 106 are said to be operating in tandem, and are commonly referred to as being a tandem pair.
As used herein, the term “internal format” refers to the intermediate format which the members of the tandem pair use to communicate data with each other, and the term “external format” refers to the format that each member of the tandem pair uses to communicate data with its respective network. The respective external formats may be incompatible, as can be seen in FIG. 1, in which the external format for TRAU1 104 achieves a compression of 12.2 kbits per second, while the external format for TRAU2 106 achieves a compression of 16 kbits per second.
Each encoding step (by caller's cell phone 100, BSS1 102, and BSS2 108) may introduce additional signal distortion, which degrades the overall quality of the voice call. One way to avoid the degradation of voice signal quality in a mobile-to-mobile scenario is to reduce the number of transcoding steps performed. For example, if the external format used by the caller's base station is the same as or compatible with the external format used by the callee's base station, there may be no need to transcode to an intermediate format. In other words, there may be no need for a tandem pair to perform transcoding. Operation in such a mode is commonly referred to as “tandem-free operation”, or TFO. FIG. 2 illustrates an example of a network operating in TFO mode. In conventional systems, two codecs are the same or compatible if they use the same speech and silence compression algorithms and the same bit rates.
FIG. 2 is a block diagram illustrating a conventional mobile-to-mobile call using tandem-free operation. As stated above, TFO mode is possible only if the two mobile networks use the same or compatible external formats. Thus, in FIG. 2 the codec used by BSS1 102 to communicate with caller's cell phone 100 is the same as, or compatible with, the codec used by BSS2 108 to communicate with callee's cell phone 110. For example, BSS1 102 and BSS2 108 may use codecs that use the same speech and silence compression algorithms and the same bit rate. In this case, it is unnecessary for TRAU1 104 to transcode the speech data into an internal format, such as G.711, before sending the speech data across the network to TRAU2 106, and vice versa. Instead, TRAU1 104 and TRAU2 106 may send the speech data to each other without transcoding, avoiding two transcoding steps. Furthermore, BSS2 108 may transmit the encoded speech data over its RF interface directly to the callee's mobile phone, thus avoiding two additional transcoding steps: the transcoding of data as it passes from the radio interface to the wired network interface in each of the respective wireless networks. In summary, by not transcoding voice data to and from an intermediate format (e.g., G.711), degradation of voice quality due to introduced signal distortion is avoided. The TFO principle may apply anywhere along the network path in which transcoding to an internal codec may be eliminated by agreement between nodes that use the same external codec.
For TFO to work, however, additional requirements must be met. One requirement is that the nodes or network entities be able to support TFO, which means that the nodes need to be able to communicate with each other regarding the TFO stream. For example, the nodes may need to negotiate a TFO link, monitor link status, or provide fallback procedures in case of TFO interruption. Typically, in-band signaling is used for communication of TFO messages, since the compressed voice data stream uses a fraction of the bandwidth and thus makes bits available for a control channel. A common practice is to map the control channel onto the least significant bit or bits of the 8-bit, 64 kbit/s channel. This causes only a slight degradation of quality of uncompressed voice data, and causes no degradation of quality of the compressed voice data. Thus, the bearer channel must support in-band signaling. Another requirement is that the external codecs should be the same or compatible; otherwise, any benefit of skipping the intermediate transcoding step may be reduced by the need to convert from one external codec to another external codec.
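The practice of mapping a control channel onto the least significant bit of the 8-bit samples can be sketched as below. This is only an illustration of the bit-stealing idea: the function names are hypothetical, and the sketch writes one control bit into every sample, whereas an actual TFO signaling scheme defines its own sample spacing and message format.

```python
from typing import List


def embed_control_bits(samples: bytes, control_bits: List[int]) -> bytes:
    """Overwrite the LSB of the first len(control_bits) samples."""
    if len(control_bits) > len(samples):
        raise ValueError("not enough samples to carry the control bits")
    out = bytearray(samples)
    for i, bit in enumerate(control_bits):
        # Clear the least significant bit, then set it to the control bit.
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_control_bits(samples: bytes, count: int) -> List[int]:
    """Read the LSB of the first `count` samples."""
    return [b & 1 for b in samples[:count]]


pcm = bytes([0x80, 0x81, 0xFF, 0x00])
marked = embed_control_bits(pcm, [1, 0, 0, 1])
print(marked.hex(), extract_control_bits(marked, 4))
```

Each affected sample changes by at most one quantization step, which is why the degradation of uncompressed voice is slight.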
An additional challenge is raised when attempting to implement TFO for a mobile-to-mobile voice call that crosses a boundary between 2G wireless networks and 3G wireless networks: there may be a difference between the 2G version of a codec and its 3G equivalent. FIG. 3 illustrates such an example.
FIG. 3 is a block diagram illustrating a conventional mobile-to-mobile call that crosses the boundary between 2G and 3G wireless networks. Although BSS1 102 and BSS2 108 both use the GSM_EFR (enhanced full rate) codec, BSS1 102 is a 3G network, and therefore uses the 3G version of the GSM_EFR codec, while BSS2 108 is a 2G mobile network, and therefore uses the 2G version of the GSM_EFR codec. Although the 2G and 3G versions of the GSM_EFR codec have the same 12.2 kbits/s rate for voice compression, their silence insertion descriptor frames are incompatible. To address this incompatibility, conventional networks perform at least one transcoding operation, from 3G_GSM_EFR to 2G_GSM_EFR and vice versa. In practice, conventional systems, such as the one shown in FIG. 3, will perform not one but two transcoding operations, into and out of the preferred or native format used by the network backbone. In FIG. 3, for example, TRAU1 104 and TRAU2 106 may operate in tandem to transcode into and out of a common internal codec format such as the 12.2 kbit/s AMR format. Thus, although the 2G_GSM_EFR and 3G_GSM_EFR codecs are essentially identical, the incompatible silence insertion descriptor frames prevent tandem-free mobile-to-mobile communication, and the resulting transcoding introduces distortion into the call.
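The conventional compatibility rule (same speech algorithm, same silence algorithm, same bit rate), applied to the FIG. 3 scenario, can be sketched as follows. The CodecProfile type and the string labels are hypothetical and serve only to make the comparison concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecProfile:
    speech_algorithm: str   # speech compression algorithm
    silence_algorithm: str  # silence compression (SID) format
    bit_rate_bps: int       # speech bit rate


def tfo_compatible(a: CodecProfile, b: CodecProfile) -> bool:
    """Conventional test: all three attributes must match."""
    return (a.speech_algorithm == b.speech_algorithm
            and a.silence_algorithm == b.silence_algorithm
            and a.bit_rate_bps == b.bit_rate_bps)


# Illustrative labels only; same speech coding and rate, different SID format.
efr_3g = CodecProfile("EFR", "SID-3G", 12200)
efr_2g = CodecProfile("EFR", "SID-2G", 12200)

print(tfo_compatible(efr_2g, efr_2g))
print(tfo_compatible(efr_3g, efr_2g))
```

Under this rule the 2G and 3G EFR profiles fail the test solely because of the silence format, even though their speech algorithm and bit rate match.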
FIG. 4 illustrates the general frame format for a conventional 3G EFR frame. The RAB sub-flow combination indicator (RFCI) field indicates the type of frame. An RFCI value of “3” indicates that the frame is a SID frame. For a SID frame, the LSF and energy gain data will be contained in the frame payload part, which starts at octet 3.
FIG. 5 illustrates the general frame format for a conventional 2G EFR frame. A 2G EFR frame includes a payload field and four subframes. The LSF data is located in the payload field, in bits 1-38. The energy gain data is located in sub-frame 1, in bits 87-91. For a 2G EFR SID frame, sub-frames 2 through 4 do not contain any SID-specific information.
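Using the bit positions given above for the 2G EFR SID frame, the two SID parameters could be read out roughly as follows. This is a sketch under stated assumptions: the frame is modeled as a sequence of 244 bits, bit numbering is taken to be 1-based with bit 1 first, and each field is read most significant bit first. None of these conventions is confirmed by the text.

```python
from typing import Sequence, Tuple

FRAME_BITS = 244          # 12.2 kbit/s x 20 ms
LSF_RANGE = (1, 38)       # LSF data, payload field (1-based, inclusive)
GAIN_RANGE = (87, 91)     # energy gain, sub-frame 1 (1-based, inclusive)


def read_field(bits: Sequence[int], first: int, last: int) -> int:
    """Read bits first..last (1-based, inclusive) as an unsigned integer."""
    value = 0
    for b in bits[first - 1:last]:
        value = (value << 1) | (b & 1)   # assumed MSB-first ordering
    return value


def extract_sid_params(bits: Sequence[int]) -> Tuple[int, int]:
    """Return (lsf, energy_gain) from a 2G EFR SID frame."""
    if len(bits) != FRAME_BITS:
        raise ValueError("expected a 244-bit frame")
    return read_field(bits, *LSF_RANGE), read_field(bits, *GAIN_RANGE)


frame = [0] * FRAME_BITS
frame[37] = 1   # bit 38: last bit of the LSF field
frame[86] = 1   # bit 87: first bit of the energy gain field
print(extract_sid_params(frame))
```

The two fields span 38 + 5 = 43 bits in total, which is consistent with the 43-bit length given for the 3G SID frame.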
Unlike the 3G version of an EFR frame, the 2G EFR frame does not include a frame type field. Rather, a SID frame type is indicated by a particular bit pattern of sub-frame 1. The particular bit pattern is also called a frame signature. The frame signature must be analyzed in order to determine whether an incoming 2G EFR frame is a SID frame.
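The difference between the two detection methods can be sketched as follows. The RFCI value of 3 comes from the description of FIG. 4. The 2G frame signature is not specified in this text, so PLACEHOLDER_SIGNATURE below is a made-up pattern used purely for illustration and must not be taken as the real signature.

```python
from typing import Dict, Sequence

RFCI_SID = 3  # RFCI value that marks a 3G EFR SID frame (per FIG. 4)

# Hypothetical pattern: maps 0-based positions within sub-frame 1 to the
# bit value required there. The real signature is defined elsewhere.
PLACEHOLDER_SIGNATURE: Dict[int, int] = {0: 1, 1: 1, 2: 1}


def is_3g_sid(rfci: int) -> bool:
    """3G: the frame type is read directly from the RFCI field."""
    return rfci == RFCI_SID


def is_2g_sid(subframe1_bits: Sequence[int],
              signature: Dict[int, int]) -> bool:
    """2G: the frame type is inferred by matching a bit pattern."""
    return all(subframe1_bits[pos] == val for pos, val in signature.items())


print(is_3g_sid(3), is_3g_sid(0))
print(is_2g_sid([1, 1, 1, 0, 0], PLACEHOLDER_SIGNATURE))
print(is_2g_sid([1, 0, 1, 0, 0], PLACEHOLDER_SIGNATURE))
```

The 3G check is a single field comparison, while the 2G check must inspect the frame contents, which is why the signature has to be analyzed for every incoming 2G EFR frame.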
As can be seen from FIG. 5 and FIG. 4, the 2G GSM SID frame is 244 bits long, while the 3G UMTS SID frame is 43 bits long. This incompatibility is significant enough to prevent tandem-free operation between two otherwise compatible codecs. Referring back to FIG. 3, the incompatibility of the SID frames may force TRAU1 104 and TRAU2 106 to operate in tandem mode and use an intermediate format, such as 12.2 kbit/s AMR. Thus, in a mobile-to-mobile voice call between a 2G network and a 3G network, tandem-free operation may not be possible, and the benefits of TFO, such as increased voice quality, may be unavailable.
Thus, there is a need for a way to enable TFO operation between 2G and 3G networks that use codecs with similar speech compression algorithms and bit rates but which have dissimilar SID frame formats. In particular, there is a need for methods, systems, and computer program products for silence insertion descriptor (SID) conversion.