VoIP is the transport of voice traffic using the Internet Protocol (IP). In the mobile world, VoIP means using a packet-switched (PS) service to transport IP packets which contain, e.g., Adaptive Multi-Rate (AMR) codec speech frames for voice mobile phone calls. A packet-switched connection is often simply referred to as a data connection.
Circuit-switched networks use circuit switching for carrying voice traffic where the network resources are statically allocated from the sender to receiver before the start of the message transfer, thus creating a “circuit.” The resources remain dedicated to the circuit during the entire message transfer and the entire message follows the same path. While this arrangement works quite well to transfer voice, IP is an attractive choice for voice transport for many reasons including lower equipment costs, integration of voice and data applications including multi-media like email, instant messaging, video, the world wide web, etc., lower bandwidth requirements, and the widespread availability of IP.
In packet-switched networks, the message is broken into packets, each of which can take a different route to the destination, where the packets are reassembled into the original message. The packet-switched (PS) service utilized for VoIP can be, for example, GPRS (General Packet Radio Service), EDGE (Enhanced Data Rates for Global Evolution), or WCDMA (Wideband Code Division Multiple Access). Each of these example services happens to be built upon the Global System for Mobile communications (GSM), a second generation (“2G”) digital radio access technology originally developed for Europe. GSM was enhanced in 2.5G to include technologies such as GPRS. The third generation (3G) comprises mobile telephone technologies covered by the International Telecommunication Union (ITU) IMT-2000 family. The Third Generation Partnership Project (3GPP) is a group of international standards bodies, operators, and vendors working toward standardizing WCDMA-based members of the IMT-2000 family.
EDGE (sometimes referred to as Enhanced GPRS (EGPRS)) is a 3G technology that delivers broadband-like data speeds to mobile devices. EDGE allows consumers to connect to the Internet and send and receive data, including digital images, web pages and photographs, three times faster than possible with an ordinary GSM/GPRS network. EDGE enables GSM operators to offer higher-speed mobile-data access, serve more mobile-data customers, and free up GSM network capacity to accommodate additional voice traffic. EDGE uses the same TDMA (Time Division Multiple Access) frame structure, logical channels, and 200 kHz carrier bandwidth as GSM networks, which allows existing cell plans to remain intact.
In EDGE technology, a base transceiver station (BTS) communicates with a mobile station (e.g., a cell phone, mobile terminal or the like, including computers such as laptops with mobile termination). The base transceiver station (BTS) typically has plural transceivers (TRX). A time division multiple access (TDMA) radio communication system like GSM, GPRS, and EDGE divides the time space into time slots on a particular radio frequency. Time slots are grouped into frames, with users being assigned one or more time slots. In packet-switched TDMA, even though one user might be assigned one or more time slots, other users may use the same time slot(s). So a time slot scheduler is needed to ensure that the time slots are allocated properly and efficiently.
EDGE offers nine different Modulation and Coding Schemes (MCSs): MCS1 through MCS9. Lower coding schemes (e.g., MCS1-MCS2) deliver a more reliable but slower bit rate and are suitable for less optimal radio conditions. Higher coding schemes (e.g., MCS8-MCS9) deliver a much higher bit rate, but require better radio conditions. Link Quality Control (LQC) selects which MCS to use in each particular situation based on the current radio conditions.
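The selection rule that LQC applies can be sketched as a simple threshold lookup. The following Python fragment is a minimal illustration and not the EDGE algorithm itself: the per-time-slot rates are the nominal EDGE figures, while the link quality metric, the threshold values in `MIN_QUALITY_DB`, and the function name `select_mcs` are hypothetical, chosen only to show the trade-off between robustness and bit rate.

```python
# Nominal EDGE data rates per time slot, in kbit/s, for MCS1..MCS9.
MCS_RATE_KBPS = {1: 8.8, 2: 11.2, 3: 14.8, 4: 17.6, 5: 22.4,
                 6: 29.6, 7: 44.8, 8: 54.4, 9: 59.2}

# Hypothetical minimum link quality (in dB) assumed necessary for each MCS.
# These thresholds are illustrative only and do not come from any specification.
MIN_QUALITY_DB = {1: 0.0, 2: 3.0, 3: 6.0, 4: 9.0, 5: 12.0,
                  6: 15.0, 7: 20.0, 8: 24.0, 9: 27.0}


def select_mcs(quality_db):
    """Return the highest MCS whose assumed threshold is met; fall back to MCS1."""
    usable = [mcs for mcs, threshold in MIN_QUALITY_DB.items()
              if quality_db >= threshold]
    return max(usable) if usable else 1


for quality in (2.0, 13.0, 30.0):
    mcs = select_mcs(quality)
    print(f"quality {quality} dB -> MCS{mcs} at {MCS_RATE_KBPS[mcs]} kbit/s")
```

With these assumed thresholds, poor conditions map to the slow but robust MCS1, and only very good conditions reach MCS9.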
In EDGE, the LQC selects an MCS for radio link control (RLC) data blocks for each temporary block flow (TBF). A TBF is a logical connection between a mobile station (MS) and a packet control unit (PCU). The PCU is usually (but not necessarily) located in the radio access network, e.g., in the base station controller (BSC). A TBF is used for either uplink or downlink transfer of GPRS packet data. The actual packet transfer is made on physical data radio channels (PDCHs). The bit rate for a TBF is thus effectively selected by selecting an MCS, and changing the MCS for a TBF changes its bit rate.
Adaptive Multi-Rate (AMR) speech frames contain speech, typically 20 milliseconds of speech, encoded by an AMR codec. The terms voice encoder, vocoder, and codec are used interchangeably and refer to encoding speech/voice into a compressed digital format. An AMR codec supports unequal bit-error detection and protection (UED/UEP). The UEP/UED mechanisms allow more efficient transmission of speech over a lossy network by sorting the bits into perceptually more and less sensitive classes. A frame is only declared damaged and not delivered if one or more bit errors are found in the most sensitive bits. On the other hand, based on human aural perception, speech quality is still deemed acceptable if the speech frame is delivered with one or more bit errors in the less sensitive bits. An important characteristic for a high bit error rate (BER) environment like EDGE is the robustness to packet loss that an AMR codec provides through redundancy and bit-error sensitivity sorting.
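The delivery rule described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the bits of a frame are taken to be ordered so that the most sensitive ones come first, the function name `deliver_frame` is hypothetical, and the split of 81 most-sensitive bits out of 244 is used here only as an example value for one codec mode.

```python
def deliver_frame(error_positions, num_sensitive_bits):
    """Decide whether a received speech frame is delivered to the decoder.

    error_positions: indices of bits known to be in error (0-based).
    num_sensitive_bits: how many leading bits belong to the most sensitive
        class (assumes the frame is sorted most-sensitive first).
    The frame is rejected only if an error hits the most sensitive bits.
    """
    return all(position >= num_sensitive_bits for position in error_positions)


SENSITIVE_BITS = 81  # illustrative size of the most sensitive class

print(deliver_frame([], SENSITIVE_BITS))          # no errors: delivered
print(deliver_frame([100, 200], SENSITIVE_BITS))  # errors in less sensitive bits: delivered
print(deliver_frame([5, 200], SENSITIVE_BITS))    # error in sensitive bits: declared damaged
```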
Another benefit of AMR is rate adaptation, which allows switching smoothly between codec modes on-the-fly. A large number of AMR codec modes may be used with varying bit rates and resulting voice quality. An AMR codec may include multiple narrowband codec modes: 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15 and 4.75 kbit/s. Even a wideband (WB) mode, AMR WB at 12.65 kbit/s, is available.
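To make the codec mode bit rates concrete, the short Python sketch below converts a mode's bit rate into the number of speech bits produced per frame, assuming the typical 20 millisecond frame mentioned earlier. The function name `frame_bits` is hypothetical, and packet header overhead is deliberately ignored.

```python
def frame_bits(rate_kbps, frame_ms=20):
    """Speech bits generated per frame for a codec mode at rate_kbps.

    kbit/s multiplied by milliseconds gives bits; the result is rounded
    to absorb floating-point noise.
    """
    return round(rate_kbps * frame_ms)


print(frame_bits(12.2))   # highest narrowband mode listed above
print(frame_bits(4.75))   # lowest narrowband mode listed above
print(frame_bits(12.65))  # wideband mode listed above
```

The highest narrowband mode therefore produces roughly two and a half times as many bits per frame as the lowest one, which is the range a rate adaptation scheme has to work with.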
Typically, for a VoIP connection, the end points of the VoIP communication, e.g., a calling mobile station A and a called mobile station B, negotiate which AMR codec mode will be used for the VoIP connection. If mobile A indicates it can use AMR codec modes 1, 2, and 3 with AMR codec mode 2 as the default, and if B indicates it can use AMR codec modes 2, 3, and 4, also with AMR codec mode 2 as the default, then AMR codec mode 2 will likely be selected. The initial selection of the AMR codec mode is thus typically made at the application protocol layer based on a desired bit rate for the communication. As a result, the codec mode selection for VoIP calls is made at the application layer without any knowledge of current radio channel conditions or the selected MCS. The determination of current radio channel conditions and the selection of the MCS for the transmission of a next radio block of data are both performed at lower radio access protocol layers, i.e., at the RLC/MAC layers.
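The negotiation in the example above can be sketched as a set intersection followed by a preference rule. The Python below is only an illustration: the function name `negotiate_mode` and the tie-break policy (prefer a default that both sides support, otherwise the lowest common mode) are assumptions, not a description of any particular signaling protocol. Note that nothing in the function looks at radio conditions, which is the limitation the paragraph points out.

```python
def negotiate_mode(a_modes, a_default, b_modes, b_default):
    """Pick a codec mode both end points support, or None if there is none.

    Assumed policy: prefer a default mode that both sides support
    (A's default is checked first); otherwise take the lowest common mode.
    """
    common = set(a_modes) & set(b_modes)
    if not common:
        return None
    for candidate in (a_default, b_default):
        if candidate in common:
            return candidate
    return min(common)


# Mobile A supports modes 1-3, mobile B supports modes 2-4, both default to 2.
print(negotiate_mode([1, 2, 3], 2, [2, 3, 4], 2))  # 2
```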
Because EDGE varies the bit rate for a TBF by selecting an MCS depending on the radio conditions at each specific radio block period, the bit rate changes very quickly. As a result, a static selection of a VoIP AMR encoder or codec mode often leads to less than optimum performance, e.g., a lower voice quality than necessary. For example, if a maximum bit rate, high voice quality encoder or codec mode is selected, it might sometimes generate data at a bit rate higher than the current over-the-air transfer rate permits, leading to VoIP packets arriving at the receiving end after their playout time has passed. Another problem with static selection of a VoIP AMR encoder or codec mode is that if the selected mode is a low bit rate, low voice quality mode when the current radio conditions are quite good, much less data is sent in the radio block than could have been sent. In other words, the party at the receiving end could have received much better voice quality at no extra bandwidth expense, but did not because of poor resource utilization.
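The late-packet problem can be illustrated with a toy queue model in Python. This is a rough sketch under simplifying assumptions: one speech frame is generated per radio block period, a single time slot is used, header overhead is ignored, and the function name `backlog_per_period` is hypothetical. The example figures assume 244 speech bits per frame (a 12.2 kbit/s mode over 20 ms) and block capacities of 176 bits (8.8 kbit/s over 20 ms) and 1088 bits.

```python
def backlog_per_period(codec_bits_per_frame, capacities_bits):
    """Track untransmitted speech bits after each radio block period.

    codec_bits_per_frame: bits the statically chosen codec mode adds per period.
    capacities_bits: bits the radio block can carry in each successive period.
    A growing backlog means frames wait longer and may miss their playout time.
    """
    backlog = 0
    history = []
    for capacity in capacities_bits:
        backlog += codec_bits_per_frame    # new speech frame arrives
        backlog -= min(backlog, capacity)  # radio block drains what it can
        history.append(backlog)
    return history


# Poor radio conditions for five periods: the queue grows every period.
print(backlog_per_period(244, [176] * 5))             # [68, 136, 204, 272, 340]
# Conditions recover in the last period and the queue drains.
print(backlog_per_period(244, [1088, 176, 176, 1088]))  # [0, 68, 136, 0]
```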
A related problem is inefficient hardware and bandwidth utilization. In order to reach the higher bit rates offered with EDGE, each radio block for the particular MCS should be packed as full as possible. For example, an MCS-8 radio block can hold 1088 bits. If the encoder has only 500 bits to send, then less than 50% of the possible EDGE throughput is utilized, which translates into lower bit rates.
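The utilization figure in this example follows from a one-line calculation, shown here as a small Python sketch. The function name `block_utilization` is hypothetical; the 1088-bit capacity and the 500-bit payload are the values from the example above.

```python
def block_utilization(payload_bits, block_capacity_bits):
    """Fraction of a radio block's capacity actually filled with payload."""
    return min(payload_bits, block_capacity_bits) / block_capacity_bits


utilization = block_utilization(500, 1088)
print(f"{utilization:.1%}")  # 46.0%
```

Roughly 46% of the block carries data, so more than half of the capacity that the radio conditions would have allowed goes unused.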
One approach to these problems might be to change the mode of the voice encoder or codec mode depending on a measured overall data throughput over the radio interface. But this approach is not well suited for “bearers”, like EDGE TBFs, that change every radio block with changed radio conditions. In other words, even if a user negotiates a particular bit rate when the TBF is established, the actual bit rate over that TBF varies depending on the quickly changing current radio conditions. Thus, by the time that the measured overall throughput is received at the network entity that can change the mode of the voice encoder, quickly changing radio conditions will have outdated that throughput value.