The Enhanced Voice Services (EVS) codec is a multi-mode/multi-rate codec originally designed to offer enhanced voice services over LTE (Long Term Evolution), specifically in IMS (IP Multimedia Subsystem) with packet-based access.
The EVS codec provides a range of bit rates, from an average of 5.9 kbps (source-controlled variable bit rate) through 7.2 kbps (constant bit rate) up to 128 kbps (constant bit rate). This multitude of rates allows voice service operation to be tailored to system and service needs, which may vary. For instance, a system with strict capacity limitations would use a low rate, whereas a higher bit rate could be used if there is no such constraint. From a service quality perspective, higher bit rate operation is desirable because it can lead to better speech quality. The EVS codec also offers a multitude of audio-bandwidth operation modes, i.e. an input signal can be encoded with a number of different bandwidths. This is a major enhancement compared to Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB), each of which codes one specific audio bandwidth only: narrowband and wideband, respectively.
In practice, the required bit rate may vary during a speech session such as a voice call. The available network capacity may change, or, due to mobility, a user equipment (UE) may roam to a cell with different transmission conditions, capacity or access technology. Rate control or codec re-negotiation is then used to adapt the bit rate to the new needs. Codec re-negotiation is highly undesirable since it unavoidably interrupts the speech path.
The EVS codec comprises two basic types of operation modes, the EVS primary modes and the EVS AMR-WB interoperability (IO) modes (in short, ‘EVS IO’). The latter are fully bit-stream interoperable with the 3GPP (3rd Generation Partnership Project) AMR-WB codec. They were included in the EVS codec in order to provide transcoding-free interoperability of the EVS codec with UEs that support AMR-WB but not EVS, or with UEs that have roamed to cells that support AMR-WB but not EVS. In this way codec re-negotiation can be avoided and rate control can be used instead.
The EVS primary modes have a quality advantage but can only be used when the UEs and all involved communication system components support the EVS codec.
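The choice between the two mode types described above can be sketched as a simple capability check. This is only a minimal illustration under a simplified model in which every node in the speech path advertises a set of supported codec names; the function name `select_mode_type` and the string labels are hypothetical and not taken from any specification.

```python
def select_mode_type(codec_sets):
    """Return which EVS mode type is usable across the whole speech path.

    codec_sets: one set of codec names per involved UE or network node
    (hypothetical model; real negotiation is considerably more involved).
    """
    codec_sets = list(codec_sets)
    # EVS primary modes need EVS support in every involved component.
    if all("EVS" in codecs for codecs in codec_sets):
        return "EVS primary"
    # EVS IO modes are bit-stream interoperable with AMR-WB, so an
    # EVS-capable node can talk to AMR-WB-only nodes without transcoding.
    some_evs = any("EVS" in codecs for codecs in codec_sets)
    all_compatible = all(
        "EVS" in codecs or "AMR-WB" in codecs for codecs in codec_sets
    )
    if some_evs and all_compatible:
        return "EVS IO"
    return None  # outside the scope of this sketch
```

For example, a path of two EVS-capable UEs yields the primary modes, while a path in which one UE supports only AMR-WB falls back to the EVS IO modes.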
It is also important to note that the EVS codec can operate with different audio bandwidths, ranging from the traditional Narrowband (NB: 300 Hz to 3400 Hz) to Wideband (WB: 100 Hz to 7000 Hz), Super-Wideband (SWB: 50 Hz to 15000 Hz) and even Fullband (FB: 20 Hz to 20000 Hz). Many modes operating at different bandwidths share the same bit rate, see table 1.
Table 1 lists all the available EVS codec modes and their respective bit rates, net payload sizes (in bits per 20 ms frame) and bandwidths. EVS uses a speech frame size of 20 ms in all modes.
As can be seen from table 1, the bit rate ranges of the EVS primary modes and the EVS IO modes are intertwined, and the bit rates of either type of mode are not contiguous. It is also to be noted that the bit rates 2.0 and 2.4 kbps are related to Voice Activity Detection (VAD) and Discontinuous Transmission (DTX) in speech pauses, i.e. speech inactivity, and are used for the transmission of SID (Silence Insertion Descriptor) frames.
The EVS Primary mode 5.9 VBR (variable bit rate) uses frames encoded at an instantaneous rate of 2.8, 7.2 or 8 kbps, selected in a source-controlled fashion depending on the audio input signal. The long-term average bit rate of this mode for active speech is 5.9 kbps.
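The relation between the instantaneous per-frame rates and the average rate of a variable bit rate mode can be illustrated with a short calculation. The sketch below assumes a purely hypothetical mix of frames; the actual distribution depends on the input signal and is not stated in the text, so the resulting figure is not meant to reproduce the 5.9 kbps long-term average.

```python
FRAME_MS = 20  # EVS speech frame duration in milliseconds


def average_bit_rate_kbps(frame_rates_kbps):
    """Average bit rate over a sequence of 20 ms frames, one rate per frame."""
    if not frame_rates_kbps:
        raise ValueError("at least one frame is required")
    # kbps * ms = bits, so each frame contributes rate * 20 bits.
    total_bits = sum(rate * FRAME_MS for rate in frame_rates_kbps)
    total_ms = len(frame_rates_kbps) * FRAME_MS
    return total_bits / total_ms


# Hypothetical mix of active-speech frames (illustration only).
example_frames = [2.8] * 30 + [7.2] * 50 + [8.0] * 20
example_average = average_bit_rate_kbps(example_frames)
```

With this assumed mix the average comes out at about 6.04 kbps; a source with a larger share of 2.8 kbps frames would pull the average lower.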
TABLE 1. EVS codec modes and their respective bit rates and net bit payload sizes per 20 ms frame

| Mode of Operation | Payload Size (bits) | Bitrate (kbps) |
|---|---|---|
| EVS-IO | 40 | ~2.00 (EVS-IO SID), every 160 ms |
| EVS Primary | 48 | ~2.40 (EVS Primary SID), typically every 160 ms |
| EVS Primary | 56 | 2.80, constituent rate of 5.9 VBR mode (NB, WB, SWB) |
| EVS-IO | 136 | 6.60 (WB) |
| EVS Primary | 144 | 7.20 (NB, WB, SWB), also used as constituent rate of 5.9 VBR mode |
| EVS Primary | 160 | 8.00 (NB, WB, SWB), also used as constituent rate of 5.9 VBR mode |
| EVS-IO | 184 | 8.85 (WB) |
| EVS Primary | 192 | 9.60 (NB, WB, SWB) |
| EVS-IO | 256 | 12.65 (WB) |
| EVS Primary | 264 | 13.20 (NB, WB, SWB) |
| EVS-IO | 288 | 14.25 (WB) |
| EVS-IO | 320 | 15.85 (WB) |
| EVS Primary | 328 | 16.40 (NB, WB, SWB, FB) |
| EVS-IO | 368 | 18.25 (WB) |
| EVS-IO | 400 | 19.85 (WB) |
| EVS-IO | 464 | 23.05 (WB) |
| EVS-IO | 480 | 23.85 (WB) |
| EVS Primary | 488 | 24.40 (NB, WB, SWB, FB) |
| EVS Primary | 640 | 32.00 (WB, SWB, FB) |
| EVS Primary | 960 | 48.00 (WB, SWB, FB) |
| EVS Primary | 1280 | 64.00 (WB, SWB, FB) |
| EVS Primary | 1920 | 96.00 (SWB, FB) |
| EVS Primary | 2560 | 128.00 (SWB, FB) |
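The payload sizes in Table 1 follow from the bit rate and the 20 ms frame length. The sketch below checks this for the non-SID rows. For the EVS primary modes the payload is simply rate times frame duration. For the EVS IO modes, the listed sizes match the bit count rounded up to a whole number of octets; this rounding is an observation that fits the table values, not something the text states. The function names are hypothetical.

```python
FRAME_MS = 20  # speech frame duration in milliseconds


def primary_payload_bits(rate_bps):
    """Net payload of an EVS primary frame: bit rate times 20 ms."""
    return rate_bps * FRAME_MS // 1000


def io_payload_bits(rate_bps):
    """Payload of an EVS IO frame, assuming rounding up to whole octets."""
    bits = rate_bps * FRAME_MS // 1000
    return -(-bits // 8) * 8  # ceiling division to the next multiple of 8


# Bit rate in bps -> payload size in bits, copied from Table 1 (SID rows omitted).
PRIMARY_MODES = {
    2800: 56, 7200: 144, 8000: 160, 9600: 192, 13200: 264, 16400: 328,
    24400: 488, 32000: 640, 48000: 960, 64000: 1280, 96000: 1920, 128000: 2560,
}
IO_MODES = {
    6600: 136, 8850: 184, 12650: 256, 14250: 288, 15850: 320,
    18250: 368, 19850: 400, 23050: 464, 23850: 480,
}

primary_ok = all(primary_payload_bits(r) == b for r, b in PRIMARY_MODES.items())
io_ok = all(io_payload_bits(r) == b for r, b in IO_MODES.items())
```

For instance, 13.2 kbps over 20 ms gives 264 bits, while 6.60 kbps gives 132 bits, which rounds up to the 136 bits (17 octets) listed for that EVS IO mode.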
While the EVS codec was originally standardized for voice service over PS (packet switched) based channels, as in LTE (Voice over LTE, VoLTE), efforts are currently ongoing to standardize it also for 3G CS (circuit switched) systems, such as 3GPP UTRAN (Universal Terrestrial Radio Access Network). FIG. 1 shows a schematic overview of a UTRAN 100 and an associated core network (CN) 101.
As can be seen from the working principles of rate control for the AMR and AMR-WB codecs, rate control signaling in 3G CS systems differs with the access technology (GERAN and UTRAN) and is also different from the rate control signaling in PS systems.
In PS based systems, speech frames are transported in Real-time Transport Protocol (RTP) packets. Codec Mode Request (CMR) in RTP is one means to signal rate control commands in the User Plane (UP). Alternatively, RTCP-APP may be used to carry CMR, also in the User Plane. RTCP-APP is based on the RTP Control Protocol (RTCP), the sister protocol of RTP, which is used for transmitting real-time data such as speech in PS based systems.
In GERAN (GSM-EDGE Radio Access Network), CMR is transported in every speech frame, on the User Plane, from the Mobile Station to the network and vice versa. On AoIP, the modern variant of the A-interface between GERAN and CS-Core, CMRs are (ideally) transported in every RTP packet, in uplink and in downlink.
In 3G access, i.e. UTRAN, CMR is not transported together with the speech payload. Instead the radio network controller (RNC) 107 has specific means to control the uplink rate of the UE 103: via RRC (Radio Resource Control) signaling on the control plane, the RNC may prevent the UE 103 from sending certain Transport Format Combinations (TFCs), thus allowing or forbidding certain modes of the configured mode set.
During call setup, at RAB (radio access bearer) Assignment, the RNC 107 initializes RAB subflow combinations for all codec modes to be used on the Iu interface, and in addition a limitation of the maximum rate for the downlink (DL) can be given. Similarly, over the radio interface a radio bearer configuration is set up, which may include TFCs corresponding to all the given codec rates or only to a selected subset of the requested rates. In addition, the UE 103 can be ordered not to use certain defined TFCs in the uplink (UL), at the initial setup or later during the call, temporarily or for the whole call.
Similarly, during the call, the RNC 107 may send Rate Control Request (RC-Req) Messages to the CS-Core (using Iu UP protocol messages), disabling certain modes in downlink.
For instance, if the RNC 107 has received a RAB Assignment Request from the CS-Core to set up a radio bearer for AMR with mode set 4.75, 5.9, 7.4 and 12.2, then the RNC 107 may generate a corresponding initialization message to the UE 103 and the media gateway (MGW), initializing all RAB subflow combinations (RFCs) and TFCs for all these rates.
The RNC 107 may select an optimal radio configuration considering possible radio or transport limitations, and forbid higher codec rates in order to admit a speech user into the system or, more generally, to provide service for a larger number of speech users. As an example, the RNC 107 may choose not to allow the highest rate 12.2 for operation. In that case, the Iu UP message makes sure that a lower rate, e.g. 5.9 rather than 12.2, is selected as the initial rate.
At RAB Assignment the RNC 107 must accept all modes (in IuUP Version 2) as commanded by the CS-Core. Later, during the call (in fact even immediately after RAB assignment) the RNC 107 may disable certain modes with high bit rates, e.g. due to cell capacity limitations.
For the radio interface, the RNC 107 defines a radio bearer configuration that includes TFCs either for all requested codec rates including SID or, in order to optimize resource consumption, only for a proper subset comprising the lower requested codec rates including SID. In addition to defining the TFCs over the radio interface, the RNC 107 may disallow the UE 103 from using certain TFCs in the UL, which steers the maximum allowed codec rate. As an example, the UE 103 may be initialized with a radio bearer configuration with TFCs for AMR modes 4.75, 5.9, 7.4 and 12.2; AMR-SID is additionally needed for DTX operation. If the highest rate 12.2 shall not be used, an information element (IE) is included indicating that rates higher than the allowed maximum rate 7.4 are disallowed. Rate 12.2 will then not be used, at least until signaled otherwise. If the RNC 107 later detects a condition under which it chooses to change the maximum rate of 7.4, either raising it to 12.2 or reducing it further to e.g. 5.9, it may send a TFCC (Transport Format Combination Control) message to the UE 103, modifying the allowed maximum rate in the uplink.
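The AMR example above can be sketched as a small state model: a configured mode set plus a maximum rate that the RNC may change during the call. The class and method names are hypothetical, and the model ignores SID and the actual encoding of the RRC and TFCC messages.

```python
class UplinkRateControl:
    """Hypothetical model of the RNC's maximum-rate restriction for one UE."""

    def __init__(self, configured_rates, max_rate):
        # Rates (kbps) for which TFCs exist in the radio bearer configuration.
        self.configured_rates = sorted(configured_rates)
        self.set_max_rate(max_rate)

    def set_max_rate(self, max_rate):
        """Model a TFCC-style update of the allowed maximum uplink rate."""
        if max_rate not in self.configured_rates:
            raise ValueError("maximum rate must be one of the configured rates")
        self.max_rate = max_rate

    def allowed_rates(self):
        """Rates the UE may use: every configured rate up to the maximum."""
        return [r for r in self.configured_rates if r <= self.max_rate]


# Initial setup from the example: all four AMR modes configured, 12.2 disallowed.
uplink = UplinkRateControl([4.75, 5.9, 7.4, 12.2], max_rate=7.4)
initially_allowed = uplink.allowed_rates()  # 4.75, 5.9 and 7.4
```

Calling `uplink.set_max_rate(12.2)` would re-enable the highest mode, whereas `uplink.set_max_rate(5.9)` would restrict the UE to the two lowest modes.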
An important aspect is that the RNC 107 controls only the maximum rate to be used by the UE 103 on the UL. There is no possibility for the RNC 107 to control which actual rate, below or up to the maximum rate, the UE 103 uses. This is instead an autonomous decision by the UE 103 that depends on the required transmit power of each TFC versus the maximum UE transmit power. If the required transmit power for the TFC of a codec rate (up to the maximum rate) exceeds the maximum UE transmit power, then the UE 103 may autonomously select a lower codec rate associated with a TFC that does not exceed the maximum UE transmit power, as described in TS 25.321 and TS 25.133.
Hence, in short, codec rate control in CS UTRAN, as for AMR and AMR-WB in general on all links, is “maximum rate control”. The UE 103 may use the maximum rate or a lower rate out of the set of configured rates. The same holds for the downlink.
In certain conditions, such as at handovers, other network nodes, in particular the MGW, may also perform rate control actions. For instance, a handover may cause the mobile switching center (MSC) to change the active codec set, and the associated MGW will then send rate control commands in both directions.
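The UE's autonomous choice can be illustrated as picking the highest rate that satisfies both the network's maximum rate and the UE's power limit. This is a simplified sketch only: the function name and the power figures are hypothetical, and the actual TFC selection procedure in TS 25.321 and TS 25.133 is more elaborate than a single comparison.

```python
def select_uplink_rate(required_power_dbm, max_rate, max_ue_power_dbm):
    """Pick the highest usable codec rate, or None if no rate is usable.

    required_power_dbm: codec rate (kbps) -> transmit power its TFC needs
    max_rate:           maximum rate allowed by the RNC
    max_ue_power_dbm:   maximum UE transmit power
    """
    candidates = [
        rate
        for rate, power in required_power_dbm.items()
        if rate <= max_rate and power <= max_ue_power_dbm
    ]
    return max(candidates) if candidates else None


# Hypothetical power requirements under some radio condition (illustration only).
power_needed = {4.75: 18.0, 5.9: 20.0, 7.4: 22.0, 12.2: 25.0}

# The RNC allows up to 7.4, but the UE can only reach the power needed for 5.9.
chosen = select_uplink_rate(power_needed, max_rate=7.4, max_ue_power_dbm=21.0)
```

Here the network limit excludes 12.2 and the power limit excludes 7.4, so the UE settles on 5.9 without any further signaling from the RNC.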