The Adaptive Multi-Rate (AMR) codec family is used as the mandatory codec in both GSM and WCDMA systems. AMR is available in both narrowband (AMR-NB) and wideband (AMR-WB) forms. The standards comprise a number of technical specifications covering multiple components and functions, e.g. the speech codec (AMR-NB, AMR-WB), Voice Activity Detector (VAD), Discontinuous Transmission system (DTX), Comfort Noise (CN), Link Adaptation (LA), etc. All these functions are defined and described in the 3GPP TS 26-series specifications. Further, a description of the AMR-NB codec is given in “The Adaptive Multi-Rate Speech Coder”, IEEE Speech Coding Workshop, Porvoo, Finland, pp. 117-119, 1999, authored by Ekudden, E., Hagen, R., Johansson, I., Svedberg, J. Further still, the VAD is described in “Voice activity detection for the GSM Adaptive Multi-Rate Codec”, IEEE Speech Coding Workshop, Porvoo, Finland, pp. 55-57, 1999, authored by Vähätalo, A. and Johansson, I.
The AMR-NB and AMR-WB speech codecs have a number of operating modes which make it possible to run the codec at different bit rates, corresponding, for example, to different levels of subjective speech quality. The AMR-NB codec can operate at 8 different bit rates ranging from 4.75 kbps up to 12.2 kbps, as listed in Table 1 below. Throughout this document “bps” stands for bits per second. During silence periods, as detected by the VAD, the system generates spectrally shaped Comfort Noise (CN). The CN is described with 35 bits; if these parameters were transmitted once per frame, the bit rate for the CN would be 1.75 kbps. In practice, the parameters are normally updated only once every 8th frame, so the actual bit rate for CN is one eighth of that value, i.e. 218.75 bps.
TABLE 1
Source codec bit-rates for the AMR-NB codec

Codec mode    Source codec bit-rate
AMR_12.20     12.20 kbps
AMR_10.20     10.20 kbps
AMR_7.95      7.95 kbps
AMR_7.40      7.40 kbps
AMR_6.70      6.70 kbps
AMR_5.90      5.90 kbps
AMR_5.15      5.15 kbps
AMR_4.75      4.75 kbps
AMR_SID       1.75 kbps (218.75 bps)
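The comfort noise figures quoted above follow from simple arithmetic. The sketch below reproduces them, assuming the 20 ms AMR frame length (50 frames per second), which the text implies but does not state; the function and constant names are hypothetical.

```python
# Sketch of the comfort-noise (SID) bit-rate arithmetic.
# Assumption: 20 ms frames, i.e. 50 frames per second.
FRAMES_PER_SECOND = 50
SID_BITS = 35            # bits describing the comfort noise (from the text)
SID_UPDATE_INTERVAL = 8  # one update every 8th frame (from the text)


def bit_rate_bps(bits_per_frame: int, frame_interval: int = 1) -> float:
    """Bit rate when `bits_per_frame` bits are sent once every `frame_interval` frames."""
    return bits_per_frame * FRAMES_PER_SECOND / frame_interval


continuous_bps = bit_rate_bps(SID_BITS)                   # 1750.0 bps = 1.75 kbps
actual_bps = bit_rate_bps(SID_BITS, SID_UPDATE_INTERVAL)  # 218.75 bps
print(continuous_bps, actual_bps)
```

The same helper applies to the speech modes in Table 1 once the number of bits per frame for a mode is known.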
Normally the system is also configured with a discontinuous transmission system (DTX) that includes a Voice Activity Detector (VAD) and a Comfort Noise generator (CN). These operate to detect active speech and transmit the speech signal during voice activity, and to detect speech inactivity in order to inhibit the speech signal transmission and activate comfort noise generation instead. The proportion of voice activity is called the Voice Activity Factor (VAF). The combination of the DTX, VAD and CN functions is referred to as a “DTX/VAD/CN system” from here on.
The total system capacity of a cellular communication system using a standard such as GSM or WCDMA is related to the voice activity factor (VAF). A cellular communication system generally has two transmission links, uplink (UL) and downlink (DL), from and to the mobile terminal respectively. The currently employed AMR system uses the same “DTX/VAD/CN system” in both UL and DL.
Speech transmission with DTX operation can be regarded as a simple source-controlled variable bit rate encoding method where the rate can be varied between two levels, one for active speech and the other for inactivity (and comfort noise transmission). However, the term source-controlled rate variable bit rate (SCR VBR) operation typically refers to a method where the bit rate during active speech can be varied according to the needs of the source signal, e.g. in order to maintain a constant quality level. SCR VBR coding hence pursues a similar objective to speech transmission with DTX but can additionally vary the bit rate even during active speech. Examples of speech and audio codecs with SCR VBR are the 3GPP2 VMR-WB codec, the 3GPP2 Enhanced Variable Rate Codec (EVRC) and MPEG Advanced Audio Coding (AAC).
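The two-level view of DTX operation can be illustrated with a first-order estimate of the average source bit rate as a function of the voice activity factor. This is a minimal sketch, not a formula taken from the specifications: it ignores hangover and any signaling at speech/silence transitions, and the VAF of 0.5 is an assumed example value.

```python
def average_bit_rate_bps(vaf: float, active_bps: float, inactive_bps: float) -> float:
    """Two-level average: `vaf` is the fraction of time coded as active speech."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie in [0, 1]")
    return vaf * active_bps + (1.0 - vaf) * inactive_bps


# Assumed example: AMR 12.2 kbps mode, comfort noise at 218.75 bps, VAF = 0.5.
print(average_bit_rate_bps(0.5, 12200.0, 218.75))  # 6209.375
```

Under these assumptions the average rate scales almost linearly with the VAF, which is why the VAF matters for system capacity.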
Variable frame offset (VFO) coding is described in US20070147314A1. This is a method that suspends the transmission of those speech segments that the speech decoder can properly extrapolate from the received speech. The basic idea is to operate a fixed-frame length codec in such a way that a coding frame is no longer restricted to start immediately after the end of the previous coding frame. The gain provided by this method is that the effective frame rate of the codec is reduced despite the codec frame length remaining constant. Since the coding bit rate is associated with each transmitted codec frame, the average bit rate is reduced. The system thus operates as a variable rate codec, even when a constant coding bit rate is used.
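The effect of VFO coding on the average bit rate can be sketched as follows. The example is an illustration only and is not the method of the cited application: the 20 ms frame length, the frame start times, and the rule that frames do not overlap are all assumptions.

```python
FRAME_MS = 20  # assumed fixed codec frame length in milliseconds


def vfo_average_bit_rate_bps(frame_starts_ms, bits_per_frame, total_ms):
    """Average bit rate when only frames starting at `frame_starts_ms` are sent.

    Gaps between consecutive frames are assumed to be extrapolated by the decoder.
    """
    for prev, cur in zip(frame_starts_ms, frame_starts_ms[1:]):
        if cur < prev + FRAME_MS:  # assumption: frames must not overlap
            raise ValueError("frames overlap")
    return len(frame_starts_ms) * bits_per_frame * 1000 / total_ms


# 200 ms of speech at 244 bits per frame (the 12.2 kbps mode).
contiguous = list(range(0, 200, FRAME_MS))        # 10 back-to-back frames
offset = [0, 20, 45, 65, 90, 115, 140, 165]       # 8 frames with skipped gaps
print(vfo_average_bit_rate_bps(contiguous, 244, 200))  # 12200.0
print(vfo_average_bit_rate_bps(offset, 244, 200))      # 9760.0
```

The coding bit rate per frame is unchanged in both cases; only the number of frames transmitted per unit time differs.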
Real-time packet switched transmission of speech and audio data for Voice over Internet Protocol (VoIP) applications generally makes use of the IETF Real-time Transport Protocol (RTP) (as described in RFC 3550). This protocol provides a time-stamp field indicating the sampling instant of the first sample encoded for the first frame-block in the packet. With VoIP services over wireless, it remains as important as in circuit switched transmission to reduce the bit rate over the wireless links. The bit rate can be reduced by using speech transmission with DTX, another SCR VBR operation, or VFO coding as described above. A further bit rate reduction method is to reduce the overhead of the transmitted packets. One such method is header compression such as Robust Header Compression (ROHC). ROHC is described in more detail in IETF RFC 3095, RFC 3843, and RFC 4019.
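The behavior of the time-stamp field under DTX can be sketched as below. The example assumes one 20 ms frame per packet and an 8 kHz RTP clock (narrowband speech), so each frame advances the time stamp by 160 ticks whether or not it is transmitted; the function name and the zero initial time stamp are hypothetical, since a real sender chooses a random initial value.

```python
CLOCK_RATE_HZ = 8000  # assumed RTP clock rate for speech sampled at 8 kHz
FRAME_MS = 20         # assumed frame length, one frame per packet
TICKS_PER_FRAME = CLOCK_RATE_HZ * FRAME_MS // 1000  # 160


def rtp_timestamps(sent_flags, initial_timestamp=0):
    """Return the RTP time stamp of each transmitted frame.

    `sent_flags[i]` is True if frame i is transmitted. The time stamp follows
    the sampling instant, so it keeps advancing across untransmitted frames.
    """
    stamps = []
    for index, sent in enumerate(sent_flags):
        if sent:
            # The time-stamp field is 32 bits wide and wraps around.
            stamps.append((initial_timestamp + index * TICKS_PER_FRAME) % 2**32)
    return stamps


# Two speech frames, three untransmitted frames, then speech again.
print(rtp_timestamps([True, True, False, False, False, True]))  # [0, 160, 800]
```

The jump from 160 to 800 tells the receiver how much time elapsed during the transmission pause.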
The general objective of the above described technologies is to temporally reduce the used transmission resource while maintaining the quality. The following paragraphs discuss some problems related to these techniques.
While SCR VBR coding is able to reduce the average source coding bit rate, it is not always desirable to use this feature in every communication system. In the LTE system, for instance, a change of source coding bit rate involves extra signaling, which in turn may cost additional transmission resources or transmission delay. See 3GPP tdoc S4-100438: On the suitability of a variable-rate coding for VoIP over LTE for more information. A further problem of SCR VBR coding is that it only reduces the net bit rate of the codec. Overhead related to packet switched transmission, such as packet headers, remains unchanged. Hence the relative bandwidth reduction obtained with SCR VBR coding may be small and not worth the associated costs and complications, especially considering the possible transmission system related drawbacks described above.
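The limited relative gain can be illustrated with a rough calculation. The sketch assumes one 20 ms frame per packet, uncompressed IPv4, UDP and RTP headers (20 + 8 + 12 = 40 bytes), and a payload rounded up to whole bytes; it ignores the codec payload header and lower-layer overhead, so the numbers are indicative only.

```python
FRAMES_PER_SECOND = 50       # assumed 20 ms frames, one frame per packet
HEADER_BYTES = 20 + 8 + 12   # IPv4 (no options) + UDP + RTP, uncompressed


def gross_bit_rate_bps(net_bps: int, header_bytes: int = HEADER_BYTES) -> int:
    """Packet-level bit rate for a given net codec bit rate in bps."""
    payload_bits = net_bps // FRAMES_PER_SECOND
    payload_bytes = (payload_bits + 7) // 8  # round up to whole bytes
    return (payload_bytes + header_bytes) * 8 * FRAMES_PER_SECOND


high, low = 12200, 4750  # highest and lowest AMR-NB modes from Table 1
net_saving = 1 - low / high
gross_saving = 1 - gross_bit_rate_bps(low) / gross_bit_rate_bps(high)
print(gross_bit_rate_bps(high), gross_bit_rate_bps(low))  # 28400 20800
print(f"net saving {net_saving:.0%}, gross saving {gross_saving:.0%}")
```

Under these assumptions, switching from the highest to the lowest mode cuts the net rate by about 61% but the packet-level rate by only about 27%.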
VFO coding is a solution addressing the problem of SCR VBR coding that the packet overhead does not scale with the adaptively selected bit rate. However, VFO coding suffers from other problems. For example, where the segment boundaries do not match the speech codec frame boundaries, efficiency losses may occur when VFO is used with robust header compression schemes like ROHC. These efficiency losses happen because the header compression algorithm may lose the ability to efficiently predict parts of the IP packet headers. In addition, the gain of VFO coding may be limited in cases where only a minor or no quality sacrifice is acceptable. Such a constraint reduces the likelihood of finding speech segments that can be properly extrapolated by the decoder from the earlier received speech, and thus reduces the likelihood of savings.
DTX is a very effective rate reduction method for periods of speech inactivity since it suspends transmission during such periods. Ideally, a DTX system would only transmit active speech, while the inactive signal (background noise) that is irrelevant for the receiving end would not be transmitted at all. In practice, there is no ideal VAD algorithm that is able to distinguish the active speech parts from the inactive parts in an input speech signal with total reliability. Hence, it is an important aim to design a DTX system such that as much transmission resource as possible is saved, while still avoiding possible coding artifacts such as clipping of active speech parts, which may seriously affect the speech quality. Clipping often occurs for trailing parts of the speech (back-end clipping) or in unvoiced parts of the speech with low energy. One solution to the back-end clipping problem is to add a so-called hangover period for the transition period between active speech and inactivity. The hangover period is always coded as active speech, irrespective of the active speech/inactivity indication or a quality indication by the VAD. While adding a hangover period is a safe approach for avoiding back-end clipping, it reduces the bandwidth efficiency gain that DTX can provide since, by design, a large portion of the hangover period is likely to be inactivity that does not require active speech coding to maintain signal quality.
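The hangover mechanism described above can be sketched as a simple counter applied to the per-frame VAD decisions. This is a simplified illustration rather than the hangover logic of any particular codec: the default hangover length of 7 frames is a hypothetical value, and the names are not taken from the specifications.

```python
HANGOVER_FRAMES = 7  # hypothetical length; the text does not give a value


def apply_hangover(vad_flags, hangover_frames=HANGOVER_FRAMES):
    """Return per-frame decisions: True means 'code as active speech'.

    After the VAD stops indicating speech, the next `hangover_frames` frames
    are still coded as active speech regardless of the VAD decision.
    """
    decisions = []
    remaining = 0
    for active in vad_flags:
        if active:
            remaining = hangover_frames  # re-arm the hangover counter
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1               # inactive frame inside the hangover
            decisions.append(True)
        else:
            decisions.append(False)      # comfort noise / no transmission
    return decisions


vad = [True, True, False, False, False, False, True, False]
print(apply_hangover(vad, hangover_frames=2))
# [True, True, True, True, False, False, True, True]
```

In this example four of the six frames coded as active are hangover frames or genuine speech, which shows how the hangover trades bandwidth for protection against back-end clipping.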
The above described technologies and techniques provide some scope for optimizing the use of bandwidth in a communication system. However, as is evident from the above explanation, these techniques are disadvantageous or at least sub-optimal in some way, and there remains a need for further techniques to improve the bandwidth efficiency of a wireless communication system. Further, any improvement must be realized while maintaining an appropriate level of quality of service.