Today, multi-mode coding systems employing at least two different source and channel codec modes can be used to maintain near-to-optimum communication quality under varying transmission channel conditions. A mode with low source coding bit rate and a high degree of channel error protection can be chosen for bad channels. On the other hand, good channels allow selection of a codec mode with high source coding bit rate and a relatively low degree of error protection.
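The trade-off described above can be sketched in a few lines. This is only an illustration: the mode names, bit rates, and channel-quality thresholds are assumptions chosen for the example, and `select_mode` is a hypothetical helper, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecMode:
    name: str
    source_kbps: float  # source coding bit rate (illustrative value)
    min_quality_db: float  # lowest channel quality at which the mode is used (assumed)


# Hypothetical mode table: a lower source rate leaves more room for error protection.
MODES = [
    CodecMode("robust", 4.75, float("-inf")),
    CodecMode("medium", 7.4, 7.0),
    CodecMode("high", 12.2, 13.0),
]


def select_mode(quality_db, modes=MODES):
    """Pick the highest-rate mode whose quality threshold the channel meets."""
    eligible = [m for m in modes if quality_db >= m.min_quality_db]
    return max(eligible, key=lambda m: m.source_kbps)


if __name__ == "__main__":
    for q in (3.0, 10.0, 20.0):
        print(q, select_mode(q).name)
```

A bad channel (3 dB in this sketch) maps to the low-rate, heavily protected mode, while a good channel (20 dB) maps to the high-rate mode.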
As is well known in the art, such multi-mode coding systems must convey (either explicitly or implicitly) the actually chosen codec mode to a receiving decoder to enable proper decoding of received data. Two-way communication systems with codec mode adaptation must additionally transmit similar information over the return link. This is either quantized link measurement data describing the present forward channel state, or a corresponding codec mode request/command taking the channel state into account. Such link adaptation data is known in the art as codec mode information, consisting of codec mode indications (the actually selected codec mode) and codec mode requests/commands (the codec mode to be used on the transmitting side). The evolving Global System for Mobile Communication (GSM) Adaptive Multi-Rate (AMR) standard employs the above described codec mode adaptation.
In such AMR systems, in-band signaling is used to reallocate parts of the speech transmission resource for transmitting control information. It is applied where no other suitable control channels are available. The GSM AMR speech coding standard is an example which makes use of in-band signaling. It uses parts of the GSM speech traffic channel for the transmission of AMR link adaptation data. More specifically, the GSM AMR standard provides an in-band channel for the transmission of codec mode information.
Codec mode information consists of codec mode requests/commands and codec mode indications, which are transmitted every second frame (every 40 ms), in alternating order. Codec mode information identifies a codec mode in a subset of up to 4 codec modes out of 8 (for adaptive full-rate speech, or AFS) or 6 (for adaptive half-rate speech, AHS) available modes. These codec mode subsets are referred to as active codec sets.
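The alternating in-band schedule can be sketched as follows. This is a simplified model: which kind of codec mode information is sent first, the mode labels, and the function names are assumptions made for the example. Only the 40 ms spacing, the alternation, and the limit of four modes per active codec set come from the text.

```python
FRAME_MS = 20  # one speech frame; every second frame gives the 40 ms spacing


def inband_schedule(num_frames, first="indication"):
    """List (frame_index, time_ms, kind) for frames carrying codec mode information.

    Which kind comes first is an assumption of this sketch.
    """
    order = ["indication", "request"] if first == "indication" else ["request", "indication"]
    schedule = []
    for frame in range(0, num_frames, 2):  # every second frame
        kind = order[(frame // 2) % 2]  # alternate between the two kinds
        schedule.append((frame, frame * FRAME_MS, kind))
    return schedule


def mode_index(mode, active_set):
    """Identify a mode by its position in an active codec set of at most 4 modes."""
    if not 1 <= len(active_set) <= 4:
        raise ValueError("an active codec set holds between 1 and 4 modes")
    return active_set.index(mode)  # 0..3, so two bits would be enough


if __name__ == "__main__":
    print(inband_schedule(8))
    print(mode_index("7.40", ["4.75", "7.40", "12.2"]))
```

Because a mode is identified only by its position within the active codec set, both ends must agree on that set before the index can be interpreted.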
In any communication system, including the above described GSM AMR system, transmission capacity is a limited and costly resource. For this reason, in order to save transmission capacity, Discontinuous Transmission (DTX) is widely applied when transmitting speech. Sometimes DTX is referred to as Voice Operated Transmission (VOX). The basic principle of DTX is to turn off transmission during speech inactivity. Instead, so-called comfort noise (CN) parameters are transmitted which enable the decoder to reproduce the inactivity signal, which usually is some kind of background noise. CN parameters require much less transmission resource than speech. DTX is also an important feature for mobile telephones as it allows turning off power consuming devices (such as radio transmitters) during inactivity. Doing so helps to save battery power and to increase the talk time of the phones.
In two-way communication systems employing DTX, there will typically be one link active while the other link is inactive (as one speaker is talking while the other is listening). The inactive link must convey, at some reduced frame transmission rate, silence descriptor (SID) frames (also known as background information, or comfort noise, descriptor frames) to the receiver. SID frames contain CN parameters and enable a receiver to generate a comfort noise silence signal, for example to reassure a listening user that the connection is still active.
In the present GSM speech coding standards FR, HR and EFR, DTX is realised in a very similar way. By way of example, the state of the art of DTX operated speech communication in the GSM system will be described with respect to the GSM EFR codec. For additional information, see for example the GSM 06.11, GSM 06.12, GSM 06.21, GSM 06.22, GSM 06.31, GSM 06.41, GSM 06.61, GSM 06.62, and GSM 06.81 standards, and related documents. The GSM EFR scheme is characterised as follows:
End of speech activity is signaled by the transmission of a first SID frame, which is not phase-aligned to the SACCH. Rather, it is immediately following the last active speech frame. After such a first SID frame, update SID frames are transmitted with a period of once per 24 frames (=480 ms). Update SID frame transmission is aligned with the time alignment flag (TAF), which is generated in the radio subsystems and which is derived from the SACCH frame structure. Apart from SID frames, no other frames are transmitted during inactivity. Simply resuming the transmission of active speech frames ends the inactivity period.
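The SID timing described above can be sketched as a small scheduling function. This is a simplified model, not the standardized procedure: the TAF is assumed to be set on every frame whose index is congruent to `taf_phase` modulo 24, and the names `sid_frames`, `taf_phase`, and `resume_frame` are hypothetical.

```python
def sid_frames(last_speech_frame, resume_frame, taf_phase=0, period=24):
    """Return the frame indices at which SID frames are sent during inactivity.

    last_speech_frame: index of the last active speech frame
    resume_frame:      index of the first speech frame after inactivity
    taf_phase:         assumed position of the TAF within each 24-frame period
    """
    first_sid = last_speech_frame + 1  # first SID directly follows the last speech frame
    if first_sid >= resume_frame:
        return []  # speech resumed at once, no inactivity period
    frames = [first_sid]
    for f in range(first_sid + 1, resume_frame):
        if f % period == taf_phase:  # update SID frames are aligned with the TAF
            frames.append(f)
    return frames


if __name__ == "__main__":
    # Speech ends at frame 9 and resumes at frame 60.
    print(sid_frames(9, 60))  # [10, 24, 48]
```

With 20 ms frames, the 24-frame period between update SID frames corresponds to the 480 ms stated above.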
The radio subsystem (RSS) handles SID frames as regular speech frames. This means in particular that the same channel coding and diagonal interleaving is used as for speech frames. Effectively, forty-three (43) net bits are used for the comfort noise parameters, which describe the spectral shape and gain of the inactivity signal. Ninety-five (95) net bits are used for a special SID bit pattern to identify the frame as a SID frame and to make it distinct from speech frames. CN parameters are differentially encoded with respect to parameters that are derived from the last transmitted speech frames.
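Identification of a SID frame on the net bit level can be illustrated by matching the received identification bits against a known pattern. The sketch below is hypothetical: the actual 95-bit pattern and any tolerance for bit errors are defined in the GSM specifications and are not reproduced here, so `PLACEHOLDER_PATTERN` and `max_mismatches` are assumptions chosen only for illustration.

```python
SID_PATTERN_BITS = 95  # number of net bits used for the SID pattern (from the text)

# Placeholder contents; the real pattern is defined by the standard.
PLACEHOLDER_PATTERN = [1] * SID_PATTERN_BITS


def count_mismatches(received, pattern):
    """Count positions where the received bits differ from the pattern."""
    if len(received) != len(pattern):
        raise ValueError("bit sequences must have equal length")
    return sum(r != p for r, p in zip(received, pattern))


def looks_like_sid(received, pattern=PLACEHOLDER_PATTERN, max_mismatches=2):
    """Classify a decoded frame as SID if few pattern bits deviate (assumed rule)."""
    return count_mismatches(received, pattern) <= max_mismatches


if __name__ == "__main__":
    bits = list(PLACEHOLDER_PATTERN)
    bits[0] ^= 1  # one bit error after channel decoding
    print(looks_like_sid(bits))
```

The check operates on net bits, so it can only run after de-interleaving and channel decoding have produced them.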
The described SID frame transmission is illustrated in FIG. 1 for TCH/FS (i.e., traffic channel/full-rate speech) and in FIG. 2 for TCH/HS (i.e., traffic channel/half-rate speech). The upper row symbolises the speech frames, as they are seen at the input of the speech encoder. The middle row symbolises the TDMA frames that transmit the respective speech or SID bits via the radio interface. The lower row symbolises the speech or comfort noise frames after the speech decoder. Every speech frame is exactly 20 ms long. The TDMA frames are spaced, on average, 5 ms apart. TDMA frames for SACCH and IDLE are not shown. Implementation delays and other side effects are not shown either.
Apart from regular transmission of SID frames, synchronously and time aligned to a fixed time structure, ITU-T recommendation G.729/Annex B describes a DTX method which transmits SID frames whenever an update of the CN parameters is required because they have changed significantly since the last SID frame transmission.
In the well known Pacific Digital Cellular (PDC) system with VOX functionality, special post- and pre-amble frames are used to signal transitions from speech to inactivity or, respectively, back from inactivity to speech (see, for example, RCR STD-27D). These frames contain unique bit patterns on gross bit level to identify them. Post-amble frames consist of two channel frames, of which the first carries no information other than the identification bit pattern and the second carries comfort noise parameters describing the inactivity signal. During voice inactivity, post-amble frames are sent periodically to enable the receiving end to update the comfort noise generation. For both post- and pre-amble frames, the same interleaving is used as for speech frames.
The above described conventional DTX solutions, as realized in GSM FR, EFR, and HR, are not well suited for use in multi-mode coding systems. This results from the fact that SID frame signaling is done on net bit level. A special bit pattern identifying the SID frame is part of the net bit stream. SID frame detection at the receiver is therefore performed after de-interleaving and channel decoding. This approach is inappropriate for multi-mode coding systems with more than one source and channel mode, since the SID frame identification would depend on the correct choice of the codec mode for channel decoding. Because of possible mode transmission errors, the receiver cannot always be guaranteed to apply the correct codec mode.
Moreover, for analogous reasons, variations of the interleaving scheme, either for the different codec modes or for SID frames, are also impractical because of the complexity they entail. In the worst case, such approaches require running SID frame de-interleaving and, more seriously, SID frame channel decoding in addition to speech frame de-interleaving and channel decoding.
Additionally, there are at least two major problems in adopting the PDC realization. Firstly, as post-amble frames consist of two traffic frames, the inactivity transmission mode is relatively inefficient in terms of transmission power savings. Each comfort noise parameter update requires the transmission of two frames. Secondly, as transitions from speech inactivity to activity are signaled by pre-amble frames, either parts of the speech onsets may be clipped or the transmission of speech onsets is delayed by the pre-amble frame. The former effect directly degrades the quality of the reconstructed speech, while the latter increases the speech transmission delay, which may degrade the conversational quality.
Note also that applying a common diagonal interleaving scheme over two frames for SID and speech frames, as is presently done in both GSM and PDC, causes further problems. Applying diagonal interleaving for transmission of single SID frames is inefficient in terms of radio resource usage and power consumption since only one half of every transmitted TDMA frame carries SID information while the other half remains unused and is thus wasted (such wasted half bursts are marked in FIGS. 1 and 2).
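The half-burst waste can be made concrete with a simplified occupancy model of diagonal interleaving over two frames. The model assumes that each frame is spread over the second half of four bursts and the first half of the following four bursts (eight half-bursts per frame, as an assumed full-rate layout); the function names are hypothetical and the bit-level interleaving is not modelled.

```python
def diagonal_burst_usage(frames_sent, bursts_per_frame=4):
    """Map each transmitted burst index to the number of halves in use (1 or 2).

    Frame n occupies one half of bursts n*4 .. n*4+7 in this simplified model,
    so consecutive frames share bursts, while an isolated frame does not.
    """
    usage = {}
    for n in frames_sent:
        start = n * bursts_per_frame
        for b in range(start, start + 2 * bursts_per_frame):
            usage[b] = usage.get(b, 0) + 1
    return usage


def utilisation(usage):
    """Fraction of half-bursts that carry information in the transmitted bursts."""
    return sum(usage.values()) / (2 * len(usage))


if __name__ == "__main__":
    single_sid = diagonal_burst_usage({0})
    print(len(single_sid), utilisation(single_sid))  # 8 bursts, 0.5 utilisation
    speech_run = diagonal_burst_usage({0, 1, 2, 3})
    print(len(speech_run), utilisation(speech_run))  # 20 bursts, 0.8 utilisation
```

In this model, an isolated SID frame keeps the transmitter on for eight bursts but fills only half of each, whereas a run of consecutive speech frames fills both halves of all but the first and last four bursts.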
This efficiency loss in current GSM and PDC systems is small, as SID frame transmission is relatively infrequent. However, it is more severe for new multi-mode communication systems with codec mode adaptation. High adaptation performance requires much more frequent information transmission (adaptation data) over the inactive link compared to the transmission of SID frames in current systems.
Moreover, there are certain upper limits on the radio channel activity during inactivity (e.g., the AMR system requirement is: TCH/AFS: 16 TDMA frames per 480 ms multiframe; TCH/AHS: 12 TDMA frames per 480 ms multiframe). Wasting half of the available radio resource would mean that codec mode information could be transmitted only half as frequently as would in principle be possible. The result is a potential performance loss due to slower codec mode adaptation.
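A rough calculation illustrates the effect of these activity limits. The budgets of 16 and 12 TDMA frames per 480 ms multiframe come from the text; the number of TDMA frames occupied by one isolated frame under each interleaving scheme is an assumption of this sketch (half-filled bursts when a two-frame diagonal scheme is applied to a single frame, completely filled bursts otherwise).

```python
MULTIFRAME_MS = 480

# TDMA frame budget per 480 ms multiframe during inactivity (from the text).
BUDGET = {"TCH/AFS": 16, "TCH/AHS": 12}

# Assumed number of TDMA frames occupied by one isolated frame.
BURSTS_HALF_FILLED = {"TCH/AFS": 8, "TCH/AHS": 4}  # diagonal, half of each burst unused
BURSTS_FULLY_FILLED = {"TCH/AFS": 4, "TCH/AHS": 2}  # both halves of each burst used


def frames_per_multiframe(channel, bursts_per_frame):
    """How many isolated frames fit into the activity budget of one multiframe."""
    return BUDGET[channel] // bursts_per_frame[channel]


if __name__ == "__main__":
    for ch in BUDGET:
        wasted = frames_per_multiframe(ch, BURSTS_HALF_FILLED)
        full = frames_per_multiframe(ch, BURSTS_FULLY_FILLED)
        print(ch, wasted, full, MULTIFRAME_MS / wasted, MULTIFRAME_MS / full)
```

Under these assumptions, half-filled bursts allow two frames per multiframe on TCH/AFS (one every 240 ms) instead of four (one every 120 ms), and three instead of six on TCH/AHS, which is the halving of the update rate referred to above.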
A further disadvantage of applying the same diagonal interleaving for SID frames (carrying codec mode information) as for speech frames is the delay caused by this kind of interleaving. To achieve the best possible codec mode adaptation performance in the multi-mode communication system, the transmission delay of codec mode information should be kept to a minimum. This prohibits the usage of diagonal interleaving.
A particular problem in systems with DTX is the detection of speech onsets after periods of inactivity. Missing the onset results in clipped speech output of the decoder. On the other hand, if a non-transmitted frame is erroneously detected as a speech onset frame, undesirable plop or bang sounds can be produced which can degrade communication quality considerably.
In principle, AMR systems with DTX operation merely need to transmit codec mode requests for the currently active link over the inactive link. No codec mode indications for the inactive link need be transmitted. However, when the inactive link becomes active again, a suitable codec mode must be selected. A way of selecting the codec mode for speech onsets after inactivity has to be found which ensures that the transmitting and receiving sides apply the same mode. Moreover, this codec mode should be suitable with respect to the current radio channel conditions.
Apart from the codec mode signaling method in the AMR standard, so far no further fast control channels are available. However, there is a need for such a channel in order to be able to perform fast configuration changes (e.g., to change an active codec set, to change the phase of codec mode information in order to minimize transmission delay, to hand over to an existing GSM codec such as FR, EFR, or HR, and/or to switch to a future application such as a wideband codec, speech and data, or multi-media).
Accordingly, there is a need for improved methods and apparatus for performing DTX and configuration changes in adaptive multi-rate systems.