To meet the increasing demand for mobile communication services, many modern mobile communication systems increase their capacity by exploiting the fact that, during a conversation, the channel carries voice information only 40% to 60% of the time. The rest of the time the channel is used only to transmit silence or background noise. In many cases the voice activity in the channel is even lower than 40%. Conventional mobile communication systems, such as those using discontinuous transmission (DTX), have provided some increase in channel capacity by sending a reduced amount of information while there is no voice activity.
Referring to FIG. 1, a timing diagram shows a typical analog speech signal 105 and a corresponding data frame signal 110 for a conventional DTX system. In DTX systems, a transmitting end typically detects the presence of voice using a voice activity detector (VAD). Based on the VAD output, the transmitting end sends active voice frames 115 when there is voice activity. When no voice activity is detected, the transmitting end intermittently sends Silence Descriptor (SID) frames 120 to the receiving end and stops transmitting active voice frames until voice is again detected or an updated SID frame is required. The decoding (receiving) end uses the SID frames 120 to generate “comfort” noise. While no SID frames are received, the decoder continues to generate comfort noise based on the last SID frame it received. An example of a conventional DTX system is described in 3GPP TS 26.092 V6.0.0 (2004-12), Technical Specification issued by the 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Comfort noise aspects (Release 6).
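The transmit-side behavior described above can be illustrated with a minimal sketch. It is not taken from the cited specification: the function name dtx_schedule, the decision labels, and the sid_update_interval parameter are hypothetical, and details of a real DTX handler such as hangover periods are omitted. The sketch assumes one VAD decision per speech frame.

```python
from typing import Iterable, List


def dtx_schedule(vad_flags: Iterable[bool], sid_update_interval: int = 8) -> List[str]:
    """Return a per-frame transmit decision: 'VOICE', 'SID', or 'NONE'.

    sid_update_interval is a hypothetical parameter: the number of frames
    between SID updates during a silence period.
    """
    decisions: List[str] = []
    frames_since_sid = None  # None means no silence period is in progress
    for voice_active in vad_flags:
        if voice_active:
            # Voice detected: send an active voice frame and leave silence mode.
            decisions.append("VOICE")
            frames_since_sid = None
        elif frames_since_sid is None or frames_since_sid >= sid_update_interval:
            # Start of silence, or an update is due: send a SID frame.
            decisions.append("SID")
            frames_since_sid = 1
        else:
            # Silence continues and no update is due: transmit nothing.
            decisions.append("NONE")
            frames_since_sid += 1
    return decisions
```

With an update interval of three frames, for example, a burst of four silent frames between two voice frames yields one SID frame at the start of the silence, two untransmitted frames, and one SID update.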
Referring to FIG. 2, a timing diagram shows a typical analog speech signal 205 and a corresponding data frame signal 210 for a conventional continuous transmission (CTX) system. In CTX systems a variable rate vocoder may be employed to exploit the voice activity in the channel. In these systems the bit rate required for maintaining the communication link is reduced during periods of no voice activity. The VAD is part of a rate determination sub-system that varies the transmitted bit rate according to the voice activity and the type of speech frame being transmitted. An example of such a technique is the enhanced variable rate codec (EVRC) used in CDMA systems. The EVRC selects among three possible bit rates (full, half, and eighth rate frames). During periods of no speech activity only eighth rate frames are transmitted, thus reducing the bandwidth utilized by the channel in the system. This technique helps increase the capacity of the overall system. An example of a conventional CTX system is described in 3GPP2 C.S0014-A V1.0, April 2004, Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, issued by the 3rd Generation Partnership Project 2.
In packet-based communication systems, bandwidth reduction schemes such as those used in DTX systems or in CTX systems with variable-rate codecs may not provide a significant capacity increase. In DTX networks a SID frame, for example, may use bandwidth equivalent to that of a normal speech frame. For CTX systems, variable-rate codecs may not provide a significant bandwidth reduction on packet-based networks, because the reduced bit-rate frames may utilize bandwidth in the packet-based network similar to that of a voice-active frame. For example, when an EVRC is used, an eighth rate packet may utilize bandwidth similar to that of a full rate or half rate packet due to the overhead information added to each packet, thus eliminating the capacity increase that the variable-rate codec provides on other types of communication channels.
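The effect of per-packet overhead can be illustrated with a short calculation. The figures below are assumptions for illustration and do not appear in the description above: one 20 ms frame per packet, uncompressed IPv4, UDP, and RTP headers totaling 40 bytes, no link-layer overhead, and EVRC payloads of 22, 10, and 2 bytes for full, half, and eighth rate frames.

```python
# Illustrative assumptions, not values from the description above.
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, no header compression
FRAME_MS = 20               # one speech frame per packet
PAYLOAD_BYTES = {"full": 22, "half": 10, "eighth": 2}


def packet_bitrate_bps(payload_bytes: int,
                       header_bytes: int = HEADER_BYTES,
                       frame_ms: int = FRAME_MS) -> int:
    """On-the-wire bit rate when one packet is sent every frame_ms."""
    return (payload_bytes + header_bytes) * 8 * 1000 // frame_ms


def relative_to_full(rate: str) -> float:
    """Packet bit rate of the given frame type as a fraction of full rate."""
    return packet_bitrate_bps(PAYLOAD_BYTES[rate]) / packet_bitrate_bps(PAYLOAD_BYTES["full"])
```

Under these assumptions an eighth rate packet still occupies roughly 68% of the bandwidth of a full rate packet (16,800 bit/s versus 24,800 bit/s), even though its payload is only about 9% of the full rate payload.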
One approach to reducing bandwidth utilization in packet-based networks using the EVRC is to eliminate the transmission of all eighth rate packets. Then, on the decoding side, the missing packets may be treated as frame erasures (FER). However, the FER handling of the EVRC was not designed to handle a long string of erased frames, and thus this technique produces poor quality output when synthesizing the signal presented to the user. Also, since the decoder does not receive any information on the background noise represented by the dropped eighth rate frames, it cannot generate a signal that resembles the original background noise signal at the transmit side.
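The consequence of this approach for the decoder can be sketched as follows. The function names and the representation of frames as rate labels are hypothetical and serve only to show how dropping every eighth rate packet turns a silence period into one long run of erasures.

```python
from typing import List, Optional


def drop_eighth_rate(frames: List[str]) -> List[Optional[str]]:
    """Transmit side: replace each eighth rate frame with None (never sent)."""
    return [None if rate == "eighth" else rate for rate in frames]


def longest_erasure_run(received: List[Optional[str]]) -> int:
    """Receive side: length of the longest run of consecutive missing frames."""
    longest = current = 0
    for frame in received:
        current = current + 1 if frame is None else 0
        longest = max(longest, current)
    return longest
```

A pause of five eighth rate frames between two full rate frames, for example, reaches the decoder as five consecutive erasures, with no background noise information available for any of them.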
Thus there is a need to improve the above method to achieve higher quality while reducing network bandwidth utilization.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.