Discontinuous transmission (DTX) is used in mobile communication systems to switch the radio transmitter off during speech pauses. The use of DTX saves power in the mobile station and extends the time between battery recharges. It also reduces the general interference level and thus improves transmission quality.
However, during speech pauses the background noise which is transmitted with the speech also disappears if the channel is cut off completely. The result is an unnatural sounding audio signal (silence) at the receiving end of the communication.
It is known in the art that, instead of completely switching the transmission off during speech pauses, parameters that characterize the background noise can be generated and sent over the air interface at a low rate in Silence Descriptor (SID) frames. These parameters are used at the receive side to regenerate background noise which reflects, as well as possible, the spectral and temporal content of the background noise at the transmit side. These parameters that characterize the background noise are referred to as comfort noise (CN) parameters. The comfort noise parameters typically include a subset of speech coding parameters: in particular synthesis filter coefficients and gain parameters.
It should be noted, however, that in some comfort noise evaluation schemes of some speech codecs, part of the comfort noise parameters are derived from speech coding parameters while other comfort noise parameter(s) are derived from, for example, signals that are available in the speech coder but that are not transmitted over the air interface.
It is assumed in prior-art DTX systems that the excitation can be approximated sufficiently well by spectrally flat noise (i.e., white noise). In prior art DTX systems, the comfort noise is generated in the receiver by feeding locally generated, spectrally flat noise through a speech coder synthesis filter.
Before describing the present invention, it will be instructive to review conventional circuitry and methods for generating comfort noise parameters on the transmit side, and for generating comfort noise on the receive side. In this regard reference is thus first made to FIGS. 1a-1d.
Referring to FIG. 1a, short term spectral parameters 102 are calculated from a speech signal 100 in a Linear Predictive Coding (LPC) analysis block 101. LPC is a method well known in the prior art. For simplicity, only the case where the synthesis filter consists of a short term synthesis filter is discussed herein, it being realized that in most prior art systems, such as in GSM FR, HR and EFR coders, the synthesis filter is constructed as a cascade of a short term synthesis filter and a long term synthesis filter. However, for the purposes of this description a discussion of the long term synthesis filter is not necessary. Furthermore, the long term synthesis filter is typically switched off during comfort noise generation in prior art DTX systems.
The LPC analysis produces a set of short term spectral parameters 102 once for each transmission frame. The frame duration depends on the system. For example, in all GSM channels the frame size is set at 20 milliseconds.
The speech signal is fed through an inverse filter 103 to produce a residual signal 104. The inverse filter is of the form:

$$A(z) = 1 - \sum_{i=1}^{M} a(i)\, z^{-i}$$
The filter coefficients a(i), i=1, . . . , M are produced in the LPC analysis and are updated once for each frame. Interpolation as known in prior art speech coding may be applied in the inverse filter 103 to obtain a smooth change in the filter parameters between frames. The inverse filter 103 produces the residual 104 which is the optimal excitation signal, and which generates the exact speech signal 100 when fed through synthesis filter 1/A(z) 112 on the receive side (see FIG. 1b). The energy of the excitation sequence is measured and a scaling gain 106 is calculated for each transmission frame in excitation gain calculation block 105.
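The relationship between the inverse filter 103, the residual 104, the synthesis filter 112 and the excitation gain 106 can be illustrated with a minimal Python sketch. It is not taken from any codec specification. It assumes the sign convention A(z) = 1 - sum of a(i) z^-i, a zero filter state at the start of the frame, no interpolation between frames, and a gain defined as the RMS level of the residual. The function names, the toy input signal and the second-order coefficients are hypothetical.

```python
import math

def inverse_filter(speech, a):
    """Filter by A(z): r(n) = s(n) - sum_i a(i) * s(n - i) (assumed sign convention)."""
    residual = []
    for n in range(len(speech)):
        pred = sum(a[i] * speech[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        residual.append(speech[n] - pred)
    return residual

def synthesis_filter(excitation, a):
    """Filter by 1/A(z): s(n) = e(n) + sum_i a(i) * s(n - i)."""
    out = []
    for n in range(len(excitation)):
        pred = sum(a[i] * out[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(excitation[n] + pred)
    return out

def excitation_gain(residual):
    """RMS level of one frame of residual (an assumed definition of the gain)."""
    return math.sqrt(sum(x * x for x in residual) / len(residual))

# One 20 ms frame, assuming 8 kHz sampling (160 samples), of a toy signal.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(160)]
a = [1.2, -0.5]  # hypothetical a(1), a(2); real coders typically use M = 10

residual = inverse_filter(frame, a)
rebuilt = synthesis_filter(residual, a)  # reproduces the input frame
gain = excitation_gain(residual)
```

Because the synthesis filter is the exact inverse of A(z), feeding the residual back through it reproduces the input frame up to floating point rounding, which is the sense in which the residual is the optimal excitation.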
The excitation gain 106 and short term spectral coefficients 102 are averaged over several transmission frames to obtain a characterization of the average spectral and temporal content of the background noise. The averaging is typically carried out over four frames, as is the case for the GSM FR channel, to eight frames, as is the case for the GSM EFR channel. The parameters to be averaged are buffered for the duration of the averaging period in blocks 107a and 108a (see FIG. 1d). The averaging process is carried out in blocks 107 and 108, and the average parameters that characterize the background noise are thus generated. These are the average excitation gain g.sub.mean and the average short term spectral coefficients. In modern speech codecs, there are typically 10 short term spectral coefficients (M=10) which are usually represented as Line Spectral Pair (LSP) coefficients f.sub.mean (i), i=1, . . . , M, as in the GSM EFR DTX system. Although these parameters are typically quantized prior to transmission, the quantization is ignored in this description for simplicity, in that the exact type of quantization that is performed is irrelevant to the teachings of this invention.
Referring briefly to FIG. 1d, it is shown that the averaging blocks 107 and 108 each typically include the respective buffers 107a and 108a, which output buffered signals 107b and 108b, respectively, to the averaging blocks.
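The buffering and averaging of blocks 107, 107a, 108 and 108a can be sketched as follows. This is an illustration only: it assumes a plain arithmetic mean over the buffered frames, whereas the actual averaging rules are defined in the relevant codec specifications. The class name, method names and toy parameter values are hypothetical.

```python
from collections import deque

class ParameterAverager:
    """Buffers the last `period` frames of parameters and averages them."""

    def __init__(self, period):
        # Fixed-length buffers (the role of blocks 107a and 108a).
        self.gains = deque(maxlen=period)
        self.lsps = deque(maxlen=period)

    def push(self, gain, lsp):
        """Store the excitation gain and LSP vector of one frame."""
        self.gains.append(gain)
        self.lsps.append(list(lsp))

    def ready(self):
        """True once a full averaging period has been buffered."""
        return len(self.gains) == self.gains.maxlen

    def average(self):
        """Return (g_mean, f_mean) as arithmetic means over the buffer."""
        n = len(self.gains)
        g_mean = sum(self.gains) / n
        m = len(self.lsps[0])
        f_mean = [sum(f[i] for f in self.lsps) / n for i in range(m)]
        return g_mean, f_mean

# Eight-frame averaging period with toy values and M = 2 for brevity.
averager = ParameterAverager(period=8)
for k in range(1, 9):
    averager.push(gain=float(k), lsp=[0.01 * k, 0.02 * k])
g_mean, f_mean = averager.average()
```

Using bounded deques means that the oldest frame is dropped automatically when a new one arrives, so the same object can keep producing updated averages while speech inactivity continues.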
The computation and averaging of the comfort noise parameters is explained in detail in GSM recommendation: GSM 06.62 "Comfort noise aspects for Enhanced Full Rate (EFR) speech traffic channels". Also by way of example, discontinuous transmission is explained in GSM recommendation: GSM 06.81 "Discontinuous Transmission (DTX) for Enhanced Full Rate (EFR) speech traffic channels", and voice activity detection (VAD) is explained in GSM recommendation: GSM 06.82 "Voice Activity Detection (VAD) for Enhanced Full Rate (EFR) speech channels". As such, the details of these various functions are not further discussed here.
Referring to FIG. 1b, there is shown a block diagram of a conventional decoder on the receive side that is used to generate comfort noise in the prior art speech communication system. The decoder receives the two comfort noise parameters, the average excitation gain g.sub.mean and the set of average short term spectral coefficients f.sub.mean (i), i=1, . . . , M, and based on the parameters the decoder generates the comfort noise. The comfort noise generation operation on the receive side is similar to speech decoding, except that the parameters are used at a significantly lower rate (e.g., once every 480 milliseconds, as in the GSM FR and EFR channels), and no excitation signal is received from the speech encoder. During speech decoding the excitation on the receive side is obtained from a codebook that contains a plurality of possible excitation sequences, and an index for the particular excitation vector in the codebook is transmitted along with the other speech coding parameters. For a detailed description of speech decoding and the use of codebooks reference can be had to, by example, U.S. Pat. No.: 5,327,519, entitled "Pulse Pattern Excited Linear Prediction Voice Coder", by Jari Hagqvist, Kari Jarvinen, Kari-Pekka Estola, and Jukka Ranta, the disclosure of which is incorporated by reference herein in its entirety.
During comfort noise generation, however, no index to the codebook is transmitted, and the excitation is obtained instead from a random number or excitation (RE) generator 110. The RE generator 110 generates excitation vectors 114 having a flat spectrum. The excitation vectors 114 are then scaled by the average excitation gain g.sub.mean in scaling unit 115 so that their energy corresponds to the average gain of the excitation 104 on the transmit side. A resulting scaled random excitation sequence 111 is then input to the speech synthesis filter 112 to generate the comfort noise 113. The average short term spectral coefficients f.sub.mean (i) are used in the speech synthesis filter 112.
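The receive-side chain of RE generator 110, scaling unit 115 and synthesis filter 112 can be sketched in Python as below. The sketch makes several assumptions that are not from the source: the averaged spectral parameters have already been converted from LSP form to direct-form coefficients a(i), the filter uses the sign convention A(z) = 1 - sum of a(i) z^-i, the random excitation is Gaussian, and the scaling sets the RMS level of the frame to g.sub.mean. The function name and the numerical values are hypothetical.

```python
import math
import random

def generate_comfort_noise(g_mean, a_mean, n_samples, rng):
    """Return (scaled excitation, comfort noise) for one block of samples."""
    # RE generator: spectrally flat (white) random excitation.
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Scaling unit: normalise to unit RMS, then apply the average gain.
    rms = math.sqrt(sum(x * x for x in white) / n_samples)
    scaled = [g_mean * x / rms for x in white]

    # Synthesis filter 1/A(z): s(n) = e(n) + sum_i a(i) * s(n - i).
    noise = []
    for n in range(n_samples):
        pred = sum(a_mean[i] * noise[n - 1 - i]
                   for i in range(len(a_mean)) if n - 1 - i >= 0)
        noise.append(scaled[n] + pred)
    return scaled, noise

rng = random.Random(0)  # seeded only to make the example repeatable
scaled, noise = generate_comfort_noise(
    g_mean=0.25, a_mean=[1.2, -0.5], n_samples=160, rng=rng)
```

The excitation carries no spectral shape of its own, so in this scheme all of the spectral character of the comfort noise comes from the synthesis filter coefficients.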
FIG. 1c illustrates the spectrum associated with the signal in different parts of the prior art decoder of FIG. 1b. The RE generator 110 produces the random number excitation sequences 114 (and the scaled excitation 111) having a flat spectrum. This spectrum is shown by curve A. The speech synthesis filter 112 then modifies the excitation to produce a non-flat spectrum as shown in curve B.
During a hangover period, i.e., the time between when a voice activity detector (VAD) indicates that speech has stopped and when the transmission is actually terminated, the speech coding parameters characterizing background noise are stored and averaged for constructing CN parameters. Reference in this regard can be had to FIGS. 3 and 4, which are exemplary of the GSM system. Since the VAD has detected speech inactivity, it is guaranteed that the speech frames contain only noise (and not speech), and thus these hangover frames can be used for the averaging of speech encoder parameters to evaluate the comfort noise parameters.
The length of the hangover period is determined by the length of the SID averaging period, i.e., the length of the hangover period must be long enough to complete the averaging of the parameters before the resulting comfort noise parameters are to be transmitted in a SID frame. In the DTX system of the GSM full rate speech coder, the length of the hangover period equals four frames (the length of the SID averaging period), since the comfort noise evaluation technique uses only parameters from the previous frames to make an updated SID frame available. In the DTX system of the GSM enhanced full rate speech coder, the length of the hangover period equals seven frames (the length of the SID averaging period minus one), since the parameters of the eighth frame of the SID averaging period can be obtained from the speech encoder while processing the first SID frame. FIG. 3 illustrates the concepts of the hangover period and the SID averaging periods in the DTX system of the GSM enhanced full rate speech coder, and FIG. 4 shows as an example the longest possible speech burst without hangover.
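The hangover arithmetic described above can be restated as a short Python sketch. The function and argument names are hypothetical. The 20 millisecond frame duration is the GSM value given earlier in this description.

```python
FRAME_MS = 20  # GSM frame duration in milliseconds

def hangover_frames(averaging_period, current_frame_usable):
    """Hangover length in frames for a given SID averaging period.

    If the parameters of the frame being processed as the first SID frame
    can also enter the average, one fewer hangover frame is needed.
    """
    if current_frame_usable:
        return averaging_period - 1
    return averaging_period

fr_hangover = hangover_frames(4, current_frame_usable=False)   # GSM FR
efr_hangover = hangover_frames(8, current_frame_usable=True)   # GSM EFR

fr_hangover_ms = fr_hangover * FRAME_MS
efr_hangover_ms = efr_hangover * FRAME_MS
```

With these inputs the hangover is four frames (80 milliseconds) for the full rate coder and seven frames (140 milliseconds) for the enhanced full rate coder.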
At the end of the hangover period the first SID frame is transmitted, and the comfort noise evaluation algorithm continues evaluating the characteristics of the background noise and passes the updated SID frames to the transmitter frame by frame, as long as the VAD continues to detect speech inactivity.
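The sequence of speech frames, hangover frames and SID frames can be sketched as a simple per-frame labelling routine. This is a deliberately simplified illustration and not the GSM procedure: it assumes that a full hangover always follows every speech burst, whereas FIG. 4 indicates that some speech bursts are followed by no hangover. It also labels every inactive frame after the hangover as an updated SID frame, without modelling which of them are actually sent over the air. The function name and labels are hypothetical.

```python
def classify_frames(vad_flags, hangover_len):
    """Label each frame as SPEECH, HANGOVER or SID from per-frame VAD flags."""
    labels = []
    remaining = 0  # hangover frames still owed after the last speech frame
    for speech_detected in vad_flags:
        if speech_detected:
            labels.append("SPEECH")
            remaining = hangover_len      # restart the hangover after speech
        elif remaining > 0:
            labels.append("HANGOVER")     # still transmitted as a speech frame
            remaining -= 1
        else:
            labels.append("SID")          # comfort noise parameters available
    return labels

# Three speech frames followed by ten inactive frames, seven-frame hangover.
vad = [True] * 3 + [False] * 10
labels = classify_frames(vad, hangover_len=7)
```

In this example the first SID frame is the eleventh frame, immediately after the seven hangover frames, and every following inactive frame is labelled as an updated SID frame.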
It can be appreciated that, if the transmission of comfort noise parameters is not regular in nature, the resulting generated comfort noise may not match the original background noise at the transmitter.
It can be further appreciated that, if the comfort noise parameters are transmitted as separate, discrete messages, a certain amount of system bandwidth is consumed. By example, if in the IS-136 system the CN parameters were sent in a dedicated Fast Associated Control Channel (FACCH) message, then two time slots would be required because of the two-burst interleaving that is employed for FACCH messages.
In the IS-136 system the FACCH is defined to be a blank-and-burst channel used for signalling exchange between the base station and the mobile station. A Slow Associated Control Channel (SACCH) is defined to be a continuous channel used for message exchange between the base station and the mobile station. A fixed number of bits are allocated to the SACCH in each TDMA slot.
In the prior art GSM system the comfort noise parameters are sent in-band (i.e., coded into voice coder slots). While this technique may be applicable to other digital cellular standards, it would not be compatible with a presently specified IS-136 Enhanced Full Rate (EFR) voice coder. It has also been found that the approximately 0.5 second CN update that is performed in GSM may be relaxed, thereby utilizing less system bandwidth for CN updates.