The present invention relates to audio signal processing, and, in particular, to noisy speech coding and comfort noise addition to audio signals.
Comfort noise generators are usually used in discontinuous transmission (DTX) of audio signals, in particular of audio signals containing speech. In such a mode, the audio signal is first classified into active and inactive frames by a voice activity detector (VAD). An example of a VAD can be found in [1]. Based on the VAD result, only the active speech frames are coded and transmitted at the nominal bit-rate. During long pauses, where only the background noise is present, the bit-rate is lowered or zeroed, and the background noise is coded episodically and parametrically. The average bit-rate is thus significantly reduced. At the decoder side, the noise is generated during the inactive frames by a comfort noise generator (CNG). For example, the speech coders AMR-WB [2] and ITU G.718 [1] can both be run in DTX mode.
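The DTX scheme described above can be sketched as a toy model. This is not the AMR-WB or G.718 implementation, and several simplifications are assumed. The VAD is a plain energy threshold. The silence-descriptor (SID) payload is only the frame energy. Active frames are passed through unchanged in place of a real core codec. The frame length, bit budgets, SID update interval and VAD threshold are hypothetical values chosen for illustration.

```python
import numpy as np

FRAME_LEN = 320           # hypothetical: 20 ms at 16 kHz
SPEECH_BITS = 256         # hypothetical nominal bits per active frame
SID_BITS = 40             # hypothetical bits per silence-descriptor (SID) frame
SID_INTERVAL = 8          # hypothetical: refresh noise parameters every 8th inactive frame
VAD_THRESHOLD_DB = -40.0  # hypothetical threshold of the toy energy VAD


def frame_energy_db(frame):
    """Mean power of a frame in dB (small offset avoids log(0))."""
    return 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)


def toy_vad(frame):
    """Toy energy-based VAD: True means 'active'. Real VADs are far more elaborate."""
    return frame_energy_db(frame) > VAD_THRESHOLD_DB


def dtx_encode(frames):
    """Return one (frame_type, bits, payload) tuple per input frame."""
    packets = []
    inactive_run = 0
    for frame in frames:
        if toy_vad(frame):
            inactive_run = 0
            packets.append(("SPEECH", SPEECH_BITS, frame))  # stands in for the core codec
        else:
            if inactive_run % SID_INTERVAL == 0:
                # Episodic, parametric noise description: here just the frame energy.
                packets.append(("SID", SID_BITS, frame_energy_db(frame)))
            else:
                packets.append(("NO_DATA", 0, None))  # nothing transmitted
            inactive_run += 1
    return packets


def dtx_decode(packets, rng):
    """Decode packets; inactive frames are filled by a comfort noise generator."""
    out = []
    noise_db = VAD_THRESHOLD_DB  # hypothetical initial CNG level
    for frame_type, _, payload in packets:
        if frame_type == "SPEECH":
            out.append(payload)
        else:
            if frame_type == "SID":
                noise_db = payload  # update the CNG parameters
            # White Gaussian noise scaled to the last received energy.
            gain = np.sqrt(10.0 ** (noise_db / 10.0))
            out.append(gain * rng.standard_normal(FRAME_LEN))
    return out


# Usage: 10 "speech" frames (a 200 Hz tone) followed by 30 low-level noise frames.
rng = np.random.default_rng(0)
t = np.arange(FRAME_LEN) / 16000.0
speech = [0.5 * np.sin(2 * np.pi * 200 * t) for _ in range(10)]
noise = [0.001 * rng.standard_normal(FRAME_LEN) for _ in range(30)]

packets = dtx_encode(speech + noise)
total_bits = sum(bits for _, bits, _ in packets)
decoded = dtx_decode(packets, rng)
```

In this example the encoder spends 10 × 256 + 4 × 40 = 2720 bits instead of the 40 × 256 = 10240 bits of continuous transmission, which is how DTX lowers the average bit-rate while the decoder still outputs noise at roughly the right level.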
The coding of speech, and especially of noisy speech, at low bit-rates is prone to artefacts. Speech coders are usually based on a speech production model, which no longer holds in the presence of background noise. In that case, the coding efficiency drops and the quality of the decoded audio signal decreases. Moreover, certain characteristics of speech coding may be especially disturbing when handling noisy speech. Indeed, at low rates, the coarse quantization of coding parameters produces fluctuations over time, which are perceptually annoying when coding speech over stationary background noise.
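How coarse quantization turns a nearly constant noise parameter into an audible fluctuation can be illustrated with a toy example. The 3 dB quantizer step is hypothetical. So is the noise level, which is deliberately placed on a decision boundary as a worst case. Real coders quantize many parameters, often jointly, so this only shows the mechanism.

```python
import numpy as np

STEP_DB = 3.0  # hypothetical coarse quantizer step for a level/gain parameter


def quantize_db(x_db, step=STEP_DB):
    """Uniform scalar quantizer: round to the nearest reconstruction level."""
    return step * np.round(x_db / step)


rng = np.random.default_rng(1)
# Stationary background noise: true level 43.5 dB with +/-0.2 dB estimation jitter.
# 43.5 dB lies on the decision boundary between the 42 dB and 45 dB levels.
true_level = 43.5 + rng.uniform(-0.2, 0.2, size=50)
decoded_level = quantize_db(true_level)

input_spread = true_level.max() - true_level.min()        # below 0.4 dB
output_spread = decoded_level.max() - decoded_level.min()  # a full 3 dB step
```

A level that varies by less than 0.4 dB is reconstructed as a sequence toggling between two levels 3 dB apart, which is heard as a fluctuation of the background noise.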
Noise reduction is a well-known technique for enhancing the intelligibility of speech and improving communication in the presence of background noise. It has also been adopted in speech coding. For example, the G.718 coder uses noise reduction to derive some coding parameters, such as the speech pitch. It can also code the enhanced signal instead of the original signal. The speech is then more predominant relative to the noise in the decoded signal. However, the result usually sounds more degraded or less natural, as noise reduction may distort the speech components and cause audible musical noise artefacts in addition to the coding artefacts.
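The origin of musical noise can be sketched with classic power spectral subtraction. This technique is named here only as a familiar illustration and is not claimed to be the noise reduction used in G.718. The frame length, noise level and spectral floor are hypothetical. The noise power spectrum is assumed to be known exactly, which is optimistic.

```python
import numpy as np

N = 512           # hypothetical frame length
SIGMA = 0.01      # std of the white background noise
GAIN_FLOOR = 0.0  # hypothetical spectral floor (0 = no floor)


def spectral_subtraction_gain(frame, noise_psd, floor=GAIN_FLOOR):
    """Power spectral subtraction gain per FFT bin: G = max(1 - noise_psd / |X|^2, floor)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return np.maximum(1.0 - noise_psd / np.maximum(power, 1e-20), floor)


rng = np.random.default_rng(2)
noise_psd = N * SIGMA ** 2  # expected |X_k|^2 of white noise with an unnormalized FFT

# Noise-only frame: ideally fully suppressed, yet some bins survive at random.
noise_frame = SIGMA * rng.standard_normal(N)
g_noise = spectral_subtraction_gain(noise_frame, noise_psd)
surviving = np.mean(g_noise[1:-1] > 0.0)  # fraction of interior bins (DC/Nyquist excluded)

# Noisy "speech": a strong sinusoid on bin 32 passes almost unchanged.
n = np.arange(N)
tone = 0.1 * np.sin(2 * np.pi * 32 * n / N)
g_speech = spectral_subtraction_gain(tone + noise_frame, noise_psd)
```

In a noise-only frame roughly e^-1, or about 37 %, of the bins keep a nonzero gain, because the periodogram of white Gaussian noise exceeds its mean that often. Which bins survive changes randomly from frame to frame, producing isolated short-lived tonal components that are perceived as musical noise. Raising the spectral floor reduces this effect, at the cost of less noise suppression.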