The present invention relates to audio signal processing, and, in particular, to comfort noise addition to audio signals.
Comfort noise generators are usually used in discontinuous transmission (DTX) of audio signals, in particular of audio signals containing speech. In such a mode, the audio signal is first classified into active and inactive frames by a voice activity detector (VAD). Based on the VAD result, only the active speech frames are coded and transmitted at the nominal bit-rate. During long pauses, where only the background noise is present, the bit-rate is lowered or zeroed, and the background noise is coded episodically and parametrically using silence insertion descriptor frames (SID frames). The average bit-rate is thereby significantly reduced.
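The DTX scheduling described above may be illustrated by the following minimal sketch. It is given for illustration only and is not taken from any particular codec: the function name `schedule_dtx`, the frame labels, and the SID update interval of 8 frames are assumptions, and the VAD decisions are taken as given rather than computed.

```python
from typing import List

# Hypothetical value: number of inactive frames between two SID updates.
SID_INTERVAL = 8


def schedule_dtx(vad_flags: List[bool], sid_interval: int = SID_INTERVAL) -> List[str]:
    """Map per-frame VAD decisions to a transmission type.

    "ACTIVE"  : frame coded and transmitted at the nominal bit-rate
    "SID"     : small parametric description of the background noise
    "NO_DATA" : nothing is transmitted for this frame
    """
    decisions = []
    inactive_run = 0  # length of the current run of inactive frames
    for active in vad_flags:
        if active:
            decisions.append("ACTIVE")
            inactive_run = 0
        else:
            # Send an SID frame at the start of a pause, then episodically.
            if inactive_run % sid_interval == 0:
                decisions.append("SID")
            else:
                decisions.append("NO_DATA")
            inactive_run += 1
    return decisions
```

With such a schedule, a pause of ten frames costs only two small SID frames, which is where the reduction of the average bit-rate comes from.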
The noise is generated during the inactive frames at the decoder side by a comfort noise generator (CNG). The size of an SID frame is very limited in practice. Therefore, the number of parameters describing the background noise has to be kept as small as possible. To this end, the noise estimation is not applied directly to the output of the spectral transforms. Instead, it is applied at a lower spectral resolution by averaging the input power spectrum over groups of bands, e.g., following the Bark scale. The averaging can be achieved by either arithmetic or geometric means. Unfortunately, the limited number of parameters transmitted in the SID frames does not allow the fine spectral structure of the background noise to be captured. Hence, only the smooth spectral envelope of the noise can be reproduced by the CNG. When the VAD triggers a CNG frame, the discrepancy between the smooth spectrum of the reconstructed comfort noise and the spectrum of the actual background noise can become very audible at the transitions between active frames (involving regular coding and decoding of a noisy speech portion of the signal) and CNG frames.
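The reduction of spectral resolution by band averaging may be sketched as follows. This is an illustrative example only: the function name `band_average`, the band edges, and the small floor applied before the logarithm are assumptions, and the band edges in the usage note are arbitrary rather than an actual Bark-scale partition.

```python
import math
from typing import List, Sequence, Tuple


def band_average(
    power: Sequence[float],
    bands: Sequence[Tuple[int, int]],
    geometric: bool = False,
) -> List[float]:
    """Average a power spectrum over groups of bins.

    `bands` holds (start, stop) bin indices, with `stop` exclusive.
    One value per band is returned, i.e. the smooth spectral envelope.
    """
    envelope = []
    for start, stop in bands:
        bins = power[start:stop]
        if geometric:
            # Geometric mean computed in the log domain; the floor
            # (an assumed value) avoids taking the logarithm of zero.
            log_sum = sum(math.log(max(p, 1e-12)) for p in bins)
            envelope.append(math.exp(log_sum / len(bins)))
        else:
            # Arithmetic mean of the bin powers.
            envelope.append(sum(bins) / len(bins))
    return envelope
```

For example, for the power values 1, 4, 2, 8, 1, 1 grouped into three bands of two bins each, the arithmetic mean gives 2.5, 5 and 1, while the geometric mean gives approximately 2, 4 and 1. In both cases six values are reduced to three parameters, and the variation within each band is lost.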