Communication speech and audio codecs (e.g. AMR-WB, G.718) generally include a discontinuous transmission (DTX) scheme and a comfort noise generation (CNG) algorithm. The DTX/CNG operation is used to reduce the transmission rate by simulating background noise during inactive signal periods.
CNG can be implemented in several ways.
The most commonly used method, employed in codecs such as AMR-WB (ITU-T G.722.2 Annex A) and G.718 (ITU-T G.718 Sec. 6.12 and 7.12), is based on an excitation + linear-prediction (LP) model. A random excitation signal is first generated, then scaled by a gain, and finally passed through an LP synthesis filter (the inverse of the LP analysis filter), producing the time-domain CNG signal. The two main parameters transmitted are the excitation energy and the LP coefficients (generally in an LSF or ISF representation). This method is referred to here as LP-CNG.
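The three LP-CNG steps described above (random excitation, gain scaling, LP synthesis) can be illustrated with a minimal sketch. It is not the AMR-WB or G.718 implementation: the function names, the sign convention A(z) = 1 + a1*z^-1 + ... + ap*z^-p, and the interpretation of the excitation energy as a per-sample mean-square value are assumptions made for illustration, and the decoding of LSF/ISF parameters into LP coefficients is omitted.

```python
import math
import random


def scaled_excitation(energy, frame_len, rng):
    """Steps 1 and 2: white Gaussian excitation scaled to a target energy.

    `energy` is taken here as the desired mean-square value per sample
    (an assumption; real codecs define and quantize it differently).
    """
    exc = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    mean_square = sum(e * e for e in exc) / frame_len
    gain = math.sqrt(energy / mean_square) if mean_square > 0.0 else 0.0
    return [gain * e for e in exc]


def lp_synthesis(exc, lp_coeffs, state):
    """Step 3: all-pole filter 1/A(z), with A(z) = 1 + a1*z^-1 + ... + ap*z^-p.

    `state` holds the last p output samples, most recent first, so that
    consecutive frames can be filtered without discontinuities.
    """
    order = len(lp_coeffs)
    state = list(state)
    out = []
    for e in exc:
        # y[n] = e[n] - sum_k a_k * y[n-k]
        y = e - sum(a * s for a, s in zip(lp_coeffs, state))
        out.append(y)
        state = ([y] + state)[:order]
    return out, state


def lp_cng_frame(lp_coeffs, excitation_energy, frame_len, state=None, rng=None):
    """Generate one comfort-noise frame from the two transmitted parameters."""
    if rng is None:
        rng = random.Random()
    if state is None:
        state = [0.0] * len(lp_coeffs)
    exc = scaled_excitation(excitation_energy, frame_len, rng)
    return lp_synthesis(exc, lp_coeffs, state)


# Hypothetical usage: a first-order model with a low-pass spectral tilt.
frame, filt_state = lp_cng_frame([-0.9], 0.01, 160, rng=random.Random(0))
```

The filter state returned by one call would be passed to the next call so that the synthesized noise stays continuous across frame boundaries.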
Another method, proposed recently and described, for example, in patent application WO2014/096279, “Generation of a comfort noise with high spectro-temporal resolution in discontinuous transmission of audio signals”, is based on a frequency-domain (FD) representation of the background noise. Random noise is generated in a frequency domain (e.g. FFT, MDCT, QMF), then shaped using an FD representation of the background noise, and finally converted from the frequency domain to the time domain, producing the time-domain CNG signal. The two main parameters transmitted are a global gain and a set of band noise levels. This method is referred to here as FD-CNG.
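The FD-CNG steps described above can be sketched in the same spirit. This is not the method of WO2014/096279: the function name, the band layout, and the way the global gain and band levels are combined (amplitude = gain * sqrt(level), treating each level as a power) are assumptions made for illustration. A plain inverse DFT stands in for the FFT, MDCT, or QMF synthesis a real codec would use, and windowing and overlap-add are omitted.

```python
import cmath
import math
import random


def fd_cng_frame(band_levels, band_edges, global_gain, n_fft, rng=None):
    """Generate one block of comfort noise shaped in the frequency domain.

    band_levels: noise power per band (assumed interpretation)
    band_edges:  (start_bin, stop_bin) per band, stop exclusive,
                 within bins 1 .. n_fft/2 - 1 (hypothetical layout)
    Returns the time-domain block and the half spectrum that produced it.
    """
    if rng is None:
        rng = random.Random()
    half = n_fft // 2
    spec = [0j] * (half + 1)  # bins 0 .. N/2; DC and Nyquist stay zero

    # Steps 1 and 2: random complex noise per bin, shaped by the band level.
    for level, (lo, hi) in zip(band_levels, band_edges):
        amp = global_gain * math.sqrt(level)
        for k in range(lo, hi):
            spec[k] = amp * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    # Step 3: inverse DFT of the Hermitian-symmetric spectrum (real output).
    out = []
    for n in range(n_fft):
        acc = spec[0].real + spec[half].real * (1.0 if n % 2 == 0 else -1.0)
        for k in range(1, half):
            acc += 2.0 * (spec[k] * cmath.exp(2j * math.pi * k * n / n_fft)).real
        out.append(acc / n_fft)
    return out, spec


# Hypothetical usage: two bands, the upper one silent.
block, half_spec = fd_cng_frame([1.0, 0.0], [(1, 8), (8, 16)], 2.0, 32,
                                rng=random.Random(0))
```

Because each band carries its own level, the spectral envelope of the noise is controlled band by band rather than through a single all-pole filter as in the LP-CNG sketch.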