This invention relates to a system and method for generating comfort noise.
Various techniques are used in packet-based speech communication systems to maintain a high quality of conversation. In particular, significant effort is made to eliminate or reduce echoes from the speech data transmitted between communication devices. In telephony, audio signals (e.g. including voice signals) are transmitted between a near-end and a far-end. Far-end signals which are received at the near-end may be outputted from a loudspeaker. A microphone at the near-end may be used to capture a near-end signal to be transmitted to the far-end. An “echo” occurs when at least some of the far-end signal outputted at the near-end is included in the near-end signal which is transmitted back to the far-end. In this sense the echo may be considered to be a reflection of the far-end signal. An example scenario is illustrated in FIG. 1a, which shows a signal being captured by a far-end microphone and output by a near-end loudspeaker. The echo is a consequence of acoustic coupling between the loudspeaker and a microphone at the near-end; the near-end microphone captures the signal originating from its own loudspeaker in addition to the voice of the near-end speaker and any near-end background noise. The result is an echo at the far-end loudspeaker.
Echo cancellers are typically employed to eliminate or reduce echoes by synthesizing an estimate of the echo from the far-end voice signal. The estimated echo is then subtracted from the microphone signal. Adaptive signal processing is generally used to generate a signal accurate enough to cancel the echo effectively. Even with high-performance adaptive filters it is not always possible for an echo canceller to remove all echoes from a signal, and the echo-cancelled signal from an echo canceller will often include a remnant echo of the far-end voice signal. This is because the echo estimate will not always precisely match the true echo in the microphone signal. There can be several reasons for this, including loss of convergence of the adaptive filter, either due to changes in the echo path or as a result of freezing the adaptive filter during near-end speech to avoid wide divergence of the filter.
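The estimate-and-subtract operation described above can be illustrated with a minimal sketch. The text does not name a particular adaptive algorithm; a normalised least mean squares (NLMS) filter is assumed here purely for illustration, and the function name, tap count, step size and the 3-tap toy echo path are all hypothetical.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the echo-cancelled signal: mic minus the estimated echo.

    NLMS is an assumed choice of adaptive filter, not one named in the text.
    """
    w = [0.0] * taps   # adaptive weights (estimate of the echo path)
    x = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for far, d in zip(far_end, mic):
        x = [far] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_est                      # subtract the estimated echo
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

# Toy scenario: the microphone picks up only the far-end signal passed
# through a hypothetical 3-tap echo path (no near-end speech, no noise).
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
path = [0.6, -0.3, 0.1]
mic = [sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
       for n in range(len(far))]

residual = nlms_echo_cancel(far, mic)
early = sum(e * e for e in residual[:200])    # energy before convergence
late = sum(e * e for e in residual[-200:])    # energy after convergence
```

In this idealised case the residual decays as the filter converges. In practice a change in the echo path, or freezing the weights during near-end speech, leaves the kind of remnant echo discussed above.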
An echo suppressor can be used to remove the remnant echo when there is no near-end speech by replacing or masking the microphone signal when remnant echo is present. For example, the echo suppressor may replace the remnant echo in the microphone signal with synthetic ambient background noise (or comfort noise) generated at the communication device. This eliminates the remnant echo but provides some low-level noise to the far-end listener, avoiding complete silence which can make a communication channel sound dead.
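The replacement decision described above can be sketched as a per-frame switch. This is a simplified illustration only: the function name and the two boolean activity flags are hypothetical, and how those flags are derived (for example from a voice activity detector) is outside the sketch.

```python
def suppress_frame(mic_frame, comfort_frame, far_end_active, near_end_active):
    """Return the frame to transmit to the far end.

    When far-end speech is present (so remnant echo may be present) and there
    is no near-end speech, the microphone frame is replaced by comfort noise.
    Otherwise the microphone frame is passed through unchanged.
    """
    if far_end_active and not near_end_active:
        return list(comfort_frame)
    return list(mic_frame)

mic_frame = [0.20, -0.10, 0.05]
comfort_frame = [0.01, -0.02, 0.01]
echo_only = suppress_frame(mic_frame, comfort_frame, True, False)
near_speech = suppress_frame(mic_frame, comfort_frame, True, True)
```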
ITU standard G.711 Appendix II describes a commonly-used technique for generating comfort noise in which linear prediction coding (LPC) is used to generate a noise signal based on a random noise excitation, as shown in FIGS. 1b and 1c. Comfort Noise Generation (CNG) typically comprises two stages: (i) an analysis stage as shown in FIG. 1b in which characteristics of background noise in an input signal are determined (e.g. during noise-only periods in which the microphone signal includes neither near-end speech nor far-end speech), and (ii) a synthesis stage as shown in FIG. 1c in which comfort noise is synthesised (e.g. during echo-only periods in which far-end speech is present in the microphone signal but near-end speech is not present in the microphone signal). In FIG. 1b, which illustrates the steps performed in a conventional LPC analysis, estimation of LPC coefficients 103 is performed on an input frame 101 when no speech is detected 102 in the microphone signal (e.g. at a Voice Activity Detector, or VAD). In order to estimate the energy in the input signal, the input signal is provided as an excitation signal to an inverse LPC filter 104, which is configured to estimate the energy present in the signal 113 using the LPC coefficients. The LPC coefficients are also converted into reflection coefficients (RC) 105. The reflection coefficients can be averaged using a low pass filter 106 so as to yield RC parameters 107 expressing average characteristics of the background noise present in the input signal. RC parameters are less prone to transmission errors and allow a representation of the background noise to be reliably transmitted for use at a remote communication device.
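The analysis stage of FIG. 1b can be sketched as follows. This is an illustrative sketch rather than the procedure of the G.711 appendix: it assumes the autocorrelation method with the Levinson-Durbin recursion (which yields the LPC coefficients and the reflection coefficients together), a first-order recursive average standing in for the low pass filter 106, and a synthetic first-order autoregressive frame standing in for background noise. All function names, the filter order and the smoothing factor are hypothetical.

```python
import random

def autocorrelation(frame, order):
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return (a, rc, err) for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    rc = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        rc[i - 1] = k
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, rc, err

def residual_energy(frame, a):
    """Mean energy of the frame after the inverse (analysis) filter A(z)."""
    total = 0.0
    for n in range(len(frame)):
        e = sum(a[j] * frame[n - j] for j in range(len(a)) if n - j >= 0)
        total += e * e
    return total / len(frame)

def smooth_rc(avg_rc, new_rc, alpha=0.9):
    """First-order low-pass average of reflection coefficients (assumed form)."""
    return [alpha * p + (1.0 - alpha) * q for p, q in zip(avg_rc, new_rc)]

# Synthetic "background noise": x[n] = 0.8 * x[n-1] + white Gaussian noise.
rng = random.Random(1)
frame, last = [], 0.0
for _ in range(2000):
    last = 0.8 * last + rng.gauss(0.0, 1.0)
    frame.append(last)

order = 2
a, rc, _ = levinson_durbin(autocorrelation(frame, order), order)
energy = residual_energy(frame, a)           # energy estimate after inverse filtering
avg_rc = smooth_rc([0.0] * order, rc)        # running RC parameters
```

With this sign convention, `a[1]` should come out near -0.8 for the synthetic frame, and every reflection coefficient has magnitude below one. That bounded range is one property that makes reflection coefficients convenient to average and transmit.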
When an echo suppressor at a conventional communication device requires comfort noise to replace the input signal during periods which do not contain near-end speech (e.g. during echo only periods in which far-end speech is present in the microphone signal but near-end speech is not present in the microphone signal), comfort noise synthesis can be invoked as shown in FIG. 1c. A random noise generator 108 provides an excitation signal for an LPC filter 111 configured using LPC coefficients converted at step 109 from the RC parameters 107 derived in the analysis shown in FIG. 1b. The gain 110 of the random noise excitation signal can be controlled using the first reflection coefficient and is used to shape the LPC filter response. The output of the LPC filter provides comfort noise 112.
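The synthesis stage of FIG. 1c can be sketched in the same spirit. The sketch assumes the standard step-up recursion for converting reflection coefficients to LPC coefficients and an all-pole filter 1/A(z) driven by white Gaussian noise. The gain is passed in as a plain parameter: the rule by which it is derived from the first reflection coefficient is not reproduced here. Function names and the example values are hypothetical.

```python
import random

def rc_to_lpc(rc):
    """Step-up recursion: reflection coefficients -> a[0..p], with a[0] = 1."""
    a = [1.0]
    for i, k in enumerate(rc, start=1):
        prev = a + [0.0]
        a = [prev[j] + k * prev[i - j] for j in range(i + 1)]
    return a

def synthesize_comfort_noise(rc, gain, num_samples, seed=0):
    """Filter gain-scaled white noise through the all-pole filter 1/A(z)."""
    a = rc_to_lpc(rc)
    rng = random.Random(seed)
    y = []
    for n in range(num_samples):
        acc = gain * rng.gauss(0.0, 1.0)      # random noise excitation
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]        # feedback through LPC coefficients
        y.append(acc)
    return y

# Example: a single reflection coefficient of -0.8 gives low-pass shaped noise.
noise = synthesize_comfort_noise([-0.8], gain=0.05, num_samples=4000)
r0 = sum(v * v for v in noise)
r1 = sum(noise[i] * noise[i + 1] for i in range(len(noise) - 1))
lag1_correlation = r1 / r0
```

The output level here is set entirely by `gain` and the filter response, which is why a poorly chosen gain produces comfort noise whose power differs from that of the real background noise.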
The conventional approach to generating comfort noise shown in FIGS. 1b and 1c suffers from various problems. The gain adjustment technique can lead to quality issues, for example if the VAD misdetects periods of no speech and/or if there are sudden changes in noise level in the microphone signal. In such scenarios, the output power of the synthesised comfort noise can differ significantly from the actual power of the background noise in the input signal. Furthermore, the noise source 108 typically provides white noise, which often will not reflect the true ambient noise characteristics in the input signal.