1. Field of the Invention
The present invention relates generally to speech coding and, more particularly, to noise suppression.
2. Related Art
Generally, a speech signal can be band-limited to about 10 kHz without affecting its perception. However, in telecommunications, the speech signal bandwidth is usually limited much more severely. For instance, the telephone network limits the bandwidth of the speech signal to a band between 300 Hz and 3400 Hz, which is known in the art as the “narrowband”. Such band-limitation gives telephone speech its characteristic sound. Both the lower limit of 300 Hz and the upper limit of 3400 Hz affect the speech quality.
In most digital speech coders, the speech signal is sampled at 8 kHz, resulting in a maximum signal bandwidth of 4 kHz (half the sampling frequency). In practice, however, the signal is usually band-limited to about 3600 Hz at the high end. At the low end, the cut-off frequency is usually between 50 Hz and 200 Hz. The narrowband speech signal, which requires a sampling frequency of 8 kHz, provides a speech quality referred to as toll quality. Although toll quality is sufficient for telephone communications, emerging applications such as teleconferencing, multimedia services and high-definition television require improved quality.
The communications quality can be improved for such applications by increasing the bandwidth. For example, increasing the sampling frequency to 16 kHz accommodates a wider bandwidth, ranging from 50 Hz to about 7000 Hz. This wider bandwidth is referred to in the art as the “wideband”. Extending the lower frequency range down to 50 Hz increases naturalness, presence and comfort. At the other end of the spectrum, extending the higher frequency range up to 7000 Hz increases intelligibility and makes it easier to differentiate between fricative sounds.
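The bandwidth figures above follow from the Nyquist limit: a signal sampled at a rate fs can represent frequencies only up to fs/2. The short check below applies that rule to the two sampling rates and passbands quoted in the text. The function name `max_bandwidth_hz` and the `BANDS` table are illustrative only.

```python
def max_bandwidth_hz(sampling_rate_hz):
    """Highest representable frequency (Nyquist limit) for a sampling rate."""
    return sampling_rate_hz / 2


# Sampling rates and passbands quoted in the text (fs, low edge, high edge), in Hz.
BANDS = {
    "narrowband": (8000, 300, 3400),
    "wideband": (16000, 50, 7000),
}

for name, (fs, low, high) in BANDS.items():
    limit = max_bandwidth_hz(fs)
    # Headroom between the band edge and the Nyquist limit leaves room for
    # the finite roll-off of a real anti-aliasing filter.
    print(f"{name}: fs = {fs} Hz, limit = {limit:.0f} Hz, "
          f"band {low} to {high} Hz, headroom = {limit - high:.0f} Hz")
```

Running it prints:

```
narrowband: fs = 8000 Hz, limit = 4000 Hz, band 300 to 3400 Hz, headroom = 600 Hz
wideband: fs = 16000 Hz, limit = 8000 Hz, band 50 to 7000 Hz, headroom = 1000 Hz
```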
Background noise is usually a quasi-steady signal superimposed upon the voiced speech. For instance, assume that FIG. 1 represents the spectrum of an input speech signal and FIG. 2 represents a typical background noise spectrum. The goal of noise suppression systems is to reduce or suppress the background noise energy in the input speech.
To suppress the background noise, prior art systems divide the input speech spectrum into several segments (or channels). Each channel is then processed separately by estimating the signal-to-noise ratio (SNR) for that channel and applying an appropriate gain to reduce the noise. For instance, if the SNR is low, the noise component in the segment is high, and a gain much less than one is applied to reduce the magnitude of the noise. On the other hand, when the SNR is high, the noise component is insignificant and a gain closer to one is applied.
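The per-channel rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular prior art system. The sketch assumes that each channel's noisy-input energy and a noise-energy estimate are already available. It also assumes a Wiener-style gain `snr / (1 + snr)` with a lower floor, which is one common choice; the text does not specify a gain rule. The names `channel_gains`, `apply_gains` and `gain_floor` are hypothetical.

```python
def channel_gains(channel_energy, noise_energy, gain_floor=0.1):
    """One gain per channel from that channel's estimated SNR.

    Assumptions (illustrative only): channel_energy is the energy of the noisy
    input in each channel, noise_energy is a noise estimate for the same
    channel, and the gain follows a Wiener-style rule snr / (1 + snr),
    clamped from below at gain_floor so no channel is muted entirely.
    """
    gains = []
    for e, n in zip(channel_energy, noise_energy):
        if n <= 0.0:
            gains.append(1.0)          # no measurable noise: pass unchanged
            continue
        snr = max(e - n, 0.0) / n      # estimated speech-to-noise ratio (linear)
        g = snr / (1.0 + snr)          # near 1 for high SNR, near 0 for low SNR
        gains.append(max(g, gain_floor))
    return gains


def apply_gains(channels, gains):
    """Scale every sample in each channel by that channel's gain."""
    return [[g * v for v in ch] for ch, g in zip(channels, gains)]


gains = channel_gains([100.0, 2.0, 1.0], [1.0, 1.0, 1.0])
print([round(g, 2) for g in gains])    # [0.99, 0.5, 0.1]
```

The high-SNR channel keeps a gain close to one. The channel whose energy equals the noise estimate is clamped to the floor, matching the qualitative behavior the paragraph describes.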
The problem with prior art noise suppression systems is that they are computationally cumbersome because they require complex fast Fourier transforms (FFT) and inverse FFTs (IFFT). These transforms are needed so that the signal can be manipulated in the frequency domain. In addition, some form of smoothing is required between frames to prevent discontinuities. Thus, prior art approaches involve algorithms that are sometimes too complex for real-time applications.
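To make the cost of the frequency-domain approach concrete, here is a minimal sketch of the kind of pipeline the paragraph above describes. Each frame is windowed, transformed, scaled bin by bin, inverse-transformed and overlap-added. Several assumptions apply. A naive O(N²) DFT stands in for the FFT so the block needs only the standard library. A periodic Hann window with 50% overlap-add is used as the inter-frame smoothing; this is one common choice, and the text does not say which smoothing prior art systems use. The frame length of 8 is a toy size. The names `dft`, `idft`, `hann` and `suppress` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; a practical system would use an FFT instead."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive O(N^2) inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to 1."""
    return [0.5 * (1 - math.cos(2 * math.pi * t / n)) for t in range(n)]


def suppress(x, gains, frame_len=8):
    """Window, transform, scale each bin, inverse-transform, overlap-add.

    gains holds one real gain per DFT bin; it must satisfy
    gains[k] == gains[frame_len - k] so that the output stays real.
    """
    hop = frame_len // 2                      # 50% frame overlap
    w = hann(frame_len)
    y = [0.0] * len(x)
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + t] * w[t] for t in range(frame_len)]
        spectrum = dft(frame)                 # forward transform per frame
        shaped = [g * X for g, X in zip(gains, spectrum)]
        out = idft(shaped)                    # inverse transform per frame
        for t in range(frame_len):
            # Overlap-add: tapered, overlapping windows smooth frame boundaries.
            y[start + t] += out[t].real
    return y


x = [math.sin(0.3 * t) for t in range(32)]
y = suppress(x, [1.0] * 8)   # unity gains: the pipeline should be transparent
```

With unity gains, samples covered by two overlapping frames are reconstructed exactly, up to rounding. The first and last half-frames are not, because only one tapered window covers them. Every frame still needs one forward and one inverse transform plus the overlap-add bookkeeping. An FFT lowers each transform from O(N²) to O(N log N) operations but does not remove either transform, which is the overhead the paragraph refers to.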
The present invention provides a computationally simple noise suppression system applicable to real-time, real-life applications.