1. Field of the Invention
The present invention relates to noise reduction in speech signals.
2. The Prior Art
Noise, when added to a speech signal, can impair the quality of the signal, reduce intelligibility, and increase listener fatigue. It is therefore of great importance to reduce noise in speech signals, both in hearing aids and in telecommunication.
Various methods of noise reduction in a speech signal are known. These methods include spectral subtraction and other filtering methods, e.g., Wiener filtering. Spectral subtraction is a technique for reducing noise in speech signals, which operates by converting a time domain representation of the speech signal into the frequency domain, e.g., by taking the Fourier transform of segments of the speech signal. Hereby a set of signals representing the short term power spectrum of the speech is obtained. During the speech-free periods, an estimate of the noise power spectrum is generated. The obtained noise power spectrum is subtracted from the speech power spectrum signals in order to obtain a noise reduction. A time domain speech signal is reconstructed using the resulting spectrum, e.g., by use of the inverse Fourier transform. Hereby the time-domain signal is reconstructed from the noise-reduced power spectrum and the unmodified phase spectrum.
Even though this method has been found to be useful, it has the drawback that the noise reduction is based on an estimate of the noise spectrum and is therefore dependent on stationarity in the noise signal to perform optimally.
As the noise in a speech signal is often non-stationary, the estimated noise spectrum used for spectral subtraction will differ from the actual noise spectrum during speech activity. This error in noise estimation tends to affect small spectral regions of the output, and will result in short-duration random tones in the noise-reduced signal. Even though these random tones often have low energy compared to the total energy in the speech signal, they tend to be very irritating to listen to due to psycho-acoustic effects.
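The spectral subtraction procedure described above can be sketched for a single signal segment as follows. This is only an illustrative sketch, not the implementation of any particular prior art system: the function names, the naive DFT, the averaging of noise frames, and the `floor` parameter (which clamps negative power values) are assumptions introduced for the example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each time sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def estimate_noise_power(noise_frames):
    """Average the power spectrum over frames taken from speech-free periods."""
    n = len(noise_frames[0])
    acc = [0.0] * n
    for frame in noise_frames:
        for k, value in enumerate(dft(frame)):
            acc[k] += abs(value) ** 2
    return [a / len(noise_frames) for a in acc]


def spectral_subtract(frame, noise_power, floor=0.0):
    """Subtract a noise power estimate from one frame and resynthesize it.

    The phase spectrum is left unmodified; only the power is reduced.
    `floor` (an assumption of this sketch) replaces negative results.
    """
    cleaned = []
    for value, n_pow in zip(dft(frame), noise_power):
        reduced = max(abs(value) ** 2 - n_pow, floor)
        cleaned.append(cmath.rect(math.sqrt(reduced), cmath.phase(value)))
    return idft(cleaned)
```

Wherever the noise estimate exceeds the actual power in a frequency bin, the result is clamped, and wherever it falls short, residual noise remains in that bin. Such isolated residual peaks correspond to the random tones mentioned above.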
The object of the invention is to provide a method which enables noise reduction in a speech signal, and which avoids the above-mentioned drawbacks of the prior art.
The invention is based on the circumstance that a model-based representation describing the quasi-stationary part of the speech signal can be generated on the basis of a third spectrum, which is generated by spectral subtraction of a first spectrum generated on the basis of a speech signal and a second spectrum generated as an estimate of the noise power spectrum. The spectral subtraction enables the use of a model-based representation for speech signals including noise, and the model-based representation of the quasi-stationary part of the speech signal enables an improved noise reduction compared to prior art methods, as it enables use of a priori knowledge of speech signals.
This unconventional use of a combination of both traditional and model-based methods of noise reduction in a speech signal is advantageous, as it permits smooth manipulation of the speech signal in order to obtain improved noise reduction without artefacts.
As the model-based representation is generated dynamically, i.e., on the fly, movements of the formants in the third spectrum will not affect the quality of the noise reduction, and the method according to the invention is therefore advantageous compared to methods of the prior art.
Preferably, the model-based representation can include parameters describing one or more formants in the third spectrum. This is advantageous as the formants, i.e., peaks in the signal spectrum which are related to the speech, in the third spectrum contain essential features of the speech signal, and as it is possible to manipulate the formants by using the parameters, and hereby to manipulate the resulting speech signal.
The parameters preferably reflect the resonance frequency, the bandwidth, and the gain at the resonance frequency of the formants in the third spectrum.
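As an illustration of the three parameters named above, the following sketch derives them from a single complex pole of a second-order all-pole resonator. The all-pole resonator model, the function name `formant_parameters`, and the bandwidth approximation are assumptions made for this example only; the text does not state which model is used.

```python
import cmath
import math


def formant_parameters(pole, sample_rate):
    """Return (resonance frequency in Hz, bandwidth in Hz, gain in dB).

    Assumes a resonator H(z) = 1 / ((1 - p/z) * (1 - conj(p)/z)) whose
    pole p lies inside the unit circle (hypothetical model for this sketch).
    """
    radius = abs(pole)
    theta = abs(cmath.phase(pole))
    # The pole angle gives the resonance frequency.
    frequency = theta * sample_rate / (2 * math.pi)
    # Common approximation: the closer the pole is to the unit circle,
    # the narrower the formant.
    bandwidth = -math.log(radius) * sample_rate / math.pi
    # Gain is the magnitude response evaluated at the resonance frequency.
    z = cmath.exp(1j * theta)
    response = 1.0 / ((1 - pole / z) * (1 - pole.conjugate() / z))
    gain_db = 20 * math.log10(abs(response))
    return frequency, bandwidth, gain_db


# Example: a pole at radius 0.95 placed at 500 Hz for an 8 kHz sample rate.
pole = 0.95 * cmath.exp(2j * math.pi * 500 / 8000)
frequency, bandwidth, gain_db = formant_parameters(pole, 8000)
```

In this model, moving the pole toward the unit circle narrows the bandwidth and raises the gain, which is one way a formant could be manipulated through its parameters.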
In a preferred embodiment, the manipulation can include spectral gaining, which is based on a structure parameter reflecting structure in the spectrum. Spectral gaining attenuates relatively broad formants, since these cause unwanted artefacts. This method is based on the fact that human speech produces narrow formants in the absence of noise.
The structure parameter S can be preferably given by S=B*G, where B is the bandwidth ratio of the formants in the third spectrum, and G is the gain ratio of the formants in the third spectrum.
Noise reduction is preferably performed in said second signal. This is advantageous as noise will also be present in the second signal, and a noise reduction in this signal will therefore result in a noise reduction in the resulting signal.
The second signal can correspond to the speech signal. This is advantageous in some cases, e.g., when the signal/noise ratio approximately equals 0 dB.
The second signal can represent the residual signal, i.e., the non-stationary part of the speech signal such as information reflecting the articulation. This is advantageous in some cases, e.g., when the signal/noise ratio approximately equals 6 dB.
Various signal elements of the second signal, such as pitch pulses, stop consonants and noise transients, can be preferably amplified or attenuated. This is advantageous in some cases, e.g., when the signal/noise ratio approximately equals −6 dB.
The present invention also relates to an apparatus for noise reduction in speech signals.
The invention will be explained more fully by the following description with reference to the accompanying drawings.