This invention relates to a method for reducing the noise in speech signals and a method for detecting the noise domain. More particularly, it relates to a method for reducing the noise in the speech signals in which noise suppression is achieved by adaptively controlling a maximum likelihood filter for calculating speech components based upon the speech presence probability and the SN ratio calculated on the basis of input speech signals, and a noise domain detection method which may be conveniently applied to the noise reducing method.
In a portable telephone or speech recognition system, it is thought to be necessary to suppress environmental noise or background noise contained in the collected speech signals and to enhance the speech components.
As techniques for enhancing the speech or reducing the noise, those employing a conditional probability function for adjusting attenuation factor are shown in R. J. McAulay and M. L. Malpass, Speech Enhancement Using a Soft-Decision Noise Suppression Filter, IEEE Trans. Acoust, Speech, Signal Processing, Vol.28, pp.137-145, April 1980, and J. Yang, Frequency Domain Noise Suppression Approach in Mobile Telephone System, IEEE ICASSP, vol.II, pp. 363-366, April 1993.
With these noise suppression techniques, it may occur frequently that unnatural speech tone or distorted speech be produced due to the operation based on an inappropriate fixed signal-to-noise (S/N) ratio or to an inappropriate suppression factor. In actual application, it is not desirable for the user to adjust the S/N ratio, which is among the parameters of the noise suppression system for achieving an optimum performance. In addition, it is difficult with the conventional speech signal enhancement techniques to remove the noise sufficiently without by-producing the distortion of the speech signals susceptible to considerable fluctuations in the short-term S/N ratio.
With the above-described speech enhancement or noise reducing method, the technique of detecting the noise domain is employed, in which the input level or power is compared to a pre-set threshold for discriminating the noise domain. However, if the time constant of the threshold value is increased for preventing tracking to the speech, it becomes impossible to follow noise level changes, especially to increase in the noise level, thus leading to mistaken discrimination.