In speech processing systems such as speech recognition, for example, it is desirable to remove noise from speech signals to thereby obtain accurate speech processing results. There are various techniques that have been developed to filter noise from an audio signal to obtain an enhanced signal for speech processing. Many of the known techniques use a single microphone solution (see, e.g., “Advanced Digital Signal Processing and Noise Reduction”, by S. V. Vaseghi, John Wiley & Sons, 2nd Edition, 2000).
For example, one approach for speech enhancement, which is based on psychoacoustic masking effects, is proposed in the article by S. Gustafsson, et al., A Novel Psychoacoustically Motivated Audio Enhancement Algorithm Preserving Background Noise Characteristics, ICASSP, pp. 397–400, 1998, which is incorporated herein by reference. Briefly, this method uses an observation from human hearing studies known as “tonal masking”, wherein a given tone becomes inaudible by a listener if another tone (the masking tone) having a similar or slightly different frequency is simultaneously presented to the listener. A detailed discussion of “tonal masking” can be found, for example, in the reference by W. Yost, Fundamentals of Hearing—An Introduction, 4th Ed., Academic Press, 2000.
More specifically, for a given speech signal (or more particular, for a given spectral power density), there is a psychoacoustic spectral threshold such that any interferer of spectral power below such threshold becomes unnoticed. In most de-noising schemes, there is a trade off between speech intelligibility (e.g., as measured by an “articulation index” defined in the reference by J. R. Deller, et al., Discrete-Time Processing of Speech Signals, IEEE Press, 2000) and the amount of removed noise as measured by SNR (signal-to-noise ratio) (see, the above-incorporated Gustafsson, et al. reference). Therefore, the entire removal of the noise from the speech signal is not necessarily desirable or even feasible.
Other noise reduction schemes that are known in the art employ two or more microphones to provide increased signal to noise ratio of the estimated speech signal. Theoretically, multi-channel techniques provide more information about the acoustic environment and therefore, should offer the possibility for improvement, especially in the case of reverberant environments due to multi-path effects and severe noise conditions known to affect the performance of known single channel techniques. However, the effectiveness of multiple channel techniques for a few microphones is yet to be proven.
For example, known beamforming techniques and, in general, conventional approaches that are based on microphone arrays, may achieve relatively small SNR improvements in the case of a small number of microphones. In addition, some multi-channel techniques may result in reduced intelligibility of the speech signal due to artifacts in the speech signal that are generated as a result of the particular processing algorithm.
Therefore, a speech enhancement system and method that would provide significant reduction of noise in a speech signal while maintaining the intelligibility of such speech signal for purposes of improved speech processing (e.g., speech recognition) would be highly desirable.