The present invention relates generally to noise suppression apparatus and methods and, in particular, to a noise suppression system and method which detects and removes background noise from a voice signal using spectral subtraction.
Voice control of devices and appliances is becoming more and more prevalent. One example of this technology is in "hands free" control of mobile telephones. This application is especially important as it allows a driver to make calls or answer a telephone while keeping both hands on the steering wheel and both eyes on the traffic. Although the invention is described below in terms of a noise reduction method and system for a hands-free mobile telephone, it is contemplated that it may be practiced with any system which would benefit from noise reduction in a voice signal.
Voice control of mobile telephones, however, is complicated by the ambient noise in the automobile. Engine noise, windshield wiper noise, construction noise and the noise of passing cars and trucks can interfere with the voice recognition process, making the mobile telephone difficult to control using voice commands alone. Existing voice control systems typically employ some form of speech enhancement to reduce the level of noise in the signal and then apply the noise-reduced signal to a voice recognition system. As the number of words that are recognized by a typical voice control system is relatively low, a speaker-independent voice recognition system may be used. Such a speaker-independent system is disclosed, for example, in U.S. Pat. No. 5,799,276 entitled KNOWLEDGE-BASED SPEECH RECOGNITION SYSTEM AND METHODS HAVING FRAME LENGTH COMPUTED BASED ON ESTIMATED PITCH PERIOD OF VOCALIC INTERVALS. Alternatively, it is contemplated that other speech recognition systems, such as a conventional dynamic time warping system, may be used.
In addition to reducing noise in user commands that control the mobile telephone, the speech enhancement system may also be used to reduce noise in the voice signal that is delivered through the telephone and, thus, enhance the speech signal that is received by the person being called.
Low-complexity spectrum-based speech enhancement systems are generally based on the spectral subtraction principle: the noise power spectrum, which has been estimated (and averaged) during noise-only periods, is subtracted from the "speech-plus-noise" power spectrum in order to estimate the power spectrum of the clean speech signal. The enhanced speech waveform is reconstructed using the unaltered noisy phase. Formally, the enhanced speech spectrum can be expressed as $S_k(f) = G_k(f)\,X_k(f)$, where $X_k(f)$ is the (discrete) Fourier transform (DFT) of the noisy speech signal $x(n)$ at frame index $k$, $S_k(f)$ is the estimated clean speech spectrum, and $G_k(f)$ is the gain factor. In the case of (power) spectral subtraction, the gain factor, $G$, is a vector given by ##EQU1##
where $P_n$ and $P_x$ are the estimated noise power spectrum and the speech-plus-noise power spectrum, respectively.
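For orientation, the textbook form of the power spectral subtraction gain can be derived from the definitions above. This is a sketch of the commonly used expression, not necessarily the exact equation referenced in the text. In particular, the flooring of negative differences at zero is an assumption added here so that the square root stays real.

```latex
% Estimated clean-speech power: noisy power minus estimated noise power
\hat{P}_s(f) = P_x(f) - P_n(f)

% Requiring |S_k(f)|^2 = G_k(f)^2 \, P_x(f) = \hat{P}_s(f) gives the gain
G_k(f) = \sqrt{\max\!\left(0,\; 1 - \frac{P_n(f)}{P_x(f)}\right)}
```

Because $G_k(f)$ is real and non-negative, multiplying $X_k(f)$ by it scales the magnitude of each frequency bin and leaves the noisy phase unchanged.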
Before a speech enhancement system reduces the noise in a noisy voice signal, therefore, it first identifies the noise and estimates its power spectrum. Next, the noisy voice signal is processed according to the determined gain factor to selectively reduce the amplitude of the speech-plus-noise signal for the frequency bands in which noise dominates.
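The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in this document. The function names `average_noise_power` and `enhance_frame` are hypothetical, the frames are assumed to be already transformed into lists of complex DFT bins, the speech-plus-noise power is assumed to be the instantaneous value |X|^2 of the current frame, and the gain is the textbook power spectral subtraction form floored at zero.

```python
import math


def average_noise_power(noise_spectra):
    """Estimate the noise power spectrum P_n(f).

    noise_spectra: list of frames, each a list of complex DFT bins,
    taken from periods assumed to contain noise only.
    Returns the per-bin average of |X_k(f)|^2 over those frames.
    """
    n_frames = len(noise_spectra)
    n_bins = len(noise_spectra[0])
    return [
        sum(abs(frame[f]) ** 2 for frame in noise_spectra) / n_frames
        for f in range(n_bins)
    ]


def enhance_frame(noisy_spectrum, noise_power):
    """Apply a spectral subtraction gain to one noisy frame.

    Assumption: P_x(f) is the instantaneous power |X_k(f)|^2, and the
    gain is sqrt(max(0, 1 - P_n/P_x)), floored at zero.
    """
    enhanced = []
    for x_f, p_n in zip(noisy_spectrum, noise_power):
        p_x = abs(x_f) ** 2
        if p_x > 0.0:
            gain = math.sqrt(max(0.0, 1.0 - p_n / p_x))
        else:
            gain = 0.0  # empty bin: nothing to keep
        # The gain is real, so the phase of x_f is left unaltered.
        enhanced.append(gain * x_f)
    return enhanced


# Usage: two noise-only frames, then one noisy speech frame.
noise_power = average_noise_power([[3 + 0j, 2j], [3j, -2 + 0j]])
cleaned = enhance_frame([3 + 4j, 1j], noise_power)
```

In the usage lines, the first bin has more signal power than noise power and is attenuated, while the second bin is dominated by noise and is driven to zero, which is the selective reduction described above.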
Even though these systems are "low complexity," they typically require a relatively large number of calculations and may not be appropriate for implementation in a mobile telephone environment. One method for reducing the complexity of the speech enhancement process is to assume that the noise component of the signal is stationary, that is to say, that it does not change significantly over short time intervals. This assumption, however, is not appropriate for a hands-free mobile telephone controller, as engine noise and traffic noise are poorly modeled by a stationary noise source.