1. Field of the Invention
This invention relates generally to electronic speech detection systems, and relates more particularly to a method for suppressing background noise in a speech detection system.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Human speech detection is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech generally consists of one or more spoken utterances which each may include a single word or a series of closely-spaced words forming a phrase or a sentence. In practice, speech detection systems typically determine the endpoints (the beginning and ending points) of a spoken utterance to accurately identify the specific sound data intended for analysis.
Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech detection system. Examples of such noisy conditions may include speech recognition in automobiles or in certain manufacturing facilities. In such user applications, in order to accurately analyze a particular utterance, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1(a), an exemplary waveform diagram for one embodiment of noisy speech 112 is shown. In addition, FIG. 1(b) depicts an exemplary waveform diagram for one embodiment of speech 114 without noise. Similarly, FIG. 1(c) shows an exemplary waveform diagram for one embodiment of noise 116 without speech 114. In practice, noisy speech 112 of FIG. 1(a) is therefore typically comprised of several components, including speech 114 of FIG. (1(b) and noise 116 of FIG. 1(c). In FIGS. 1(a), 1(b), and 1(c), waveforms 112, 114, and 116 are presented for purposes of illustration only. The present invention may readily function and incorporate various other embodiments of noisy speech 112, speech 114, and noise 116.
An important measurement in speech detection systems is the signal-to-noise ratio (SNR) which specifies the amount of noise present in relation to a given signal. For example, the SNR of noisy speech 112 in FIG. 1(a) may be expressed as the ratio of noisy speech 112 divided by noise 116 of FIG. 1(c). Many speech detection systems tend to function unreliably in conditions of high background noise when the SNR drops below an acceptable level. For example, if the SNR of a given speech detection system drops below a certain value (for example, 0 decibels), then the accuracy of the speech detection function may become significantly degraded.
Various methods have been proposed for speech enhancement and noise suppression. A spectral subtraction method, due to its simplicity, has been widely used for speech enhancement. Another known method for speech enhancement is Wiener filtering. Inverse filtering based on all-pole models has also been reported as a suitable method for noise suppression. However, the foregoing methods are not entirely satisfactory in certain relevant applications, and thus they may not perform adequately in particular implementations. From the foregoing discussion, it therefore becomes apparent that suppressing ambient background noise to improve the signal-to-noise ratio in a speech detection system is a significant consideration of system designers and manufacturers of speech detection systems.