1. Field of the Invention
This invention relates generally to electronic speech recognition systems, and relates more particularly to a method for reducing noise distortions in a speech recognition system.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Automatic speech recognition is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech typically consists of one or more spoken utterances which each may include a single word or a series of closely-spaced words forming a phrase or a sentence.
Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech recognition system. Examples of such noisy conditions may include speech recognition in automobiles or in certain manufacturing facilities. In such user applications, in order to accurately analyze a particular utterance, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1(a), an exemplary waveform diagram for one embodiment of noisy speech 112 is shown. In addition, FIG. 1(b) depicts an exemplary waveform diagram for one embodiment of speech 114 without noise. Similarly, FIG. 1(c) shows an exemplary waveform diagram for one embodiment of noise 116 without speech 114. In practice, noisy speech 112 of FIG. 1(a) is therefore typically composed of several components, including speech 114 of FIG. 1(b) and noise 116 of FIG. 1(c). In FIGS. 1(a), 1(b), and 1(c), waveforms 112, 114, and 116 are presented for purposes of illustration only. The present invention may readily incorporate various other embodiments of noisy speech 112, speech 114, and noise 116.
An automatic speech recognizer typically builds a comparison database for performing speech recognition when a potential user "trains" the recognizer by providing a set of sample speech. Speech recognizers tend to significantly degrade in performance when a mismatch exists between training conditions and actual operating conditions. Such a mismatch may result from various types of acoustic distortion.
The two main sources of acoustic distortion are typically additive noise (such as car noise, music, or background speakers) and convolutive distortions due to the use of various different microphones, use of a telephone channel, or reverberation effects. From the foregoing discussion, it therefore becomes apparent that reducing noise distortions in a speech recognition system is a significant consideration for designers and manufacturers of such systems.
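For illustration only, the two distortion types described above may be sketched in Python as follows. This is a minimal sketch and not part of the described system: the function names `convolve` and `distort`, the 200 Hz tone standing in for speech, the two-tap channel, and the noise level are all hypothetical assumptions chosen for the example.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences (pure Python)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def distort(speech, impulse_response, noise):
    """Apply a convolutive channel, then add noise sample by sample."""
    # Convolutive distortion: filter the speech with the channel response
    # (e.g. a microphone or telephone line), truncated to the input length.
    filtered = convolve(speech, impulse_response)[:len(speech)]
    # Additive distortion: add the background noise to the filtered speech.
    return [f + n for f, n in zip(filtered, noise)]


# Hypothetical stand-ins (assumptions, not taken from the source):
# a 200 Hz tone sampled at 8 kHz for "speech", a two-tap channel,
# and low-level Gaussian noise from a seeded generator.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(400)]
channel = [1.0, 0.5]
noise = [rng.gauss(0.0, 0.1) for _ in speech]

noisy_speech = distort(speech, channel, noise)
```

Under these assumptions, `noisy_speech` plays the role of noisy speech 112, `speech` the role of speech 114, and `noise` the role of noise 116. Setting `channel` to `[1.0]` leaves only the additive component.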