This invention relates to a speech feature extraction system for use in speech recognition, voice identification or voice authentication systems. More specifically, this invention relates to a speech feature extraction system that can be used to create a speech recognition system or other speech processing system with a reduced error rate.
Generally, a speech recognition system is an apparatus that attempts to identify spoken words by analyzing the speaker's voice signal. Speech is converted into an electronic form from which features are extracted. The system then attempts to match a sequence of features to a previously stored sequence of models associated with known speech units. When a sequence of features corresponds to a sequence of models in accordance with specified rules, the corresponding words are deemed to be recognized by the speech recognition system.
However, background sounds such as radios, car noise, or other nearby speakers can make it difficult to extract useful features from the speech. In addition, a change in the ambient conditions, such as the use of a different microphone, telephone handset or telephone line, can interfere with system performance. Also, a speaker's distance from the microphone, differences between speakers, changes in speaker intonation or emphasis, and even a speaker's health can adversely impact system performance. For a further description of some of these problems, see Richard A. Quinnell, “Speech Recognition: No Longer a Dream, But Still a Challenge,” EDN Magazine, Jan. 19, 1995, pp. 41–46.
In most speech recognition systems, the speech features are extracted by cepstral analysis, which generally involves measuring the energy in specific frequency bands. The product of that analysis reflects the amplitude of the signal in those bands. The changes in these amplitudes over successive time periods can be modeled as an amplitude-modulated signal.
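By way of illustration only, the band-energy measurement described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a complete cepstral front end: it omits the logarithm and cosine transform stages normally used to obtain cepstral coefficients, and it uses a naive discrete Fourier transform for clarity. The function names, the 8 kHz sample rate, the frame and hop lengths, and the band edges are all hypothetical choices made for the example.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 (O(N^2); acceptable for a sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_energies(frame, sample_rate, band_edges_hz):
    """Sum spectral power in each half-open band [lo, hi) given in Hz."""
    n = len(frame)
    spec = power_spectrum(frame)
    energies = []
    for lo, hi in zip(band_edges_hz, band_edges_hz[1:]):
        energies.append(sum(p for k, p in enumerate(spec)
                            if lo <= k * sample_rate / n < hi))
    return energies


def band_energy_trajectory(samples, sample_rate, band_edges_hz,
                           frame_len=64, hop=32):
    """Return one list of band energies per frame (assumed frame sizes)."""
    return [band_energies(f, sample_rate, band_edges_hz)
            for f in frame_signal(samples, frame_len, hop)]


def am_tone(sample_rate, seconds, carrier_hz=1000.0, mod_hz=20.0, depth=0.8):
    """Hypothetical test input: a carrier whose amplitude varies slowly."""
    count = int(sample_rate * seconds)
    return [(1.0 + depth * math.sin(2 * math.pi * mod_hz * i / sample_rate))
            * math.sin(2 * math.pi * carrier_hz * i / sample_rate)
            for i in range(count)]


if __name__ == "__main__":
    rate = 8000
    trajectory = band_energy_trajectory(am_tone(rate, 0.1), rate,
                                        [0, 500, 1500, 4000])
    for index, row in enumerate(trajectory):
        print(index, [round(e, 3) for e in row])
```

When run on the amplitude-modulated test tone, the energy in the band containing the carrier rises and falls from frame to frame, following the slow amplitude envelope. That frame-to-frame variation is the amplitude-modulated signal referred to above.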
Although the human ear is sensitive to frequency modulation as well as amplitude modulation in received speech signals, this frequency-modulated content is only partially reflected in systems that perform cepstral analysis.
Accordingly, it would be desirable to provide a speech feature extraction system capable of capturing the frequency modulation characteristics of speech, as well as the previously known amplitude modulation characteristics.
It also would be desirable to provide speech recognition and other speech processing systems that incorporate feature extraction systems that provide information on frequency modulation characteristics of the input speech signal.