The present invention relates to a speech recognition method and, more particularly, to a method for recognizing, in real time, one or more keywords in a continuous audio signal.
Various speech recognition systems have been proposed heretofore to recognize isolated utterances by comparing an unknown isolated audio signal, suitably processed, with one or more previously prepared representations of the known keywords. In this context, "keyword" is used to mean a connected group of phonemes and sounds and may be, for example, a portion of a syllable, a word, a phrase, etc. While many systems have met with limited success, one system, in particular, has been employed successfully, in commercial applications, to recognize isolated keywords. That system operates substantially in accordance with the method described in U.S. Pat. No. 4,038,503, granted July 26, 1977, assigned to the assignee of this application, and provides a successful method for recognizing one of a restricted vocabulary of keywords provided that the boundaries of the unknown audio signal data are either silence or background noise as measured by the recognition system. That system relies upon the presumption that the interval during which the unknown audio signal occurs is well defined and contains a single utterance.
In a continuous audio signal, such as continuous conversational speech, wherein the keyword boundaries are not a priori known or marked, several methods have been devised to segment the incoming audio data, that is, to determine the boundaries of linguistic units, such as phonemes, syllables, words, sentences, etc., prior to initiation of a keyword recognition process. These prior continuous speech systems, however, have achieved only limited success, in part because a satisfactory segmenting process has not been found. Other substantial problems still exist; for example, only limited vocabularies can be consistently recognized with a low false alarm rate, the recognition accuracy is highly sensitive to the differences between voice characteristics of different talkers, and the systems are highly sensitive to distortion in the audio signals being analyzed, such as typically occurs, for example, in audio signals transmitted over ordinary telephone communications apparatus. Thus, even though continuous speech is easily discernible and understood by the human observer, machine recognition of even a limited vocabulary of keywords in a continuous audio signal has yet to achieve major success.
A principal object of the present invention is therefore a speech recognition method having improved effectiveness in recognizing keywords in a continuous, unmarked audio signal. Other objects of the invention are a method which is relatively insensitive to phase and amplitude distortion of the unknown audio input signal data, a method which is relatively insensitive to variations in the articulation rate of the unknown audio input signals, a method which will respond equally well to different speakers and hence different voice characteristics, a method which is reliable, and a method which will operate in real time.