The invention herein described was made in the course of or under a contract with the Department of the Air Force.
The present invention relates to a speech recognition method and apparatus and, more particularly, to a method of and apparatus for recognizing, in real time, one or more keywords in a continuous audio signal.
Various speech recognition systems have been proposed heretofore to recognize isolated utterances by comparing an unknown isolated audio signal, suitably processed, with one or more previously prepared representations of known keywords. In this context, "keyword" is used to mean a connected group of phonemes and sounds and may be, for example, a portion of a syllable, a word, a phrase, etc. While many systems have met with limited success, one system in particular has been employed successfully, in commercial applications, to recognize isolated keywords. This system operates substantially in accordance with the method described in U.S. Pat. No. 4,038,503, granted July 26, 1977, assigned to the assignee of this application, and provides a successful method for recognizing one of a restricted vocabulary of keywords, provided that the boundaries of the unknown audio signal data are either silence or background noise as measured by the recognition system. That system relies upon the presumption that the interval during which the unknown audio signal occurs is well defined and contains a single keyword utterance.
For a continuous audio signal, such as continuous conversational speech, wherein the keyword boundaries are not known or marked a priori, several methods have been devised to segment the incoming audio data, that is, to determine the boundaries of linguistic units, such as phonemes, syllables, words, sentences, etc., prior to initiation of a keyword recognition process. These prior continuous speech systems, however, have achieved only limited success, in part because a satisfactory segmenting process has not been found. Other substantial problems still exist: for example, only limited vocabularies can be consistently recognized with a low false-alarm rate; the recognition accuracy is highly sensitive to the differences between voice characteristics of different talkers; and the systems are highly sensitive to distortion in the audio signals being analyzed, such as typically occurs, for example, in audio signals transmitted over ordinary telephone communications apparatus.
The continuous speech recognition methods described in U.S. applications Ser. Nos. 901,001; 901,005; and 901,006, all filed April 27, 1978, and now U.S. Pat. Nos. 4,227,176; 4,241,329; and 4,227,177, respectively, are commercially acceptable and effective procedures for successfully recognizing, in real time, keywords in continuous speech. The general methods described in these patents are presently in commercial use and have been shown, both experimentally and in practical field testing, to provide high reliability and a low error rate in a speaker-independent environment. Nevertheless, even these systems, and the concept upon which they were developed, while at the forefront of present-day technology, have shortcomings in both false-alarm rate and speaker-independent performance.
Therefore, a principal object of the present invention is a speech recognition method and apparatus having improved effectiveness in recognizing keywords in a continuous, unmarked audio signal. Other objects of the invention are a method and apparatus which are relatively insensitive to phase and amplitude distortion of the unknown audio input signal data, which are relatively insensitive to variations in the articulation rate of the unknown audio input signals, which will respond equally well to different speakers and hence different voice characteristics, which are reliable and have a lower false-alarm rate, and which operate in real time.