1. Field of the Invention
This invention relates to speech recognition systems and particularly to a system capable of recognizing keywords in continuous speech.
2. Description of the Prior Art
Many speech recognition systems have been proposed for applications in fields such as data processing, communications, and industrial machine control.
U.S. Pat. Nos. 3,775,627 and 3,582,559 describe discrete word recognition systems that operate only on isolated utterances and cannot detect keywords in continuous speech.
A word recognition system based on a sequence of phonetic event detections is disclosed in U.S. Pat. No. 3,588,363. This system, while applicable to discrete utterances, will not function with continuous speech, since the sound recognition network must be reset at the beginning of each word.
U.S. Pat. Nos. 3,557,310 and 3,688,126 each describe a word recognition system whose vocabulary is limited to two words. Neither of these systems will respond to a keyword in continuous speech.
A system for detecting formants (poles in the vocal tract transfer function) in speech is disclosed in U.S. Pat. No. 3,499,989. This system performs speech analysis but not utterance classification.
A system for classifying vowel sounds and making vowel/nonvowel decisions is described in U.S. Pat. No. 3,428,748. This system, like the system of U.S. Pat. No. 3,499,989, is a speech analyzer and is not capable of performing utterance classification.
U.S. Pat. Nos. 3,129,287 and 3,742,143 describe limited-vocabulary, isolated-word recognition systems that are unable to accomplish keyword recognition.
The system disclosed in U.S. Pat. No. 3,198,884 is oriented toward discrete digit recognition. This system establishes acoustic time registration by means of a segmentation procedure. Such segmentation procedures are subject to gross errors and are unsuitable for keyword recognition.
The system taught in U.S. Pat. No. 3,742,146 is directed to the classification of vowel sounds and has no provision for combining these events for keyword detection.
The article by G. L. Clapper, "Automatic Word Recognition," IEEE Spectrum, August 1971, pages 57-69, describes a system for recognizing discrete words. Because this system relies on word boundary information, it cannot perform keyword recognition.
Asynchronous detection of keywords in continuous speech means that no synchronization points are employed in the recognition process. Asynchronous detection is especially desirable in the classification of continuous speech for two reasons. First, the duration of a keyword in continuous speech is determined by the rate of speech and by the stress given to the keyword as part of the spoken message. Second, the positions of the same phonetic elements across an ensemble of different utterances of the same keyword may not be linearly related. This second property limits the applicability of linear time normalization, which has proved useful in discrete word recognition.
Many of these prior art speech recognition systems, as well as other similar systems, employ synchronization points in the recognition process. Conventional prior art procedures derive these synchronization points with preclassification segmentation procedures, which have two inherent disadvantages. First, their performance degrades rapidly when noise is added to the signal. Second, their computational requirements are often severe. Basing a keyword spotting system on an inherently noisy synchronous process is therefore suboptimal: a single omitted segmentation boundary could prevent recognition of a keyword even if the recognition logic were perfect.
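The limitation of linear time normalization noted above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each utterance is reduced to a sequence of per-frame phonetic labels; the function names and the two example utterances are hypothetical and are not part of the system described here. Both utterances are linearly resampled to the same length, but because only the first phonetic element of the second utterance is lengthened, the uniform stretch cannot bring the elements back into registration.

```python
def linear_time_normalize(frames, target_len):
    """Resample a frame sequence to target_len frames by mapping each
    output index uniformly (linearly) onto the input index range."""
    n = len(frames)
    return [frames[(i * n) // target_len] for i in range(target_len)]


def count_mismatches(a, b):
    """Count frame positions whose labels disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical utterances of the same keyword, as per-frame phonetic labels.
# In the second (stressed) utterance only the first element "a" is lengthened.
normal = ["a"] * 4 + ["b"] * 4
stressed = ["a"] * 8 + ["b"] * 4

ref = linear_time_normalize(normal, 8)
test = linear_time_normalize(stressed, 8)
print(ref)   # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(test)  # ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b']
print(count_mismatches(ref, test))  # 2
```

Although both utterances contain the same two phonetic elements in the same order, a quarter of the normalized frames disagree, because the lengthening is confined to one element rather than spread uniformly over the whole keyword.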
None of the above-described systems is capable of operating on continuous speech to detect one or more keywords.