This invention is generally directed to speech recognition systems and is more specifically directed to the training of speech recognition systems to recognize different words.
Advances continue to be made in computer-implemented speech recognition systems, which use digital processing techniques to identify spoken words. Speech recognizers contain a plurality of key words, i.e. words which the recognizer has been trained to recognize. Typically, a speaker-independent recognizer is trained to recognize a key word by having a plurality of people speak the key word, each utterance being stored in digital form. After a plurality of different users have input (spoken) the same key word, the training process uses the corresponding stored data to generate a model of the key word containing a set of parameters.
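The training step described above can be illustrated with a minimal sketch. The specification does not say what form the model takes, so the following assumes that each stored utterance has already been reduced to a fixed-length feature vector and that the "set of parameters" is simply the per-dimension mean and spread across speakers. This is a deliberately simplified stand-in for the statistical models real recognizers use; `KeywordModel` and `train_keyword_model` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KeywordModel:
    """Hypothetical key word model: per-dimension mean and spread of feature vectors."""
    word: str
    means: list[float]
    stds: list[float]


def train_keyword_model(word, samples, min_std=1e-3):
    """Build a model from the same key word as spoken by many different users.

    `samples` is a list of equal-length feature vectors, one per utterance
    (assumption: feature extraction has already produced fixed-length vectors).
    """
    if len(samples) < 2:
        raise ValueError("need utterances from at least two speakers")
    dims = len(samples[0])
    if any(len(s) != dims for s in samples):
        raise ValueError("all feature vectors must have the same length")
    columns = list(zip(*samples))  # one tuple of values per feature dimension
    means = [mean(col) for col in columns]
    # Floor the spread so a dimension on which all speakers agree
    # does not cause a division by zero when the model is later used.
    stds = [max(pstdev(col), min_std) for col in columns]
    return KeywordModel(word, means, stds)
```

For example, `train_keyword_model("yes", [[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]])` yields means of `[2.0, 2.0]`; the second dimension has no variation across the three samples, so its spread is floored to `1e-3`. More speakers with more varied accents widen the learned spread, which is the motivation for collecting many samples.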
In operation, the recognizer accepts the entry of a word as spoken by a user and uses the digital representation of the spoken word as an input to the speech recognition process to compare the spoken word with the key word models. If the spoken word falls within the predefined validity parameters associated with a key word model, the recognizer determines that the input word is the corresponding key word. If the input word does not fall within any of the previously determined validity parameters, the recognizer determines that none of the key words was spoken by the user.
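The recognition step, including the rejection case, can be sketched as follows. This is an illustrative example under stated assumptions: each hypothetical model stores per-dimension means, per-dimension spreads, and its own acceptance threshold (standing in for the "validity parameters"). The distance is a root-mean-square z-score, a simplification chosen here for clarity, not the method of the specification. `MODELS`, `normalized_distance` and `recognize` are illustrative names.

```python
import math

# Hypothetical key word models: word -> (means, spreads, acceptance threshold).
MODELS = {
    "yes": ([2.0, 2.0], [0.5, 0.5], 2.0),
    "no": ([-2.0, 0.0], [0.5, 0.5], 2.0),
}


def normalized_distance(features, means, stds):
    """Root-mean-square of the per-dimension z-scores of the input against a model."""
    if len(features) != len(means):
        raise ValueError("feature vector length does not match the model")
    z = [(f - m) / s for f, m, s in zip(features, means, stds)]
    return math.sqrt(sum(v * v for v in z) / len(z))


def recognize(features, models=MODELS):
    """Return the closest key word whose validity region contains the input, else None."""
    best_word, best_dist = None, math.inf
    for word, (means, stds, threshold) in models.items():
        d = normalized_distance(features, means, stds)
        # Validity check: only models whose threshold admits the input are candidates.
        if d <= threshold and d < best_dist:
            best_word, best_dist = word, d
    return best_word  # None means no key word was spoken
```

With these toy models, `recognize([2.2, 1.9])` returns `"yes"`, while an input far from every model, such as `[10.0, 10.0]`, returns `None`, corresponding to the recognizer determining that none of the key words was spoken.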
It is normally desirable to have a plurality of persons with different speech patterns and accents provide spoken inputs of the key word, so as to obtain a model with correspondingly broad validity parameters that accommodate variations of the spoken key word among different users. For this reason, speech recognizers have typically used hundreds or thousands of speech samples to generate the set of validity parameters for the corresponding key word. For a limited number of key words, collecting this many samples is not unduly burdensome. However, where a recognizer is to be expanded to accept a substantial number of key words, the corresponding number of samples becomes large and, hence, difficult and time-consuming to obtain. It is also difficult to update a speech recognizer system to recognize new key words, since a corresponding plurality of speech samples must be entered in order to generate the normal set of validity parameters for each new key word. Thus, training a speech recognizer by having a large number of persons speak each key word represents a burden.