The invention relates generally to speech recognition apparatus and methods, and in particular, to a speech recognition apparatus and method for modifying the content of the set of stored representations, the reference patterns, of words or phrases to be recognized.
In the field of speech recognition, many different methods have been described for improving the utterance-representing reference patterns against which speech recognition is to be made. These reference patterns are typically created during a training session, prior to actual recognition of unknown incoming speech, and the resulting patterns are stored in a reference pattern memory and represent either an entire word or phrase or portions of a word to be recognized.
According to most speech recognition methods, the reference patterns, once calculated and stored, remain immutable unless a new "off-line" training session is undertaken to update the reference patterns, for example in response to a new recognition environment, new equipment, or in the extreme a new speaker. In general, these speech recognition systems do not provide a method for updating the stored reference patterns during the recognition of unknown speech with regard to the words or phrases to be recognized. (Some recognition systems have provided an update of a silence-representing reference pattern by recognizing the period between actual speech utterances and providing an updated version of that silence reference pattern. These systems, however, have not provided updated reference patterns for the stored reference patterns representing actual speech.) This limitation is significant because, in particular, inexperienced users often speak in a different manner during the training phase than they do later when they are using the speech recognizer in an application to accomplish some task. It is well known that the best recognition results come from training which manages to induce the users to speak the way that they will speak in using the product. This is difficult to do, and most recognition systems do not achieve these "best results."
Accordingly, a primary object of the invention is to improve the recognition accuracy in a speech recognition environment. Other objects of the invention are to provide a dynamic reference pattern updating mechanism for improving the precision with which incoming unknown speech can be identified, and to provide reference patterns which better characterize a speaker's manner of pronouncing a selected word vocabulary.
A further object of the invention is to provide more "training" data during actual recognition sessions and to selectively accumulate more data on the items which most need improvement, that is, those items on which the system is making errors.