The present invention relates to a large vocabulary speech recognition system and more particularly to such a system which incorporates a method for expeditiously generating a recognition model for a non-standard word uttered by a user.
Large vocabulary speech recognition systems typically use statistical reference models, e.g. Hidden Markov Models (HMMs), to represent each word in the standard vocabulary. During the setting up and training of such a speech recognition system and the establishment of its standard vocabulary, a number of probability distributions are generated or obtained from heavily redundant training data. The redundancy in the training data allows reasonable probability distributions to be obtained.
The user of a large vocabulary speech recognition system, however, is very likely to encounter vocabulary omission errors, i.e. instances when the system cannot recognize a desired input utterance because it has not been trained to do so beforehand. Clearly, the user will wish to correct this omission with a minimum of effort, preferably by merely informing the system about the identity of the utterance spoken. However, existing techniques for the creation of reference models are inappropriate for the expeditious addition of non-standard words, since they are computationally intensive and do not work well when only a single example or utterance is available to define the new word. The difficulty is further compounded by the fact that the system will typically have no information about the new word other than the acoustic pattern of the utterance itself. For example, it will typically not be reasonable to require a phonetic transcription of the new word, which would facilitate generation of a new model from existing phoneme data. This problem is particularly common for items such as proper names.
Among the several objects of the present invention are the provision of a method for adding non-standard words to a large vocabulary speech recognition system; the provision of such a method which can permit such an addition based upon a single utterance or example; the provision of such a method which will produce a high quality model which can be reliably used to recognize other instances of the same word; the provision of such a method which is not computationally demanding; the provision of such a method which can be easily utilized by a user without undue interruption of the user's work; and the provision of such a method which is highly reliable and which is of relatively simple and inexpensive implementation.