Developments in speech recognition technology have led to widespread and varied use of speech recognition systems in applications that rely on spoken words or commands to perform some function. One example is the use of speech recognition techniques in a repertory telephone voice dialer. Repertory dialing applications are known to allow users to train their own vocabularies, associating a phone number to be dialed with each entry in the vocabulary. The same approach applies to other situations in which a vocabulary word is trained and the system takes some action when the word is subsequently recognized. However, the list of words often grows so long that it is difficult for a user to remember whether a word has already been entered. A large vocabulary also poses a problem when a new word is so similar to an existing one that the speech recognizer becomes much less accurate on both words when they appear on the same list.
Traditionally, such systems have attempted to reject such utterances by comparing the input speech used to train the current word against all previously enrolled models. This requires a recognition match that produces one word, or often several words in systems using N-best outputs; if the resulting word is not the word currently being trained, or if it is a word with a very poor score, the utterance is added. This technique ignores the models themselves and uses only the correlation between the input speech and the collection of models to perform the rejection.
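The traditional check can be illustrated with a minimal sketch. It assumes, purely for illustration, that each enrolled model is a stored template of feature frames and that matching uses length-normalised dynamic time warping (DTW); the text does not specify the model type or scoring method. The names `accept_training_utterance`, `n_best`, and the `threshold` value are hypothetical, and the decision rule (reject when an already-enrolled word matches the new utterance closely, accept when even the best match scores poorly) is one plausible reading of the rejection described above, not a definitive implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
Template = List[Frame]  # assumption: a "model" is a stored sequence of feature frames


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(utterance: Template, template: Template) -> float:
    """DTW alignment cost normalised by the combined sequence length."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def n_best(utterance: Template, vocabulary: Dict[str, Template],
           n: int = 3) -> List[Tuple[str, float]]:
    """Score the utterance against every enrolled model; lower distance is better."""
    scored = [(word, dtw_distance(utterance, tpl)) for word, tpl in vocabulary.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


def accept_training_utterance(utterance: Template, vocabulary: Dict[str, Template],
                              threshold: float = 0.5,
                              n: int = 3) -> Tuple[bool, List[Tuple[str, float]]]:
    """Hypothetical rejection check: returns (accepted, n-best list)."""
    if not vocabulary:
        return True, []  # nothing enrolled yet, so nothing to collide with
    candidates = n_best(utterance, vocabulary, n)
    _, best_score = candidates[0]
    # A close match to an existing entry suggests a duplicate or confusable word;
    # a very poor best score means the new word is acoustically distinct.
    return best_score > threshold, candidates
```

For example, with `vocab = {"home": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], "office": [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]}`, calling `accept_training_utterance([[0.1, 0.0], [1.0, 1.1], [2.0, 2.0]], vocab)` rejects the utterance because it aligns closely with `"home"`. Note that every decision depends on the input audio: the enrolled models are never compared with one another directly.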
Although traditional systems attempt to detect similar words, they cannot handle the case in which two or more lists are combined or, more generally, any manipulation of vocabularies once the input audio is no longer available.