Speech recognition systems are now being used in practical applications over telephone channels. Algorithms designed for small-vocabulary, isolated-word applications can produce acceptable recognition rates. A basic assumption of most speech recognition systems is that the utterance to be recognized contains only words from the recognition vocabulary and background silence. However, previous studies have shown that it is extremely difficult to get real-world users of such systems to speak only the allowable input words. In a large-scale trial of a speaker-independent, isolated-word technology, it was noted that only 65% of callers followed the protocol, even though the vocabulary consisted of only five words. Most conventional isolated-word speech recognition algorithms are not designed to handle situations in which users speak out-of-vocabulary (OOV) words.
In existing systems, there have been attempts to solve the OOV problem. An ergodic Hidden Markov Model (HMM), commonly referred to as the "garbage" model, is trained on speech collected from a large sample of speakers and is used to model OOV words. It is generally designed such that each state represents the acoustics of a broad phoneme class, and the transition probabilities represent the frequency of transitions between pairs of phoneme classes. However, for spoken speed dialing (SSD), which is a speaker-dependent application, there are two main problems with this approach. First, since the garbage model is not trained on the particular speaker, its average acoustics may not match the vocal tract characteristics of that person. Second, the enrollment may be made under very different conditions (e.g., different levels of background noise or various handset types). Both of these problems contribute to a mismatch between the within-vocabulary (WV) model set and the garbage model.
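The transition structure of such a garbage model can be illustrated with a minimal sketch. It assumes that training speech has already been labeled with broad phoneme classes. The class names, the function `estimate_transitions`, and the additive smoothing are hypothetical choices made for illustration and are not taken from any particular prior system. The acoustic (emission) model for each state is omitted.

```python
from collections import Counter

# Hypothetical broad phoneme classes; one HMM state per class.
CLASSES = ["vowel", "nasal", "fricative", "stop", "silence"]


def estimate_transitions(sequences, classes=CLASSES, smoothing=1.0):
    """Estimate an ergodic transition matrix from class-label sequences.

    Each probability reflects how often one class follows another in the
    training data. Additive smoothing (an assumption of this sketch) keeps
    every probability non-zero, so every state can reach every other state.
    """
    counts = Counter()
    for seq in sequences:
        # Count each adjacent pair of class labels.
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1

    matrix = {}
    for prev in classes:
        row_total = sum(counts[(prev, nxt)] for nxt in classes)
        row_total += smoothing * len(classes)
        matrix[prev] = {
            nxt: (counts[(prev, nxt)] + smoothing) / row_total
            for nxt in classes
        }
    return matrix


# Toy label sequences, invented purely for illustration.
toy_sequences = [
    ["silence", "stop", "vowel", "nasal", "silence"],
    ["silence", "fricative", "vowel", "stop", "vowel", "silence"],
]
transitions = estimate_transitions(toy_sequences)
```

Because the counts are pooled over many speakers, the resulting probabilities describe an average talker rather than the enrolled user, which is one way to see the speaker mismatch noted above.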