The present invention relates generally to speech recognition and, more particularly, to a technique for estimating confusability among sound units. The technique can be used for constructing decision trees associated with allophone models, for confusability rejection modeling, and for sound unit clustering.
Automated speech recognition has made significant strides, and yet this technology still falls far short of what a human can do with relative ease. One particularly difficult problem involves confusability among sound units. A typical speech recognition system models speech in terms of sound units, such as phonemes, syllables, words, or the like. Depending on the system configuration, there can be many sound units, each having many possible subtle variations, depending on where the sound unit occurs within the context of a spoken utterance. Certain sound units are frequently confused with other sound units, causing recognition errors. Whereas the human listener can usually discriminate among confusable sound units, the speech recognition system may have great difficulty.
To improve discrimination among confusable sound units, it would be desirable to design the recognition system so that more processing power is devoted to situations where confusion is likely. The present invention addresses this need. As will be more fully explained, the technique employs a procedure whereby, for each example of a given sound unit, a set of models representing other sound units (i.e., incorrect sound units) is used to calculate likelihood scores. The incorrect models that generate high likelihood scores for the example represent the sound units most likely to lead to recognition error.
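By way of illustration only, the scoring procedure described above might be sketched as follows. The sketch assumes that each sound unit is modeled by a single diagonal-covariance Gaussian over a feature vector and that scores are averaged over the examples of a unit. The specification does not state either choice, and the function names, unit labels, and numeric values are hypothetical.

```python
import math
from collections import defaultdict


def log_likelihood(model, x):
    """Log density of feature vector x under a diagonal-covariance Gaussian.

    model is a (mean, variance) pair of equal-length lists (an assumed,
    simplified stand-in for whatever acoustic model the recognizer uses).
    """
    mean, var = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def confusability(examples, models):
    """Rank incorrect sound units by how well their models fit each unit's examples.

    examples: list of (true_unit, feature_vector) pairs.
    models:   dict mapping unit label -> (mean, variance).
    Returns a dict mapping true_unit -> list of (other_unit, mean log-likelihood),
    sorted so the most confusable incorrect unit comes first.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for unit, x in examples:
        counts[unit] += 1
        for other, model in models.items():
            if other == unit:
                continue  # score only the incorrect models
            totals[unit][other] += log_likelihood(model, x)
    return {
        unit: sorted(
            ((other, total / counts[unit]) for other, total in scores.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for unit, scores in totals.items()
    }


# Hypothetical toy models: "p" lies close to "b" in feature space, "s" lies far away.
models = {
    "b": ([0.0, 0.0], [1.0, 1.0]),
    "p": ([0.5, 0.0], [1.0, 1.0]),
    "s": ([5.0, 5.0], [1.0, 1.0]),
}
examples = [("b", [0.1, -0.2]), ("b", [0.3, 0.1])]
ranking = confusability(examples, models)
```

In this toy data the model for "p" heads the ranking for examples of "b", marking it as the incorrect unit most likely to cause a recognition error. Averaging log-likelihoods is only one possible way to aggregate the scores across examples.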
The resulting confusability data generated through the above analysis may be used in a variety of ways. The data may be used to develop decision trees for allophone modeling. The data may also be used to develop confusability predictors used for rejection during search. The data may also be used in developing continuous speech recognition models that are optimized to minimize confusability.
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.