The present invention relates generally to speech processing and recognition, and more specifically to methods and systems for predicting the likelihood of a human or machine confusing two different words or phrases.
Words that sound alike tend to be confused with one another. For example, a listener hearing the word "brush" may mistakenly believe she is hearing "rush," while the same listener hearing the word "falcon" is not likely to believe that she is hearing the word "peckham."
Knowing that two words are likely to be confused can be useful in a number of applications. Two applications that may benefit from knowing the likelihood of confusion between two words are speech recognition algorithms, which attempt to associate a spoken word with its intended orthography (i.e., the English written version of the word), and vocabulary rejection algorithms, which determine whether a spoken word is present in a predefined dictionary.
One method to determine the likelihood of confusion between spoken words is to simply have a human expert rate the potential confusability of the words based on intuition and experience. However, this method is laborious and results in subjective and inconsistent confusability measurements.
Thus, there is a need for a way to automatically generate, from the written version of a word, an objective metric of the likelihood that the word, when spoken, will be confused with another spoken word.