1. Technical Field
The present invention generally relates to speech recognition systems and, in particular, to a method for augmenting alternate word lists from which a correct word is selected in place of a word wrongly decoded by a speech recognition system. The method employs an acoustic confusability criterion to augment such alternate word lists.
2. Background Description
Conventional speech recognition systems generally include facilities that allow a user to correct decoding errors. In particular, when a user determines that a word has been wrongly decoded, the user may query the system for a list of alternative words corresponding to that word. In general, such a list contains high-probability alternatives to the word decoded at each position of an audio stream. These alternatives are computed live from the audio stream in question, and reflect the normal operation of the speech recognition engine, which must typically choose, from among several possible decodings of each segment of the audio stream, the preferred word to transcribe.
By “normal operation of the speech recognition engine”, we mean the following. Let $h = w_1, w_2, \ldots, w_{i-1}$ represent some sequence of decoded words, corresponding to some portion of the audio stream $a(w_1, w_2, \ldots, w_{i-1})$. Typically, the exact end time of word $w_{i-1}$ is not known, so the system proceeds by considering a range of possible end times of this word, and therefore of start times of the next word.
The system must now guess the identity of the next word $w_i$ based upon consideration of the acoustic signal $a(w_i, w_{i+1}, \ldots)$ and likewise consideration of the words decoded up to that point, which is the sequence $h$ defined above. There is a principled way of making this guess, which is to consider the product $p(a(w_i) \mid x) \cdot p(x \mid h)$, as $x$ runs over various words in the recognizer vocabulary. In this expression, the first factor, $p(a(w_i) \mid x)$, is known as the acoustic model probability, and the second factor, $p(x \mid h)$, is known as the language model probability. In general, these raw values may be geometrically or otherwise weighted before being combined. However, to simplify this discussion, the acoustic model probability and the language model probability will be combined by simply computing their product, as indicated above.
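The scoring rule just described can be illustrated with a minimal Python sketch. It is not taken from any particular recognizer: the function names, the `lm_weight` parameter (one possible form of geometric weighting), and the probability values are all assumptions made for illustration. The product is computed in the log domain, which leaves the ranking unchanged.

```python
import math

def combined_log_score(acoustic_prob, lm_prob, lm_weight=1.0):
    """Log of p(a|x) * p(x|h)**lm_weight.

    lm_weight=1.0 gives the plain product; other values are one
    (assumed) way to weight the two factors geometrically.
    """
    return math.log(acoustic_prob) + lm_weight * math.log(lm_prob)

def best_word(acoustic, lm, lm_weight=1.0):
    """Return the word x with the highest combined score.

    Only words that have both an acoustic and a language model
    probability are considered.
    """
    candidates = acoustic.keys() & lm.keys()
    return max(
        candidates,
        key=lambda x: combined_log_score(acoustic[x], lm[x], lm_weight),
    )

# Hypothetical probabilities for a single audio segment.
acoustic = {"write": 0.30, "right": 0.35, "rite": 0.20}   # p(a(w_i) | x)
lm = {"write": 0.050, "right": 0.020, "rite": 0.001}      # p(x | h)

print(best_word(acoustic, lm))
```

With these made-up numbers the language model outweighs the slightly better acoustic match of "right", so "write" is chosen; setting `lm_weight=0.0` would rank on the acoustic factor alone.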
Although in principle this product could be evaluated for every word $x$ of the recognizer's vocabulary, this is seldom done in practice. Instead, some short list of candidates is first computed. For instance, only the top $N$ words of the vocabulary may be retained for further consideration, when ranked according to the language model score $p(x \mid h)$. Let us refer to this as the list of language model candidates $C$. Typically, acoustic model scores $p(a(w_i) \mid x)$ are then computed only for $x \in C$. Thereafter, a further winnowing of the elements of $C$ will occur, retaining, for example, only the top $M$ words of $C$ when ranked according to the product $p(a(w_i) \mid x) \cdot p(x \mid h)$. Alternatively, the system may retain only those words $x'$ such that the product $p(a(w_i) \mid x') \cdot p(x' \mid h)$ lies within some fixed fraction of the maximal value $p(a(w_i) \mid \hat{x}) \cdot p(\hat{x} \mid h)$, where $\hat{x}$ denotes the word attaining that maximum.
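The two-stage winnowing can be sketched as follows. This is an illustrative sketch only: the function name `shortlist`, its parameters, and the probabilities are hypothetical, and "within some fixed fraction of the maximal value" is interpreted here as "score at least `fraction` times the best score", which is an assumption.

```python
def shortlist(lm_probs, acoustic_score, n, m=None, fraction=None):
    """Return (word, score) pairs, best first.

    Stage 1 keeps the top-n words by language model probability (the list C).
    Stage 2 computes acoustic scores only for words in C, then keeps the
    top-m by product and/or those within `fraction` of the best product.
    """
    # Stage 1: language model candidates C.
    c = sorted(lm_probs, key=lm_probs.get, reverse=True)[:n]

    # Stage 2: the acoustic model is evaluated only for x in C.
    scored = [(x, acoustic_score(x) * lm_probs[x]) for x in c]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    if m is not None:
        scored = scored[:m]
    if fraction is not None and scored:
        best = scored[0][1]
        scored = [(x, s) for x, s in scored if s >= fraction * best]
    return scored

# Hypothetical model outputs for one segment.
lm_probs = {"the": 0.4, "write": 0.05, "right": 0.02, "rite": 0.001, "wright": 0.0005}
acoustic_table = {"the": 0.001, "write": 0.30, "right": 0.35, "rite": 0.20, "wright": 0.25}

top_two = shortlist(lm_probs, acoustic_table.get, n=4, m=2)
near_best = shortlist(lm_probs, acoustic_table.get, n=4, fraction=0.5)
print([w for w, _ in top_two])
print([w for w, _ in near_best])
```

In this toy example "wright" is never scored acoustically because it falls outside the top four language model candidates, and the fraction-based rule leaves only one surviving word.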
The resulting set of candidates or hypotheses then comprises the list of alternate words for the given segment of the acoustic signal. Note that it is entirely possible that this set may contain only a single element, the top-scoring word $\hat{x}$. It is also possible that this word may be wrong, and the correct word may not be included within the alternate word list.
The system retains in memory this list of possibilities, associated with the given segment. The system typically also computes and retains the product $p(a(w_i) \mid x) \cdot p(x \mid h)$ cited above, or some other figure of merit, for each word in the list. When the user determines that an error has been made at a particular position of the audio stream, the system presents this list of possible words to the user. The user may then select the correct word from the list if it is present, or type in a completely different word if it is not. It is of course much more convenient if the correct word appears in the list. Unfortunately, this is not always the case; indeed, frequently no alternatives are presented at all. The invention is a method for augmenting such alternate word lists, increasing the odds that the correct word will be presented to the user.
Accordingly, it would be desirable and highly advantageous to have a method for augmenting such alternate word lists, to increase the probability that the correct word is presented to the user. Such a method should also increase the convenience of using a speech recognition system employing the same.
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method for augmenting alternate word lists generated by a speech recognition system. The alternate word lists are used to provide words from which a user may select a correct word in place of a word wrongly decoded by the system. The method employs an acoustic confusability criterion to augment such alternate word lists.
The use of augmented alternate word lists according to the invention significantly increases the number of times that the alternate word lists contain the correct word. Thus, the convenience of using a speech recognition system is increased.
According to a first aspect of the invention, there is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying at least one acoustically confusable word with respect to the wrongly decoded word. The alternate word list is augmented with the at least one acoustically confusable word.
According to a second aspect of the invention, the augmenting step includes the step of adding the at least one acoustically confusable word to the alternate word list.
According to a third aspect of the invention, the system includes a vocabulary having a plurality of words included therein, and the identifying step includes the steps of: respectively determining a similarity between pronunciations of each of at least one of the plurality of words included in the vocabulary with respect to the wrongly decoded word; and respectively expressing the similarity by a score.
According to a fourth aspect of the invention, the identifying step identifies the at least one acoustically confusable word based on the score.
According to a fifth aspect of the invention, the at least one acoustically confusable word includes a plurality of acoustically confusable words, and the augmenting step includes the steps of: ranking each of the plurality of acoustically confusable words based on the score; and adding at least one of the plurality of acoustically confusable words to the alternate word list, in descending order with respect to the score.
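The scoring, ranking, and adding steps of the third through fifth aspects can be sketched as follows. The text above does not fix a particular similarity measure, so this sketch substitutes a normalized edit distance over phoneme sequences purely as a stand-in. The lexicon, the phoneme symbols, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(p, q):
    """Stand-in pronunciation similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(p), len(q))
    return 1.0 if longest == 0 else 1.0 - edit_distance(p, q) / longest

def confusable_words(decoded, lexicon):
    """Score every other vocabulary word against `decoded`, best first."""
    target = lexicon[decoded]
    scored = [(w, similarity(target, pron))
              for w, pron in lexicon.items() if w != decoded]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def augment(alternates, ranked):
    """Append the confusable words in descending order of score."""
    return list(alternates) + [w for w, _ in ranked]

# Hypothetical pronunciation lexicon.
lexicon = {
    "write": ["R", "AY", "T"],
    "right": ["R", "AY", "T"],
    "ride":  ["R", "AY", "D"],
    "road":  ["R", "OW", "D"],
}
ranked = confusable_words("write", lexicon)
print(augment(["rite"], ranked))
```

A real system would more plausibly derive the score from its acoustic models than from a symbolic distance; the edit distance is used here only because it is compact and self-contained.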
According to a sixth aspect of the invention, the augmenting step further includes the step of restricting a number of words added to the alternate word list based on a predefined threshold.
According to a seventh aspect of the invention, the predefined threshold corresponds to a maximum number of words to be added to the alternate word list.
According to an eighth aspect of the invention, the predefined threshold corresponds to a maximum size of the alternate word list.
According to a ninth aspect of the invention, the predefined threshold corresponds to a minimum score for words to be added to the alternate word list.
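The three threshold variants of the sixth through ninth aspects can be sketched in a single function. The function name, the parameter names, and the example scores are assumptions for illustration, and the input is assumed to be already sorted by descending score.

```python
def restrict_additions(alternates, ranked, max_added=None,
                       max_size=None, min_score=None):
    """Add ranked confusable words to the list, subject to optional limits.

    ranked    -- (word, score) pairs, assumed sorted by descending score
    max_added -- maximum number of words to add
    max_size  -- maximum size of the resulting alternate word list
    min_score -- minimum score a word must have to be added
    """
    result = list(alternates)
    added = 0
    for word, score in ranked:
        if min_score is not None and score < min_score:
            break  # descending order: every later word scores lower still
        if max_added is not None and added >= max_added:
            break
        if max_size is not None and len(result) >= max_size:
            break
        result.append(word)
        added += 1
    return result

# Hypothetical ranked confusable words.
ranked = [("right", 1.0), ("ride", 0.67), ("road", 0.33)]

print(restrict_additions(["rite"], ranked, max_added=2))
print(restrict_additions(["rite"], ranked, max_size=2))
print(restrict_additions(["rite"], ranked, min_score=0.5))
```

The limits are independent, so a caller may combine them; whichever limit is reached first stops further additions.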
According to a tenth aspect of the invention, the at least one potentially correct word includes a plurality of potentially correct words and the at least one acoustically confusable word includes a plurality of acoustically confusable words, and the method further includes the step of inserting at least some of the plurality of acoustically confusable words in the alternate word list so as to be disposed in alternating positions with respect to at least some of the plurality of potentially correct words.
According to an eleventh aspect of the invention, the at least some of the plurality of acoustically confusable words are inserted in the alternate word list in descending order with respect to the score.
According to a twelfth aspect of the invention, the adding step adds only words not already present in the alternate word list.
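The alternating placement of the tenth and eleventh aspects, combined with the duplicate check of the twelfth, might look like the following sketch. The function name and the example words are hypothetical. The confusable words are assumed to arrive already sorted by descending score, and the choice to start with an original alternate rather than a confusable word is an assumption.

```python
from itertools import zip_longest

def interleave(alternates, confusables):
    """Alternate original candidates with new confusable words.

    Confusable words already present in the list (or repeated) are
    skipped, so only genuinely new words are inserted.
    """
    seen = set(alternates)
    fresh = []
    for word in confusables:
        if word not in seen:
            seen.add(word)
            fresh.append(word)

    merged = []
    # zip_longest pads the shorter list with None; words are never None.
    for original, added in zip_longest(alternates, fresh):
        if original is not None:
            merged.append(original)
        if added is not None:
            merged.append(added)
    return merged

print(interleave(["write", "rite"], ["right", "write", "ride", "road"]))
```

Once one list is exhausted, the remaining words of the other are appended in their existing order.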
According to a thirteenth aspect of the invention, the at least one potentially correct word includes a plurality of potentially correct words and the at least one acoustically confusable word includes a plurality of acoustically confusable words, and the augmenting step includes the steps of: determining a first regression function that estimates, for each of the plurality of acoustically confusable words, a probability that the each of the plurality of acoustically confusable words is correct based on the score of the each of the plurality of acoustically confusable words; and determining a second regression function that estimates, for each of the plurality of potentially correct words, a probability that the each of the plurality of potentially correct words is correct based on the score of the each of the plurality of potentially correct words; combining the plurality of acoustically confusable words and the plurality of potentially correct words; and sorting the plurality of acoustically confusable words and the plurality of potentially correct words based on the probability respectively estimated by the first and the second regression functions.
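The regression-based merge of the thirteenth aspect can be sketched as follows. The text does not specify the form of the regression functions, so this sketch assumes two logistic curves whose coefficients are made up; in practice they would presumably be estimated from data. Keeping the higher estimate when a word appears in both lists is likewise an assumption.

```python
import math

def logistic(slope, intercept):
    """Return a function mapping a raw score to an estimated probability."""
    def estimate(score):
        return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))
    return estimate

def merge_by_probability(confusables, alternates, f_confusable, f_alternate):
    """Pool both lists and sort by estimated probability of being correct.

    Each list holds (word, score) pairs on its own score scale; the two
    regression functions map those scales onto a common one.
    """
    pooled = {}
    for word, score in confusables:
        pooled[word] = f_confusable(score)
    for word, score in alternates:
        p = f_alternate(score)
        # Assumption: a word in both lists keeps its higher estimate.
        pooled[word] = max(p, pooled.get(word, 0.0))
    return sorted(pooled.items(), key=lambda item: item[1], reverse=True)

# Hypothetical, hand-picked coefficients; not fitted to any data.
f_confusable = logistic(slope=4.0, intercept=-3.0)   # input: similarity score
f_alternate = logistic(slope=50.0, intercept=-1.0)   # input: decoder figure of merit

confusables = [("right", 1.0), ("ride", 0.67)]
alternates = [("rite", 0.015), ("wright", 0.0002)]

merged = merge_by_probability(confusables, alternates, f_confusable, f_alternate)
print([word for word, _ in merged])
```

The point of the two functions is that similarity scores and decoder scores are not directly comparable; mapping both to a probability gives a single ordering.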
According to a fourteenth aspect of the invention, the speech recognition system includes a vocabulary having a plurality of words included therein, and the method further includes the step of determining the at least one acoustically confusable word with respect to each of at least one of the plurality of words included in the vocabulary.
According to a fifteenth aspect of the invention, the method further includes the step of pre-storing in a database a plurality of entries for at least some of the plurality of words comprised in the vocabulary, each of the plurality of entries including at least one word that is acoustically confusable with respect to each of the at least some of the plurality of words included in the vocabulary.
According to a sixteenth aspect of the invention, the identifying step includes the step of accessing the database to determine whether there exists an entry for the wrongly decoded word.
According to a seventeenth aspect of the invention, each of the plurality of entries further includes, for each of the at least one word, a score that represents a probability of acoustic confusion between the wrongly decoded word and the at least one word.
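The pre-stored database of the fifteenth through seventeenth aspects can be sketched with an in-memory dictionary standing in for the database. The function names, the `min_score` cutoff, and the toy scoring function are hypothetical. Each entry maps a vocabulary word to its confusable words and their scores.

```python
def build_confusion_table(vocabulary, score_fn, min_score):
    """Precompute, for each word, its confusable words and their scores.

    Words with no sufficiently confusable neighbor get no entry at all.
    """
    table = {}
    for word in vocabulary:
        entries = [(other, score_fn(word, other))
                   for other in vocabulary if other != word]
        entries = [(o, s) for o, s in entries if s >= min_score]
        if entries:
            table[word] = sorted(entries, key=lambda e: e[1], reverse=True)
    return table

def lookup(table, decoded):
    """Return the stored entry for a wrongly decoded word, or an empty list."""
    return table.get(decoded, [])

# Toy symmetric scores standing in for a real confusability measure.
toy_scores = {
    frozenset(("write", "right")): 0.9,
    frozenset(("write", "ride")): 0.6,
}

def toy_score(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)

table = build_confusion_table(["write", "right", "ride", "banana"],
                              toy_score, min_score=0.5)
print(lookup(table, "write"))
print(lookup(table, "banana"))
```

Building the table ahead of time moves the pairwise comparison over the vocabulary out of the correction step, which then reduces to a single lookup.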
According to an eighteenth aspect of the invention, the method is implemented by a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps.
According to a nineteenth aspect of the invention, there is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying a set of acoustically confusable words with respect to the wrongly decoded word. The alternate word list is augmented with at least one acoustically confusable word from the set, based on a similarity of pronunciations between the wrongly decoded word and the at least one acoustically confusable word from the set.
According to a twentieth aspect of the invention, in a speech recognition system having a vocabulary, there is provided a method for augmenting an alternate word list generated by the system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying a set of acoustically confusable words with respect to the wrongly decoded word, based on a similarity of pronunciations therebetween. The alternate word list is augmented with at least one acoustically confusable word from the set.
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.