The present invention is directed to a speech recognition system. More particularly, the present invention is directed to a speech recognition system that uses multiple speech recognizers to increase its accuracy.
Speech recognition systems are increasingly being used to translate human spoken words or utterances into their written equivalent and meaning. Speech recognition systems can avoid the need for spoken utterances to be manually entered into a computer, or to be recognized by a human. Therefore, speech recognition systems are desirable for many businesses because these systems can minimize the number of human operators needed to handle calls from customers.
One drawback to speech recognition systems, however, is that they can provide inaccurate results. An exact correspondence between the spoken utterance and an output recognized by a speech recognizer is difficult to attain due to, for example, algorithmic limitations and the deterioration of speech signals that routinely occurs over conventional telephone lines. Such deterioration present in the speech signals may cause a speech recognizer to produce a recognized output that does not correspond to the spoken utterance. Because of limitations introduced into the speech signal by the telephone lines, the speech recognizer may confuse similar sounding letters and numbers. Thus, a speech recognizer may confuse the letter “B” with the number “3” or the letter “C”. For example, given that a user utters the numbers “123” into a telephone, the speech recognizer may produce “12B” as the output.
Additionally, various speech recognizers have their own strengths and weaknesses with respect to accurately identifying spoken utterances. For example, one speech recognizer may perform better at recognizing a sequence of alphanumeric characters, while other speech recognizers perform better at recognizing proper nouns such as, for example, names of places, people and things. Also, some speech recognizers can execute certain tasks faster or require less processing time than other speech recognizers.
If such speech recognition systems are utilized, it is important that the speaker communicate accurate information to the system with maximum machine assistance and minimum user intervention. For example, it is desirable that the user be prompted as few times as possible to repeat questionable information or to supply additional information for the speech recognition system to reach the correct result.
Based on the foregoing, there is a need for a speech recognition system that has increased recognition accuracy without relying on human operator intervention or requiring additional input from the user.
One embodiment of the present invention is a speech recognition system for recognizing spoken utterances received as a speech signal from a user. A prompt for requesting a spoken utterance from the user is assigned a response identifier which indicates at least one of a plurality of speech recognizers to best recognize a particular type of spoken utterance. The system includes a processor for receiving the speech signal from the user in response to the prompt. The processor also directs the speech signal to the at least one speech recognizer indicated by the response identifier. The speech recognizer generates a plurality of spoken utterance choices from the speech signal and a probability associated with each of the plurality of choices. At least one of the spoken utterance choices is selected based on the associated probabilities.
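The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names Prompt, route_and_recognize and ROUTING, the stub recognizers, and the probability values are all hypothetical and chosen only for illustration. The sketch also assumes that probabilities reported by different recognizers are directly comparable, which the description does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A choice pairs a candidate utterance with its associated probability.
Choice = Tuple[str, float]
# A recognizer maps a speech signal to a list of choices (hypothetical interface).
Recognizer = Callable[[bytes], List[Choice]]


@dataclass(frozen=True)
class Prompt:
    text: str
    response_identifier: str  # indicates which recognizer(s) suit the expected reply


def digit_recognizer(signal: bytes) -> List[Choice]:
    # Stub standing in for a recognizer tuned to alphanumeric strings.
    return [("123", 0.80), ("12B", 0.15), ("12C", 0.05)]


def name_recognizer(signal: bytes) -> List[Choice]:
    # Stub standing in for a recognizer tuned to proper nouns.
    return [("Boston", 0.70), ("Austin", 0.30)]


RECOGNIZERS: Dict[str, Recognizer] = {
    "digits": digit_recognizer,
    "names": name_recognizer,
}

# Hypothetical table mapping each response identifier to recognizer names.
ROUTING: Dict[str, List[str]] = {
    "ACCOUNT_NUMBER": ["digits"],
    "CITY": ["names"],
}


def route_and_recognize(prompt: Prompt, signal: bytes) -> Choice:
    """Direct the signal to the indicated recognizer(s) and select a choice."""
    choices: List[Choice] = []
    for name in ROUTING[prompt.response_identifier]:
        choices.extend(RECOGNIZERS[name](signal))
    if not choices:
        raise ValueError("no recognizer produced a choice")
    # Select the choice with the highest associated probability.
    return max(choices, key=lambda choice: choice[1])


prompt = Prompt("Please say your account number.", "ACCOUNT_NUMBER")
best = route_and_recognize(prompt, b"<speech signal>")
```

In this sketch the response identifier is resolved through a lookup table, so a prompt can be reassigned to a different recognizer without changing the routing function. Selecting the single highest-probability choice is only one possible selection rule; the description requires only that at least one choice be selected based on the associated probabilities.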