In an interactive voice response (IVR) system using automatic speech recognition (ASR), it is common for callers to speak to the system in the same way they would speak to another human being. This becomes a problem for ASR when the user is asked to repeat information they have already given. Speakers who are repeating themselves have been observed to exhibit certain traits: for instance, speaking more slowly or more quickly, raising or lowering pitch, increasing volume, hyper-stressing key phonemes, and emphasizing syllabic breaks. These traits are problematic for ASR because they move the speech further away from the normalized speech models and algorithms on which the recognition engine is based.
U.S. Pat. No. 6,751,591 discloses a method and system for an ASR dialog system. If a user's input communication cannot be understood, a probability of understanding the user's input communication is derived from the ASR data. If the probability exceeds a first threshold, the dialog strategy is adapted according to the ASR data and the dialog with the user is extended. If the results of the extended dialog cannot be understood either, the adapted dialog strategy is further adapted, based on the ASR data from both the original dialog and the extended dialog.
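The control flow summarized above can be illustrated with a minimal sketch. It is based only on the summary given here and is not the implementation disclosed in U.S. Pat. No. 6,751,591. The names `AsrResult`, `DialogStrategy`, `next_strategy` and the action strings are hypothetical, and the case in which the probability does not exceed the first threshold is left unspecified because the summary does not describe it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AsrResult:
    """Hypothetical record of one recognition attempt."""
    understood: bool                   # whether the input could be understood
    understanding_probability: float   # probability of understanding, derived from ASR data


@dataclass
class DialogStrategy:
    """Hypothetical strategy that records the ASR data it was adapted from."""
    adapted_from: List[AsrResult] = field(default_factory=list)

    def adapt(self, evidence: List[AsrResult]) -> "DialogStrategy":
        # Return a new strategy adapted using all the supplied ASR data.
        return DialogStrategy(adapted_from=list(evidence))


def next_strategy(
    strategy: DialogStrategy,
    original: AsrResult,
    first_threshold: float,
    extended: Optional[AsrResult] = None,
) -> Tuple[DialogStrategy, str]:
    """Return the strategy to use next and a hypothetical action label."""
    if original.understood:
        return strategy, "proceed"
    if original.understanding_probability <= first_threshold:
        # Assumption: the summary does not say what happens in this case.
        return strategy, "unspecified"
    # Probability exceeds the first threshold: adapt and extend the dialog.
    adapted = strategy.adapt([original])
    if extended is None:
        return adapted, "extend_dialog"
    if extended.understood:
        return adapted, "proceed"
    # Extended dialog also not understood: adapt further, using the ASR
    # data from both the original and the extended dialog.
    return adapted.adapt([original, extended]), "extend_dialog"
```

In this sketch the second adaptation receives the ASR data from both attempts, mirroring the statement that the further adaptation is based on the original dialog and the extended dialog together.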
Another known solution that attempts to improve recognition re-prompts the user to speak in a normal voice. However, this solution does not take into account that the user may have already spoken in their “normal” voice and that the ASR engine may simply have trouble recognizing them. Also, asking a user to use their “normal” voice can make that user self-conscious about how they speak and cause them to distort what is really their normal voice as they try to comply.
It would be advantageous to find a way of using repeated utterances to improve the accuracy of speech recognition.