1. Field of the Invention
The present invention relates to speech recognition and more specifically to using alternate recognition hypotheses for speech recognition.
2. Introduction
Despite decades of research and development, speech recognition technology is far from perfect and recognition errors are common. Even so, the technology has matured to the point where many organizations deploy it in automated call centers to handle large volumes of telephone calls at relatively low cost. Such spoken dialog systems rely on speech recognition for user input, so recognition errors lead to misunderstandings that can lengthen conversations, reduce task completion, and decrease customer satisfaction.
As part of the process of identifying errors, a speech recognition system generates a confidence score. The confidence score is an indication of the reliability of the recognized text: the higher the score, the more reliable the recognition result. However, the confidence score is itself imperfect and can be wrong. Even in view of these weaknesses, commercial dialog systems often use a confidence score in conjunction with confirmation questions to identify recognition errors and prevent failed dialogs. A speech recognition system can ask explicit or implicit confirmation questions based on the confidence score. If the confidence score is low, the system can ask an explicit confirmation question such as ‘Did you say Nebraska?’ If the confidence score is high, the system can ask an implicit confirmation question such as ‘Ok, Nebraska. What date do you want to leave?’ Explicit confirmations are more reliable but slow down the conversation. Conversely, implicit confirmations are faster, but when the recognized text is incorrect they can cause the user to respond with confused speech. That confused speech is harder to recognize and can lead to follow-on errors.
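The choice between explicit and implicit confirmation described above can be sketched as a simple threshold test. This is a minimal illustration only, not a description of any particular commercial system: the names RecognitionResult and confirmation_prompt, the 0-to-1 score range, and the threshold value of 0.5 are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the text gives no numeric threshold.
EXPLICIT_THRESHOLD = 0.5


@dataclass
class RecognitionResult:
    """Recognized text plus its confidence score (assumed to lie in [0, 1])."""
    text: str
    confidence: float


def confirmation_prompt(result: RecognitionResult,
                        next_question: str,
                        threshold: float = EXPLICIT_THRESHOLD) -> str:
    """Return an explicit or implicit confirmation prompt for one result."""
    if result.confidence < threshold:
        # Low confidence: ask the user directly before moving on.
        return f"Did you say {result.text}?"
    # High confidence: acknowledge the value and continue the dialog.
    return f"Ok, {result.text}. {next_question}"


if __name__ == "__main__":
    question = "What date do you want to leave?"
    print(confirmation_prompt(RecognitionResult("Nebraska", 0.3), question))
    print(confirmation_prompt(RecognitionResult("Nebraska", 0.9), question))
```

In this sketch a single threshold captures the trade-off: raising it produces more explicit confirmations, which are more reliable but slower, while lowering it produces more implicit confirmations, which are faster but riskier when the recognition is wrong.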
Some researchers attempt to detect incorrect recognitions using pattern classification. In these approaches, a system provides a pattern classifier with the recognized text, the duration of the speech, the current dialog context, and other output from the speech recognizer. These approaches have been shown to identify errors better than the confidence score alone. Accordingly, what is needed in the art is an improved way to perform speech recognition.