In spoken dialog systems, a computer system equipped with an automatic speech recognizer attempts to understand and interpret a spoken utterance input by a user. A dialog manager component determines an appropriate conversation strategy based on the user's input and controls the flow of the conversation with the user.
In such systems, a confidence measure refers to an indication of the system's level of uncertainty in its interpretations of a user's utterance. The confidence measure is an important component of a spoken dialog system in that the dialog manager relies on it to determine the appropriate conversation strategy.
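By way of illustration only, the reliance of a dialog manager on a confidence measure can be sketched as a simple threshold rule. The sketch below is a minimal example under stated assumptions, not a description of any particular system: the strategy names, the function `choose_strategy`, the normalization of the confidence measure to the range 0 to 1, and both threshold values are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical conversation strategies a dialog manager might select."""
    ACCEPT = "accept"      # act on the interpretation without confirmation
    CONFIRM = "confirm"    # ask the user to confirm the interpretation
    REPROMPT = "reprompt"  # discard the interpretation and ask again


def choose_strategy(confidence: float,
                    accept_threshold: float = 0.8,
                    confirm_threshold: float = 0.4) -> Strategy:
    """Map a confidence score in [0, 1] to a conversation strategy.

    Both thresholds are illustrative assumptions, not values from any
    deployed system.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= accept_threshold:
        return Strategy.ACCEPT
    if confidence >= confirm_threshold:
        return Strategy.CONFIRM
    return Strategy.REPROMPT
```

In a rule of this kind, a poorly calibrated confidence measure leads directly to unnecessary confirmations or to incorrect results being accepted, which is why the quality of the measure matters to the dialog manager.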
Confidence measures have been used in some other types of systems, such as automatic speech recognizers and semantic analyzers. In such systems, either knowledge-based or data-driven features have been used in deriving a confidence measure. Similarly, features from speech recognizers and classification components have been used to derive confidence measures for call routing dialog systems. None of these prior systems has addressed the generation of a confidence measure in a voice search system.
Voice search technology underlies many spoken dialog applications that provide users with information that they request with a spoken query. For example, directory assistance is one of the most popular voice search applications. In directory assistance applications, users issue a spoken query to an automated system which returns phone number and address information for a business or an individual, based on a search conducted using the spoken query.
The characteristics of voice search technology pose some additional problems for spoken dialog systems. A voice search application differs from semantic analysis systems in that it does not require detailed semantic analysis to identify a semantic frame and its slots from an utterance. Similarly, voice search technology differs from call-routing types of applications, in which the number of routing destinations is relatively small. By contrast, in a voice search application, the search space, or the number of classification destinations if the search is treated as a classification task, is enormous. Thus, the available data will seldom be sufficient to train a statistical model, such as a maximum entropy classifier or boosting algorithm.
Voice search also differs from speech recognition in that the vocabulary of a voice search system can be much larger than that of a typical domain-specific speech recognition application, sometimes reaching millions of lexical entries. In addition, a voice search system must be robust in the face of relatively high automatic speech recognition error rates (sometimes reaching approximately 30-40 percent) and linguistic diversity in users' queries. In other words, users may not know (or would not say) the exact name of an entry in a directory. By way of example, a user looking for a department store may say “ACME Department Store” or “ACME's” rather than the technically correct name of the department store, which is “ACME and Company.” For these and other reasons, employing a confidence measure in a spoken dialog system that uses voice search technology has been very difficult.
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.