1. Field of the Invention
The present invention relates to the field of speech recognition and, more particularly, to speech-based user interfaces.
2. Description of the Related Art
Conventional data processing systems frequently incorporate speech-based user interfaces that give users spoken access to a corpus of data stored and managed by the data processing system. To adequately process user requests or queries, however, the speech recognition system must be able to recognize the particular words that appear within the corpus of data and that, therefore, are likely to be received as part of a user request.
Studies have shown, however, that within the context of a conversational speech recognition system, users tend to vary their replies based upon the particular prompt to which the users are responding. More particularly, users tend to repeat words from the prompt when responding. For example, if a user is asked “do you want to A, B, or C”, there is an increased likelihood that the user will say something like “I want to A.” Similarly, if the user is prompted “would you like to A, B, or C”, there is an increased likelihood that the user will respond with “I would like to A” or “I'd like to A.”
As another example, if a user is prompted to choose between two mutual funds, there is an increased likelihood that the user will pick one of the two offered choices. A mutual fund grammar or a language model is likely to be used in recognizing the user's response. While such mechanisms reflect the probabilities that particular words will be spoken by the user, those probabilities are determined through an empirical study of a text corpus with little or no regard for the particular questions asked to elicit user responses. Such mechanisms typically are applied globally within speech systems. In directed dialog systems such as VoiceXML, the program that generates the prompt also returns the grammars used on the next turn to decode the user's response. In conversational systems that separate recognition (i.e., statistical language models or grammars) from prompt generation (whether automatic or hand-crafted), however, it is desirable to have a method for adapting the speech recognition model in use according to the text of the prompt played to the user as well as any expected user responses.
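One generic way to adapt a statistical language model to the prompt text is linear interpolation of a global model with a distribution estimated from the prompt itself. The following is a minimal sketch under stated assumptions, not the method of the present invention: it assumes a unigram model stored as a word-to-probability dictionary, and the function names `prompt_distribution` and `adapt_unigram` and the interpolation weight `lam` are hypothetical.

```python
import re
from collections import Counter


def prompt_distribution(prompt: str) -> dict[str, float]:
    """Relative frequency of each word in the prompt text (hypothetical helper)."""
    words = re.findall(r"[a-z']+", prompt.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def adapt_unigram(base: dict[str, float], prompt: str, lam: float = 0.3) -> dict[str, float]:
    """Interpolate a global unigram model with a prompt-derived unigram model.

    P_adapted(w) = (1 - lam) * P_base(w) + lam * P_prompt(w)
    A word missing from either model contributes zero from that model, so the
    result still sums to 1 when both inputs do.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    p_prompt = prompt_distribution(prompt)
    vocab = set(base) | set(p_prompt)
    return {w: (1 - lam) * base.get(w, 0.0) + lam * p_prompt.get(w, 0.0) for w in vocab}


# Hypothetical global model, estimated without regard to any particular prompt.
base = {"i": 0.3, "want": 0.3, "would": 0.05, "like": 0.05, "growth": 0.15, "income": 0.15}
adapted = adapt_unigram(base, "Would you like the growth fund or the income fund?")
```

With these illustrative numbers, words echoed from the prompt such as "would" gain probability (0.05 to 0.065) while words absent from the prompt such as "want" lose it (0.3 to 0.21). The global model itself is never modified; only a per-prompt view is computed.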
In grammar-based systems, the grammar developer may be different from the prompt developer, creating a disconnect that makes it difficult to carry information from the prompts into the grammars. Even in systems such as VoiceXML, where the prompt and grammar are kept in sync, generating grammars customized to each prompt takes extra development effort. Such systems can also incur additional run-time overhead, which could affect high-call-volume applications. In these cases, it would be preferable to keep a single grammar whose rules remain unchanged, while the probabilities of those rules are adjusted to bias the grammar in favor of what the user is likely to say in response to the prompt.
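The idea of leaving a grammar's rules unchanged while reweighting them can be illustrated with a simple sketch. It assumes, hypothetically, that one grammar rule is represented as a dictionary mapping each alternative to a weight (loosely analogous to the `weight` attribute on `<item>` elements in the W3C SRGS grammar format), and that any alternative sharing a word with the prompt is boosted by a fixed factor. The names `tokens`, `bias_rule`, and `boost` are illustrative only and are not taken from the present invention.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word set of a piece of text (hypothetical helper)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def bias_rule(alternatives: dict[str, float], prompt: str, boost: float = 2.0) -> dict[str, float]:
    """Return renormalized probabilities for one rule's alternatives.

    The set of alternatives (the rule's structure) is left unchanged; only the
    weights are scaled. An alternative sharing at least one word with the
    prompt has its weight multiplied by `boost` before renormalization.
    """
    if boost <= 0:
        raise ValueError("boost must be positive")
    prompt_words = tokens(prompt)
    scaled = {
        alt: w * (boost if tokens(alt) & prompt_words else 1.0)
        for alt, w in alternatives.items()
    }
    total = sum(scaled.values())
    return {alt: w / total for alt, w in scaled.items()}


# Hypothetical mutual fund rule with uniform weights, biased toward the two offered funds.
fund_rule = {"growth": 1.0, "income": 1.0, "index": 1.0, "bond": 1.0}
biased = bias_rule(fund_rule, "Would you like the growth fund or the income fund?")
```

Here the two offered funds rise from 1/4 to 1/3 each and the others fall to 1/6, yet every alternative remains recognizable. Counting any shared word is crude: a shared function word such as "to" would boost every alternative alike, so a practical version would likely need a stop-word list or a better relevance measure.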
Accordingly, it would be beneficial to bias the probabilities used by speech recognition systems in favor of predicted user responses.