1. Field of the Invention
The present invention relates to the field of speech processing, and, more particularly, to partially filling mixed-initiative forms from utterances having confidence scores below a threshold based upon word-level confidence data.
2. Description of the Related Art
VoiceXML documents define applications as a set of named dialog states. The user is in exactly one dialog state at any given time. VoiceXML dialogs include forms and menus. A form defines an interaction that collects values for each of a set of fields in the form. Each field can specify a prompt, the expected input, and evaluation rules. Additionally, each dialog state has one or more associated grammars that describe the expected user input, which includes spoken input and/or touch-tone (DTMF) key presses.
Two means are commonly used to gather data to fill multiple form items. One means to gather data assigns a specific grammar to each form item and utilizes a Form Interpretation Algorithm (FIA) to visit each form item until each one is filled with data provided by a user. The second means collects multiple pieces of information in a single dialog state. This type of form is a mixed-initiative form associated with a form-level grammar.
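By way of illustration only, the two conventional approaches described above can be sketched as follows. This is a simplified sketch, not an implementation of the Form Interpretation Algorithm; the function names, the `max_rounds` limit, the `city` and `day` fields, and the toy vocabulary standing in for a form-level grammar are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy vocabulary standing in for a form-level grammar.
CITIES = {"boston", "denver"}
DAYS = {"monday", "friday"}


def fill_directed(
    fields: List[str],
    ask: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Dict[str, str]:
    """First approach: visit each unfilled form item in turn.

    `ask` prompts for one field and returns its value, or None when
    nothing usable was recognized. `max_rounds` is an assumed safety
    limit so the loop always terminates.
    """
    values: Dict[str, str] = {}
    for _ in range(max_rounds):
        for name in fields:
            if name not in values:
                answer = ask(name)
                if answer is not None:
                    values[name] = answer
        if len(values) == len(fields):
            break
    return values


def parse_form_level(utterance: str) -> Dict[str, str]:
    """Second approach: one utterance may fill several fields at once."""
    slots: Dict[str, str] = {}
    for word in utterance.lower().split():
        if word in CITIES:
            slots["city"] = word
        elif word in DAYS:
            slots["day"] = word
    return slots
```

In this sketch, the first function needs at least one prompt per field, whereas the second can fill both fields from a single utterance such as "Fly to Boston on Friday".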
Since a form-level grammar supports filling multiple fields, it is more complex, and the associated speech utterances are longer than utterances associated with filling a single field. Longer utterances have a relatively high probability of returning NO_MATCH results and of being incorrectly recognized by a speech recognition engine. Each recognized utterance is typically associated with an utterance-level (e.g., a form-level or phrase-level) confidence score. When this utterance-level confidence score is below a designated confidence threshold, a user will typically be re-prompted for the full utterance in hopes that a new utterance will result in a higher confidence score. Being forced to repeat a complete utterance can be time-consuming and frustrating to the user.
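The conventional re-prompting behavior described above can be sketched as follows, for illustration only. The `Recognition` type, the `collect_form` function, the `max_attempts` limit, and the example threshold of 0.5 are assumptions made for this sketch and do not correspond to any particular speech recognition engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Recognition:
    slots: Dict[str, str]   # field values extracted from the utterance
    confidence: float       # utterance-level confidence score


def collect_form(
    recognize: Callable[[], Optional[Recognition]],
    threshold: float,
    max_attempts: int = 3,
) -> Tuple[Optional[Dict[str, str]], int]:
    """Re-prompt for the full utterance until its score meets the threshold.

    `recognize` prompts the user once and returns a Recognition, or None
    for a NO_MATCH result. Returns the accepted slots (or None if every
    attempt failed) together with the number of attempts made.
    """
    for attempt in range(1, max_attempts + 1):
        result = recognize()
        if result is not None and result.confidence >= threshold:
            return result.slots, attempt
        # Below threshold or NO_MATCH: the whole result is discarded,
        # including any words that were individually recognized well.
    return None, max_attempts
```

Note that in this sketch a low utterance-level score discards every field value, so the user must repeat the complete utterance on each attempt.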