1. Technical Field
The present disclosure relates to voice applications and, more specifically, to systems and methods for automated confirmation and disambiguation modules in voice applications.
2. Discussion of Related Art
Voice applications are computer programs that communicate with a user by spoken language. For example, a user may be provided with voice prompts and may then speak instructions that are interpreted by the voice application. Voice applications find a wide variety of uses such as automated telephone-based customer service applications. Voice applications may be developed using existing standards such as VoiceXML (VXML), which is a standard for providing automated interactive voice dialogue. VoiceXML may be used in a manner similar to HTML: just as HTML documents may be viewed on a web browser, VoiceXML documents may be interpreted by a voice browser. A user may be in contact with the voice browser via a telephone system. However, the voice browser may also be local to the user. For example, a voice browser installed on a smartphone or tablet computer may be used to interact with a user.
In addition to VoiceXML, voice applications may be modeled using business process modeling notation (BPMN). BPMN establishes standards for modeling business processes and web services. As voice applications may be an important part of web services, BPMN may be particularly useful in modeling voice applications.
An important part of voice applications is automated speech recognition (ASR). ASR techniques are used to interpret received user speech into computer-understandable instructions. As there may be many different ways to express the same instruction in spoken language, many different pronunciations for the same word, and many different words that sound very similar, if not identical, to each other, ASR techniques may have difficulty in distinguishing between multiple possible options for what a user may have said. Confirmation and disambiguation are approaches that help ASR techniques pin down spoken words to particular commands after the computer has determined that the spoken words may be one of a number of different commands.
In confirmation, the voice application may ask the user to confirm that a particular command was actually spoken. A typical confirmation question may sound like, “I think you said X, is that correct?” Confirmation may be well suited for cases in which the computer has determined that spoken language is most likely a particular command but certainty does not exceed a predetermined threshold and/or there may be other close options available.
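By way of illustration only, the confirmation decision described above may be sketched as follows. The function name, the representation of recognition hypotheses as (command, confidence) pairs, and the threshold value of 0.9 are assumptions made for this sketch and are not drawn from any particular ASR system.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (command text, confidence score in [0, 1]).
Hypothesis = Tuple[str, float]


def confirmation_prompt(
    hypotheses: List[Hypothesis],
    accept_threshold: float = 0.9,  # assumed value, chosen for illustration
) -> Optional[str]:
    """Return a confirmation question for the most likely command.

    Returns None when the certainty of the most likely command exceeds
    the threshold, in which case no confirmation is needed.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    # Pick the hypothesis with the highest confidence score.
    best_command, best_score = max(hypotheses, key=lambda h: h[1])
    if best_score > accept_threshold:
        return None
    return f"I think you said {best_command}, is that correct?"
```

In this sketch, a top hypothesis scoring 0.95 would be accepted without a prompt, while one scoring 0.7 would produce the confirmation question.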
In disambiguation, the computer may have trouble distinguishing between two or more possible options and the computer may request the user to select between the possible options. Disambiguation may be particularly useful where multiple options sound similar and thus disambiguation may ask the user to select between the similar sounding options, for example, by prompting the user to answer using language that is easier to interpret. For example, where the user has spoken the name of a city and the computer is not sure if the city name is “Austin” or “Boston,” disambiguation may be performed by asking the user, “Did you say Austin, Tex. or Boston, Mass.?” thereby prompting the user to say the name of the state along with the city so that the computer may accurately determine the command.
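The city example above may be sketched, purely for illustration, as follows. The `Candidate` type, the function names, and the simple substring matching used to interpret the reply are assumptions of this sketch; an actual voice application would pass the reply through ASR again.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical option: a name plus a qualifier that is easier to tell apart."""
    name: str
    qualifier: str


def disambiguation_prompt(candidates: List[Candidate]) -> str:
    """Build a question asking the user to choose among similar-sounding options."""
    if len(candidates) < 2:
        raise ValueError("disambiguation needs at least two candidates")
    options = [f"{c.name}, {c.qualifier}" for c in candidates]
    listed = ", ".join(options[:-1]) + " or " + options[-1]
    return f"Did you say {listed}?"


def resolve_reply(reply: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the single candidate whose qualifier appears in the reply, else None."""
    text = reply.lower()
    matches = [
        c for c in candidates
        if c.qualifier.rstrip(".").lower() in text  # "Tex." matches "texas"
    ]
    return matches[0] if len(matches) == 1 else None
```

Here the qualifier (the state) carries the distinguishing information, so a reply that names the state resolves the ambiguity even when the city names themselves sound alike.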
Where there is a large number of potential commands, disambiguation may serve to reduce the field of potential commands and ultimately arrive at a particular command through the course of a set of questions. In this case, questions may be carefully determined based on a range of expected answers.
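One possible way to picture this narrowing is sketched below. The representation of commands as attribute dictionaries, the attribute names, and the heuristic of asking about the attribute with the most distinct values are all assumptions made for illustration rather than a description of any existing system.

```python
from typing import Dict, List

# Hypothetical representation: each potential command is described by attributes.
Command = Dict[str, str]


def next_question_attribute(candidates: List[Command], attributes: List[str]) -> str:
    """Pick the attribute whose values differ most across the remaining candidates."""
    return max(attributes, key=lambda a: len({c[a] for c in candidates}))


def narrow(candidates: List[Command], attribute: str, answer: str) -> List[Command]:
    """Keep only the candidates whose attribute value matches the user's answer."""
    wanted = answer.strip().lower()
    return [c for c in candidates if c[attribute].lower() == wanted]
```

Each answer removes the candidates that do not match, so repeating the two steps reduces the field until a single command remains. The expected answers at each step are exactly the attribute values of the remaining candidates, which is why such questions depend closely on the set of potential commands.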
As expected answers generally have a lot to do with the potential commands, confirmation and disambiguation logic is generally programmed on an application-by-application basis. In particular, the logic for performing confirmation and disambiguation, along with the possible answers, is generally hardcoded directly into the process flow for the voice application. By utilizing this approach, knowledge of expected answers may more effectively be used to perform confirmation and disambiguation. However, this approach also makes programming voice applications more difficult by forcing developers to repeatedly program confirmation and disambiguation.