1. Technical Field
The present invention relates to data processing and, in particular, to computer voice interaction. Still more particularly, the present invention provides one-step correction of voice interaction.
2. Description of Related Art
An interactive voice response (IVR) system is an automated telephone information system that speaks to the caller with a combination of fixed voice menus and real-time data from databases. The caller responds by speaking words or short phrases. Applications may include, for example, bank-by-phone, flight-scheduling information, and automated order entry and tracking. IVR systems allow callers to get needed information without the expense of employing call centers with human operators. IVR systems may also be used as a front end to call centers in order to offload as many calls as possible from costly human agents.
As an example, an IVR system may reside in a data processing system equipped with special expansion cards that contain digital signal processor (DSP) chips. These specialized processors may connect to a telephone system that switches telephone calls. IVR systems may also be networked, or they may be present in a stand-alone data processing system or as a client application on an end-user machine.
One problem associated with speech applications is correction. Correction is the process of identifying, locating, and replacing incorrect or misrecognized values returned by the speech recognizer. For example, a user is prompted to speak some piece of information by a prompt player device, which is driven by a dialog manager. The user speaks into a microphone device, and the speech recognizer receives the speech signal, decodes it, and extracts the spoken piece of information, which is then sent to the dialog manager. The dialog manager interprets the information received from the speech recognizer and instructs the prompt player device to play the next prompt. The next prompt could be, for example, a request for some other piece of information, a confirmation, or a request to correct or reenter the current piece of information.
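The decision described above, in which the dialog manager interprets a recognition result and selects the next prompt, may be sketched as follows. This is a minimal illustration only: the `RecognitionResult` type, its `confidence` field, the `choose_next_prompt` function, and both threshold values are hypothetical names and numbers introduced here, not details taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    value: str         # piece of information extracted from the speech signal
    confidence: float  # hypothetical score in [0, 1]; not specified in the text


def choose_next_prompt(result, accept_at=0.8, confirm_at=0.4):
    """Return the kind of prompt the dialog manager asks the prompt player for.

    The thresholds are illustrative assumptions, not values from the text.
    """
    if result.confidence >= accept_at:
        return "next_information"   # move on to another piece of information
    if result.confidence >= confirm_at:
        return "confirmation"       # e.g. "Was that <value>?"
    return "reentry"                # ask the user to speak the value again


print(choose_next_prompt(RecognitionResult("Boston", 0.55)))  # confirmation
```

The three return values correspond to the three kinds of next prompt named above: a request for another piece of information, a confirmation, or reentry of the current piece of information.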
In an IVR, the input of the user is prone to being misrecognized. This may be due to the nature of the speech recognition device, for example. Typical speech applications take these mistakes into consideration and make use of mechanisms that indicate when and where a possible misrecognition has taken place. Correction mechanisms are then used to correct and possibly re-capture the piece of information until the system is confident that a correct value is received from the speech recognition engine.
Typically, correction mechanisms are activated when the dialog manager has poor confidence in information received from the speech recognizer. A typical process implemented in many systems involves two or more steps. The first step is to identify the intention to correct as well as the attribute to be changed. The second step is to capture a new value for the attribute. For example, consider the following interaction:
System: Where would you like to make the payment?
User: In Austin
--correction turn 1--
System: Was that Boston?
User: No.
--correction turn 2--
System: Let's try again. Where would you like to make the payment?
User: Austin
--next information prompt--
System: Got it! And how much would you like to pay?
User: Two hundred dollars

In the above example, the user has to go through two additional interaction turns compared to a case where the system does not make any mistake.
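The turn count in this interaction can be reproduced with a small simulation of the two-step process (confirm, then recapture). The sketch below is illustrative only: the function name `two_step_capture` and its list-based inputs are hypothetical, and the recognizer and the user are replaced by scripted values rather than real speech.

```python
def two_step_capture(heard_values, confirm_answers):
    """Simulate capturing one attribute with confirm-then-recapture correction.

    heard_values: value the (simulated) recognizer returns on each attempt.
    confirm_answers: user's reply to "Was that ...?" for each attempt;
        True/False for yes/no, or None when confidence is high enough
        that no confirmation is asked.
    Returns (accepted value, number of system-user turns used).
    """
    turns = 0
    for heard, confirmed in zip(heard_values, confirm_answers):
        turns += 1                  # capture turn: prompt plus spoken answer
        if confirmed is None:       # accepted without confirmation
            return heard, turns
        turns += 1                  # correction step 1: "Was that <heard>?"
        if confirmed:
            return heard, turns
        # correction step 2 is the next loop pass: re-prompt and recapture
    raise RuntimeError("ran out of simulated input")


# Interaction above: "Austin" is first misheard as "Boston".
value, turns = two_step_capture(["Boston", "Austin"], [False, None])
# Baseline: the recognizer makes no mistake.
_, baseline = two_step_capture(["Austin"], [None])
print(value, turns - baseline)  # Austin 2
```

Under these assumptions, the misrecognized case uses three turns to capture the payment location and the error-free case uses one, which matches the two additional turns noted above.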