Aspects of the present invention relate to voice automation systems. Other aspects of the present invention relate to automated spoken dialogue systems.
In a society that is becoming increasingly “information anywhere and anytime”, information services emerge every day to provide different types of information to users around the clock. Examples of such services include weather information and movie information. In the past, information services were typically provided through operators or via selection buttons. For example, a user may make a requesting call to a toll-free phone number corresponding to a service. If an operator picks up the requesting call, the user may speak to the operator to indicate the desired information. The operator may then select the desired information and play it back to the user over the phone. If a recording picks up the requesting call, the user may indicate the desired information through selection buttons according to the instructions from the recording.
To run a cost-effective information service business, companies have made efforts to automate the service process. For example, in directory assistance services provided by various phone companies, semi-automated systems have been deployed that maximize productivity in providing requested information (e.g., phone numbers). Some rail systems in Europe have deployed automated ticket reservation systems that allow customers to reserve train tickets over the phone via voice. As another example, the “tellme” service in the U.S.A. (1-800-tellme) offers free information services across many categories (e.g., weather, movies, stocks, etc.) over the phone via automated voice responses. Such efforts have so far decreased the required number of skilled workers and hence the size of the operating facilities, saving service providers millions of dollars each year.
The automation efforts mentioned above utilize automated speech recognition and language understanding technologies. Spoken dialogue systems are developed based on such technologies to automate service and other systems. A dialogue system usually serves as a front end of a service, interacting with a user to understand what is being requested, activating one or more back end systems to retrieve the requested information, and generating voice responses.
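By way of illustration only, the front-end role described above (understand the request, activate a back end, generate a response) may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any deployed system: the names `Request`, `understand`, `BACK_ENDS`, and `respond` are hypothetical, the input is assumed to be text already produced by a speech recognizer, keyword spotting stands in for language understanding, and the back ends return canned strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    """What the front end believes the user asked for (hypothetical type)."""
    service: str
    query: str


def understand(utterance: str) -> Request:
    """Stand-in for language understanding: simple keyword spotting on text."""
    text = utterance.lower()
    for service in ("weather", "movie"):
        if service in text:
            return Request(service=service, query=text)
    return Request(service="unknown", query=text)


# Hypothetical back end systems; each returns canned information for the sketch.
BACK_ENDS: Dict[str, Callable[[str], str]] = {
    "weather": lambda query: "sunny",
    "movie": lambda query: "two showings tonight",
}


def respond(utterance: str) -> str:
    """Front end: understand the request, call a back end, build a response."""
    request = understand(utterance)
    back_end = BACK_ENDS.get(request.service)
    if back_end is None:
        # No back end matches the request, so the front end cannot serve it.
        return "Sorry, I did not understand the request."
    info = back_end(request.query)
    return f"Here is the {request.service} information: {info}."
```

In a voice-based service, the string returned by `respond` would be passed to a speech synthesizer or mapped to a recorded prompt; that step is omitted here.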
Service systems that deploy automated voice based front end solutions may not always function properly. Such imperfection may be due to various environmental and technological reasons. For example, a user may speak over a phone in a very noisy environment that yields speech data of poor quality. A speech recognition system may not be able to understand speech with an unfamiliar foreign accent. Another related reason for an automated voice based front end to make mistakes is that it is presented with an unusual speech pattern. For example, when a user gets annoyed (e.g., by a previous mistake the system made or by a tedious confirmation process), the user may respond with anger by raising the voice, which may correspond to an unusually high pitch. A user may also have nasal congestion due to a cold, which may significantly change the acoustic properties of the speech.
Dialogue systems can be designed to provide a certain amount of tolerance to imperfections. Such tolerance may be achieved using different strategies. For example, a dialogue system may employ a “confirmation” strategy. A dialogue system may also navigate using prompts. While these strategies may reduce the chance of making mistakes, the “confirmation” strategy may be tedious for a user and does not always work. The strategy of “navigate using prompts” provides little flexibility for users to browse at will.
A fair number of users fail to make use of automated voice based services. In addition to the above mentioned technological and environmental reasons, one important contributing factor may be that these users simply give up when an automated dialogue system makes mistakes without realizing it or apologizing. This is particularly true when a user raises his or her voice to express dissatisfaction, which only further triggers the dialogue system to either repeat the same mistake or make even more mistakes.