1. Field of the Invention
The present invention relates to user interfaces for voice actuated services. In particular, the present invention relates to user interfaces specifically adapted to the spoken language of the target users. The present invention specifically provides both language-oriented user interfaces and generally applicable systems and methods for building such language-oriented user interfaces.
2. Description of the Related Art
A user interface is a component or tool of a computer system that enables a user to interact with the computer system, either to issue instructions controlling the operation of the system, enter data, examine results, or perform other operations in connection with the functions of the system. In effect, the user interface is the computer's “cockpit.” That is, the user interface presents information about the computer's operation to the user in an understandable form, and it enables the user to control the computer by converting the user's instructions into forms usable by the computer. Various types of user interfaces exist, such as text (or “command line”) interfaces, graphical user interfaces (“GUIs”), Dual Tone Multi-Frequency (DTMF) interfaces, and others.
“Voice activated” (VA) or “voice controlled” (VC) user interfaces are a promising alternative type of user interface that enables users to interact with the computer by spoken words. That is, rather than typing in text commands, pressing numbers on a telephone keypad, or “clicking” on graphical icons and menu items, the user provides instructions and data to the computer merely by speaking appropriate words. The ability of a user interface to receive inputs by voice signals has clear advantages in many application areas where other means of input (keyboard, telephone keypad, mouse or other pointing device, etc.) are unavailable or unfamiliar to the user.
Unfortunately, voice activated user interfaces (“VA UIs”) have generally failed to provide the level of usability necessary to make such devices practical in most application areas. This failure has been due in part to inherent technical challenges, such as the difficulty of reliably converting spoken words into corresponding computer instructions. However, continuing advances in automatic speech recognition (ASR) technologies have largely removed such obstacles. The persistent inadequacies of existing VA UIs therefore arise from design flaws in the UIs themselves, rather than from a lack of adequate implementing technology.
Currently, VA UIs are designed and implemented in an ad hoc manner. Most developers overlay a voice-activated UI onto a DTMF UI and perform after-the-fact testing on the integrated unit. Tests of these systems are therefore performed without consideration of the change in input modality (spoken words versus DTMF keypresses) or of the new usability effects generated by the coupling between the various submodules of the system.
Trial and error is the most common approach for VA UI design and development. The vocabulary wordset for the service is often the literal translation of the English command words used for the task into the target language. Two typical prompting structures are (1) to list all the options at once and wait for the subscriber to speak the choice (either at the end or by barging in), or (2) to say the options one at a time, and provide a pause or yes/no question to signal the subscriber to make a choice. Textual (visual) UIs essentially follow the first approach, while DTMF UIs use the second approach. Explicit turn-taking is generally signaled by introducing a tone to indicate that the subscriber should speak.
However, to serve the needs of users effectively, a VA UI must have characteristics and must satisfy ease-of-use requirements different from those of a DTMF or visual/textual UI. The need for these differences arises because verbal dialogues are dynamic social interactions and differ across languages and cultures in ways that are not paralleled in visual or written interactions. To have any practical significance, therefore, a VA UI must flexibly accommodate different command words, tempos in which they are spoken, and ways in which turn-taking is signaled in the language in which the human-machine conversation is taking place. Put another way, designing a VA UI to be more than a technical curiosity requires more than simply adding (overlaying, substituting) command words to a DTMF service. All users, whether first-time, average, or experienced, must find the UI highly acceptable and easy to use.
On the other hand, it has been the accepted wisdom that present-day software technology is too rudimentary to make possible user interfaces that are actually easy to use. U.S. Pat. No. 5,748,841, issued May 5, 1998, to Morin et al., expresses this view as follows: “In one respect, the problem may be that even complex computer applications and computer programs do not provide the flexible input/output bandwidth that humans enjoy when interacting with other humans. Until that day arrives, the human user is relegated to the position of having to learn or acquire a precise knowledge of the language that the computer application can understand and a similar knowledge of what the computer application will and will not do in response. More precisely, the human user must acquire a knowledge of enough nuances of the application language to allow the user to communicate with the application in syntactically and semantically correct words or phrases.”
Thus, the state of the art in user interface technology has explicitly assumed that effective use of a practical user interface requires the user to learn the syntax and semantics that are employed by the user interface. There has existed an unmet need for a user interface adapted to the conventions of the user's spoken language. Heretofore this need has actually been considered to be unmeetable with existing software technology. This need has been particularly acute for voice activated user interfaces, because the conventions of spoken language vary much more widely between different communities than the conventions of written language. Furthermore, voice activated services may have the greatest potential for growth among users with little computer experience, provided usable VA UIs that follow universal spoken language principles become available.