1. Field of the Invention
The present invention relates to user interfaces for voice actuated services. In particular, the present invention relates to user interfaces specifically adapted to the spoken language of the target users. The present invention specifically provides both language-oriented user interfaces and generally applicable systems and methods for building such language-oriented user interfaces.
2. Description of the Related Art
A user interface is a component or tool of a computer system that enables a user to interact with the computer system, either to issue instructions controlling the operation of the system, enter data, examine results, or perform other operations in connection with the functions of the system. In effect, the user interface is the computer's "cockpit." That is, the user interface presents information about the computer's operation to the user in an understandable form, and it enables the user to control the computer by converting the user's instructions into forms usable by the computer. Various types of user interfaces exist, such as text (or "command line") interfaces, graphical user interfaces ("GUIs"), Dual Tone Multi-Frequency (DTMF) interfaces, and others.
"Voice activated" (VA) or "voice controlled" (VC) user interfaces are a promising alternative type of user interface that enables users to interact with the computer by spoken words. That is, rather than typing in text commands, pressing numbers on a telephone keypad, or "clicking" on graphical icons and menu items, the user provides instructions and data to the computer merely by speaking appropriate words. The ability of a user interface to receive inputs by voice signals has clear advantages in many application areas where other means of input (keyboard, telephone keypad, mouse or other pointing device, etc.) are unavailable or unfamiliar to the user.
Unfortunately, voice activated user interfaces ("VA UIs") have generally failed to provide the level of usability necessary to make such devices practical in most application areas. This failure has been due in part to inherent technical challenges, such as the difficulty of reliably converting spoken words into corresponding computer instructions. However, continuing advances in acoustic signal recognition (ASR) technologies have largely removed such obstacles. The persistent inadequacies of existing VA UIs therefore arise from design flaws in the UIs themselves, rather than lack of adequate implementing technology.
Currently, voice activated user interfaces (VA UIs) are designed and implemented in an ad hoc manner. Most developers overlay a voice-activated UI onto a Dual Tone Multi-Frequency (DTMF) UI and perform after-the-fact testing on the integrated unit. Tests of these systems are therefore performed without consideration of the change in input modality (spoken words versus DTMF keypresses) or of the new usability effects generated by the coupling between the various submodules of the system.
Trial and error is the most common approach for VA UI design and development. The vocabulary wordset for the service is often the literal translation of the English command words used for the task into the target language. Two typical prompting structures are (1) to list out all the options at once and wait for the subscriber to speak the choice (either at the end or by barging-in), or (2) to say the options one at a time, and provide a pause or yes/no question to signal the subscriber to make a choice. Textual (visual) UIs essentially follow the first approach, while DTMF UIs use the second approach. Explicit turn-taking is generally signalled by introducing a tone to indicate that the subscriber should speak.
However, to serve the needs of users effectively, a VA UI must have characteristics and must satisfy ease-of-use requirements different from those of a DTMF or visual/textual UI. The need for these differences arises because verbal dialogues are dynamic social interactions and differ across languages and cultures in ways that are not paralleled in visual or written interactions. To have any practical significance, therefore, a VA UI must flexibly accommodate different command words, tempos in which they are spoken, and ways in which turn-taking is signaled in the language in which the human-machine conversation is taking place. Put another way, designing a VA UI to be more than a technical curiosity requires more than simply adding (overlaying, substituting) command words to a DTMF service. All users, whether first-time, average, or experienced, must find the UI highly acceptable and easy to use.
On the other hand, it has been the accepted wisdom that present-day software technology is too rudimentary to make possible user interfaces that are actually easy to use. U.S. Pat. No. 5,748,841, issued May 5, 1998, to Morin et al., expresses this view as follows: "In one respect, the problem may be that even complex computer applications and computer programs do not provide the flexible input/output bandwidth that humans enjoy when interacting with other humans. Until that day arrives, the human user is relegated to the position of having to learn or acquire a precise knowledge of the language that the computer application can understand and a similar knowledge of what the computer application will and will not do in response. More precisely, the human user must acquire a knowledge of enough nuances of the application language to allow the user to communicate with the application in syntactically and semantically correct words or phrases."
Thus, the state of the art in user interface technology has explicitly assumed that effective use of a practical user interface requires the user to learn the syntax and semantics that are employed by the user interface. There has existed an unmet need for a user interface adapted to the conventions of the user's spoken language. Heretofore this need has actually been considered to be unmeetable with existing software technology. This need has been particularly acute for voice activated user interfaces, because the conventions of spoken language vary much more widely between different communities than the conventions of written language. Furthermore, voice activated services may have greatest potential for growth among users with little computer experience, provided usable VA UIs that follow universal spoken language principles become available.
It is an object of the present invention to provide a method of designing language-oriented user interfaces for voice activated services.
The present invention provides, in a first aspect, a method for designing a voice activated user interface, the method comprising separately selecting a vocabulary set and a prompting syntax for the user interface based on results of first testing with subjects from a target community. The method further comprises jointly optimizing the vocabulary set and the prompting syntax based on results of second testing with subjects from the target community.
In a second aspect, the invention provides a method for selecting a vocabulary set for a voice activated user interface. The method of this aspect comprises collecting responses to task-oriented questions eliciting commonly used names for tasks and task-related items, and selecting a plurality of responses from the collected responses based on frequency of occurrence in the collected responses.
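By way of illustration only, the frequency-based selection of this second aspect might be sketched as follows. The function name `select_vocabulary`, the `top_n` parameter, and the sample survey data are hypothetical and are not taken from the specification; the sketch assumes responses have already been transcribed as text.

```python
from collections import Counter


def select_vocabulary(responses, top_n=2):
    """Map each task to the top_n names most often given by respondents.

    `responses` maps a task identifier to the list of names collected
    for that task (hypothetical data layout).
    """
    vocabulary = {}
    for task, names in responses.items():
        # Normalize case and whitespace so "Delete" and "delete " count together.
        counts = Counter(name.strip().lower() for name in names)
        vocabulary[task] = [word for word, _ in counts.most_common(top_n)]
    return vocabulary


# Hypothetical responses to task-oriented questions.
survey = {
    "remove_message": ["delete", "erase", "Delete", "remove", "delete", "erase"],
    "hear_message": ["play", "listen", "play", "Play"],
}
print(select_vocabulary(survey))
```

Keeping more than one name per task is a design choice of the sketch: it leaves alternatives available for later testing with subjects from the target community.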
In a third aspect, the invention provides a computer system and computer software providing a service through a voice activated user interface. The computer system comprises a storage and a processor. The storage has a vocabulary of command words stored therein, each command word being selected from responses to questions posed to members of a test group. The processor interprets a spoken response based on the stored command words. The computer software comprises instructions to perform the corresponding operations.
In a fourth aspect, the invention provides a method for defining a prompting syntax for a voice actuated user interface. The method of this fourth aspect comprises identifying an initial value for each of one or more syntax parameters from samples of dialogue in a conversational language of a target community. The method further comprises specifying an initial temporal syntax for the user interface based on the one or more identified initial values.
In a sixth aspect, the invention provides a method for optimizing a prompting syntax of a voice actuated user interface, the method comprising testing performance of tasks by subjects from a target community using the interface implemented with a command vocabulary and a temporal syntax each selected for the target community. The method of this aspect further comprises modifying the temporal syntax based on results of the testing.
In a seventh aspect, the invention provides a method for defining a prompting syntax for a voice activated user interface, the method comprising specifying an initial temporal syntax for the user interface based on initial syntax parameter values identified through dialogue analysis. The method of this aspect also comprises modifying the initial temporal syntax based on results of testing user performance with the user interface using a selected command vocabulary with the initial temporal syntax.
In an eighth aspect, the invention provides a method for optimizing a voice activated user interface, the method comprising configuring the user interface with a vocabulary of command words including at least one word indicating a corresponding task and selected from plural words for the task based on frequency of use. The method of this aspect also comprises changing at least one of a command and a syntax parameter of the user interface based on results of testing the user interface with speakers of a target language.
In a ninth aspect, the invention provides a method for adaptive error handling in a voice activated user interface. The method comprises detecting that an error has occurred in a dialogue between the user and the user interface based on a change in behavior of the user. The method further comprises reprompting the user when the error is an omission error, and returning to a previous menu state responsive to a correction command by the user when the error is a commission error.
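A minimal sketch of the error handling of this ninth aspect is given below, under stated assumptions. It assumes that silence after a prompt signals an omission error and that a spoken correction word signals a commission error; the correction word "back", the class `Dialogue`, and the function `classify_error` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

CORRECTION_WORD = "back"  # hypothetical correction command


def classify_error(utterance):
    """Return 'omission', 'commission', or None when no error is detected."""
    if utterance is None:
        # Assumption: the user stayed silent after the prompt.
        return "omission"
    if utterance.strip().lower() == CORRECTION_WORD:
        # Assumption: the user signals that the previous choice was wrong.
        return "commission"
    return None


@dataclass
class Dialogue:
    # Stack of menu states; the last entry is the current state.
    menu_stack: list = field(default_factory=lambda: ["main"])

    @property
    def state(self):
        return self.menu_stack[-1]

    def enter(self, menu):
        self.menu_stack.append(menu)

    def handle(self, utterance):
        kind = classify_error(utterance)
        if kind == "omission":
            return ("reprompt", self.state)      # repeat the current prompt
        if kind == "commission":
            if len(self.menu_stack) > 1:
                self.menu_stack.pop()            # return to the previous menu state
            return ("returned", self.state)
        return ("ok", self.state)
```

A menu stack is used here so that a correction command can undo exactly one step; other state representations would serve equally well.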
In a tenth aspect, the invention provides a method for adaptive error handling in a voice activated user interface. The method of this aspect comprises detecting that an error has occurred in a dialogue with the user interface following a prompt delivered according to a first prompting structure, and reprompting the user according to a second prompting structure when a count of errors exceeds a predetermined value.
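The switch between prompting structures in this tenth aspect can be illustrated with the following sketch. It assumes, for illustration only, that the first structure lists all options at once and that the second presents options one at a time with a yes/no question; the function name `build_prompts` and the threshold `max_errors` are hypothetical.

```python
def build_prompts(options, error_count, max_errors=2):
    """Return the prompts to play for the current menu.

    Uses the first (list-all) structure until the error count exceeds
    max_errors, then the second (one-at-a-time, yes/no) structure.
    """
    if error_count > max_errors:
        # Second structure: offer each option separately.
        return [f"{option}? Say yes or no." for option in options]
    # First structure: list every option in a single prompt.
    return ["Say " + ", ".join(options) + "."]


print(build_prompts(["play", "delete"], error_count=0))
print(build_prompts(["play", "delete"], error_count=3))
```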
In an eleventh aspect, the invention provides a method for adaptive error handling in a voice activated user interface, the method comprising selecting an error prompt level based on an accumulated number of user errors when a user error occurs in a dialogue between the user interface and a user. The method of this aspect further comprises reprompting the user according to the selected error prompt level.
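As a hedged illustration of this eleventh aspect, the sketch below selects an error prompt level from the accumulated number of user errors. The three prompt texts, the list `ERROR_PROMPTS`, and the function `error_prompt` are hypothetical examples, not wording from the specification.

```python
# Hypothetical error prompts, ordered from least to most explicit.
ERROR_PROMPTS = [
    "Sorry?",
    "Please say a command, for example 'play'.",
    "You can say play, delete, or help. Please speak after the tone.",
]


def error_prompt(accumulated_errors):
    """Pick the prompt for the given error count, capped at the last level."""
    level = min(max(accumulated_errors, 1), len(ERROR_PROMPTS)) - 1
    return ERROR_PROMPTS[level]
```

Capping at the most explicit level means a user who keeps making errors continues to hear the fullest guidance rather than causing an index error.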
In a twelfth aspect, the invention provides a computer system and computer software providing a service to a user through a voice activated user interface. The computer system comprises a storage and a processor. The storage stores a menu of commands usable by the user in a dialogue between the user and the user interface. The processor detects an error in the dialogue based on a change in behavior of the user, reprompts the user when the error is an omission error, and returns to a previous menu state responsive to a correction command when the error is a commission error.
In a thirteenth aspect, the invention provides a computer system and software providing a service to a user through a voice activated user interface, the computer system comprising a storage and a processor. The storage stores a menu of commands usable by the user in a dialogue between the user and the user interface. The processor prompts a command selection by the user according to a first prompting style, detects an error in the dialogue when the error occurs, and prompts a command selection by the user according to a second prompting style when a count of errors by the user during the dialogue exceeds a predetermined value.
In a fourteenth aspect, the invention provides a method for prompting a user of a voice activated user interface. The method of this aspect comprises pausing for a first predetermined interval after presentation of a label identifying a current menu state of the user interface. The method further comprises presenting to the user a command option for the current menu state only when a command is not received from the user during the predetermined interval.
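The pause-then-prompt behavior of this fourteenth aspect might be sketched as follows. The callables `say` and `listen` are hypothetical stand-ins for speech output and recognition: `listen(timeout)` is assumed to return a recognized command, or `None` if nothing is heard within the interval. The 1.5 second default is an arbitrary placeholder, and reusing the same pause between options is an assumption of the sketch.

```python
def prompt_menu(label, options, say, listen, pause_s=1.5):
    """Announce the menu label, pause, and list options only if needed."""
    say(label)
    command = listen(pause_s)      # pause after the label
    if command is not None:
        return command             # user spoke during the pause
    for option in options:
        say(option)                # present a command option
        command = listen(pause_s)  # assumption: same pause after each option
        if command is not None:
            return command
    return None                    # no command received at all


# Usage with scripted input: this user waits until "delete" is offered.
spoken = []
scripted = iter([None, None, "delete"])
choice = prompt_menu("Messages", ["play", "delete"],
                     spoken.append, lambda timeout: next(scripted, None))
print(choice, spoken)
```

Injecting `say` and `listen` keeps the sketch independent of any particular speech platform and makes the timing logic testable without audio.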
In a fifteenth aspect, the invention provides a method for developing an automatic speech recognition (ASR) vocabulary for a voice activated service. The method comprises posing, to at least one respondent, a hypothetical task to be performed and asking each of the at least one respondent for a word that the respondent would use to command the hypothetical task to be performed. The method of this aspect further comprises receiving, from each of the at least one respondent, a command word, developing a list of command words from the received command word, and rejecting the received command word if the received command word is acoustically similar to another word in the list of command words.
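The list-building and rejection steps of this fifteenth aspect can be illustrated with the sketch below. Spelling similarity computed with Python's `difflib.SequenceMatcher` is used purely as a crude stand-in for acoustic similarity, which in practice would be measured on phonetic or acoustic representations; the function name `build_command_list` and the 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def build_command_list(received_words, threshold=0.75):
    """Accumulate command words, rejecting ones too similar to accepted words.

    Assumption: spelling similarity approximates acoustic similarity.
    """
    accepted = []
    for word in received_words:
        candidate = word.strip().lower()
        if candidate in accepted:
            continue  # already in the list
        too_similar = any(
            SequenceMatcher(None, candidate, other).ratio() >= threshold
            for other in accepted
        )
        if not too_similar:
            accepted.append(candidate)
    return accepted


# "pay" is rejected because it is too close to the earlier word "play".
print(build_command_list(["play", "pay", "delete", "repeat"]))
```

Because each word is compared only against words already accepted, the result depends on the order of responses; ordering candidates by frequency of occurrence first would favor the most commonly offered words.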
Additional objects and advantages of the invention will be set forth in part in the following description and, in part, will be obvious therefrom or may be learned by practice of the invention.