Many existing operating systems and devices use voice input as a modality by which the user can control operation. One example is voice command systems, which map specific verbal commands to operations, for example initiating the dialing of a telephone number when the user speaks a person's name. Another example is Interactive Voice Response (IVR) systems, such as automated telephone service desks, which allow people to access static information over the telephone.
Many voice command and IVR systems are relatively narrow in scope and can only handle a predefined set of voice commands. In addition, their output is often drawn from a fixed set of responses.
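The narrow scope described above can be illustrated with a minimal sketch. It assumes a hypothetical command table (`COMMANDS`), hypothetical handlers (`dial`, `check_voicemail`), and a single fixed fallback response; none of these names come from an actual voice command or IVR product, and speech recognition is assumed to have already produced a text transcript.

```python
from typing import Callable, Dict


def dial(name: str) -> str:
    # Hypothetical handler: a real system would start a telephone call here.
    return f"Dialing {name}"


def check_voicemail(_: str) -> str:
    # Hypothetical handler returning one of a fixed set of responses.
    return "You have no new messages"


# Assumed, predefined command table: only these leading words are recognized.
COMMANDS: Dict[str, Callable[[str], str]] = {
    "call": dial,
    "voicemail": check_voicemail,
}

FALLBACK = "Sorry, I did not understand that."


def handle_utterance(utterance: str) -> str:
    """Map a transcribed utterance to an operation by its first word only."""
    words = utterance.strip().lower().split(maxsplit=1)
    if not words:
        return FALLBACK
    verb = words[0]
    argument = words[1] if len(words) > 1 else ""
    handler = COMMANDS.get(verb)
    if handler is None:
        # Anything outside the predefined set falls through to the fallback.
        return FALLBACK
    return handler(argument)
```

In this sketch a rephrased request such as "please phone alice" is rejected even though its intent matches the "call" command, which is the kind of limitation the paragraph above describes.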
An intelligent automated assistant, also referred to herein as a virtual assistant, is able to provide an improved interface between human and computer, including the processing of natural language input. Such an assistant allows users to interact with a device or system using natural language, in spoken and/or text forms. Such an assistant interprets user inputs, operationalizes the user's intent into tasks and parameters to those tasks, executes services to support those tasks, and produces output that is intelligible to the user.
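The stages named above (interpreting input, deriving a task and its parameters, executing a service, and producing output) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the assistant's actual design: the `Task` type, the regular-expression patterns standing in for natural language understanding, and the `place_call` and `create_reminder` services are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Task:
    """A task name plus the parameters extracted from the user's input."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


def interpret(user_input: str) -> Optional[Task]:
    """Toy stand-in for natural language understanding (assumed patterns)."""
    text = user_input.strip().lower()
    match = re.search(r"\b(?:call|phone|dial)\s+(\w+)", text)
    if match:
        return Task("place_call", {"contact": match.group(1)})
    match = re.search(r"\bremind me to\s+(.+)", text)
    if match:
        return Task("create_reminder", {"text": match.group(1)})
    return None


def place_call(params: Dict[str, str]) -> str:
    # Hypothetical service; a real one would invoke telephony functions.
    return f"Calling {params['contact']}."


def create_reminder(params: Dict[str, str]) -> str:
    # Hypothetical service; a real one would store the reminder.
    return f"Reminder set: {params['text']}."


# Assumed registry that maps task names to the services supporting them.
SERVICES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "place_call": place_call,
    "create_reminder": create_reminder,
}


def respond(user_input: str) -> str:
    """Interpret the input, run the matching service, and return output text."""
    task = interpret(user_input)
    if task is None:
        return "Sorry, I am not sure what you would like to do."
    return SERVICES[task.name](task.parameters)
```

Unlike a fixed command table, the interpretation step here accepts several phrasings of the same intent, for instance `respond("Could you phone Alice for me?")` and `respond("dial alice")` both resolve to the `place_call` task.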
Virtual assistants are capable of using general speech and natural language understanding technology to recognize a greater range of input, enabling generation of a dialog with the user. Some virtual assistants can generate output in a combination of modes, including verbal responses and written text, and can also provide a graphical user interface (GUI) that permits direct manipulation of on-screen elements. However, the user may not always be in a situation where they can (or want to) take advantage of such visual output or direct manipulation interfaces. For example, the user may be driving or operating machinery, may have a sight disability, may have left the device that provides the virtual assistant in a pocket or out of reach, or may simply not want to pick the device up.
Any situation in which a user has limited or no ability (or desire) to read a screen or interact with a device via contact (including using a keyboard, mouse, touch screen, pointing device, and the like) is referred to herein as a “hands-free context”. For example, in situations where the user is attempting to operate a device while driving, as mentioned above, the user can hear audible output and respond using their voice, but for safety reasons should not read fine print, tap on menus, or enter text.
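The definition above can be expressed as a simple predicate. The sketch below is only illustrative and rests on assumptions: the `DeviceContext` fields, the `is_hands_free` test, and the modality names returned by `choose_modalities` are hypothetical, and how a real system would detect the user's situation is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceContext:
    """Assumed description of what the user can currently do with the device."""
    can_read_screen: bool
    can_touch_device: bool


def is_hands_free(ctx: DeviceContext) -> bool:
    # Hands-free if the user cannot (or prefers not to) read the screen
    # or cannot (or prefers not to) interact via contact.
    return not ctx.can_read_screen or not ctx.can_touch_device


def choose_modalities(ctx: DeviceContext) -> Dict[str, List[str]]:
    """Pick input and output modes suited to the context (hypothetical names)."""
    if is_hands_free(ctx):
        # The user can hear audible output and respond by voice.
        return {"output": ["speech"], "input": ["voice"]}
    return {
        "output": ["speech", "text", "gui"],
        "input": ["voice", "touch", "keyboard"],
    }
```

For the driving example, a context such as `DeviceContext(can_read_screen=False, can_touch_device=False)` would restrict the interaction to speech output and voice input.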
Hands-free contexts present special challenges to the builders of complex systems such as virtual assistants. Users demand full access to features of devices whether or not they are in a hands-free context. However, failure to account for particular limitations inherent in hands-free operation can result in situations that limit both the utility and the usability of a device or system, and can even compromise safety by causing a user to be distracted from a primary task such as operating a vehicle.