Virtual personal assistants are artificial intelligence systems that perform tasks on a computing device in response to natural-language requests from a user. For example, a virtual personal assistant may handle calendaring, reminders, and messaging tasks for the user. To begin interacting with the virtual personal assistant, the user typically enters a predefined input sequence on the computing device, for example by pressing a dedicated hardware button or speaking a predefined code word. The user may then enter natural-language requests through conventional text input or through speech recognition.
To further facilitate natural interaction, many virtual personal assistants display a humanlike character, also known as an avatar, to serve as a main point of interaction with the user. The avatar may occupy or obscure a significant portion of the display of the computing device. Further, the avatar may interfere with use of other applications on the computing device, particularly when the user did not intend to activate the avatar. Even when displaying a humanlike avatar, typical systems may not fully model natural human interaction, and instead may require conventional human-computer interactions such as button presses, mouse clicks, or the like.
Speech recognition systems convert spoken utterances of the user into computer-readable representations of text. Typical speech recognition systems attempt to determine a single most-likely speech recognition result for a given audio input. Such systems may filter out noise or otherwise attempt to enhance the audio input signal in order to improve speech recognition results. Some systems may provide a small number of alternative results; however, these results are typically only slight variations of one another. Typical speech recognition engines may be implemented as components of a local computing device, or as services provided by a server computing device.
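
By way of illustration only, the difference between a single most-likely result and a short list of alternative results may be sketched as follows. The `Hypothesis` type, the `score` field, and the helper functions are hypothetical names introduced for this sketch and do not correspond to any particular speech recognition engine; the example texts and scores are invented for demonstration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Hypothesis:
    """One candidate transcription (hypothetical structure for this sketch)."""
    text: str
    score: float  # assumed confidence value; higher means more likely


def best_result(hypotheses):
    """Return the single most-likely hypothesis."""
    return max(hypotheses, key=lambda h: h.score)


def n_best(hypotheses, n):
    """Return up to n hypotheses, ordered from most to least likely."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)[:n]


def word_similarity(a, b):
    """Word-level similarity in [0, 1] between two transcriptions."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Invented example: alternatives that differ from the top result by one or two words.
candidates = [
    Hypothesis("remind me at nine", 0.82),
    Hypothesis("remind me at nine a m", 0.11),
    Hypothesis("remind me at night", 0.07),
]

top = best_result(candidates)
alternatives = n_best(candidates, 3)[1:]
similarities = [word_similarity(top.text, alt.text) for alt in alternatives]
```

In this sketch, each alternative shares most of its words with the top result, which is the sense in which alternative results are described above as slight variations of one another.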