Voice-enabled electronic devices such as smart phones, vehicle computing systems, wearable devices, tablet computers, and standalone voice-activated speakers are becoming increasingly ubiquitous. A voice-enabled electronic device often includes, and/or is in network communication with, a “local” agent that facilitates various aspects of a user's voice-based interactions with the device. The local agent may be implemented via the voice-enabled electronic device itself and/or via one or more remote computing devices that are in network communication with the voice-enabled electronic device (e.g., computing device(s) in “the cloud”).
The local agent is “local” in the sense that it directly receives voice input (e.g., a streaming audio recording of a human's voice) provided via the voice-enabled electronic device, at least initially processes the received voice input, and provides, for presentation (e.g., audible and/or graphical) via the electronic device, output that is responsive to the received voice input. For example, the local agent may initially process received voice input by performing at least voice-to-text (also known as speech-to-text) conversion that converts that voice input to text. Also, for example, the local agent may further provide output that is responsive to that voice input. For instance, the local agent itself may generate responsive content, and generate output that is based on the responsive content.
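By way of non-limiting illustration only, the receive, convert, and respond flow described above might be sketched as follows. The names `LocalAgent`, `AgentOutput`, `handle_voice_input`, `fake_speech_to_text`, and `echo_content` are hypothetical and are not drawn from any particular implementation; the speech-to-text step is a stand-in that merely decodes bytes, where an actual agent would invoke a speech recognition component.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentOutput:
    """Output returned to the device for presentation (hypothetical type)."""
    text: str      # content for graphical presentation
    audible: bool  # whether the device should also render the content as speech


class LocalAgent:
    """Illustrative local agent: receives voice input, converts it, responds."""

    def __init__(
        self,
        speech_to_text: Callable[[bytes], str],
        generate_content: Callable[[str], str],
    ) -> None:
        # Both steps are injected so they may run on the device or remotely.
        self._speech_to_text = speech_to_text
        self._generate_content = generate_content

    def handle_voice_input(self, audio: bytes) -> AgentOutput:
        # 1. Initial processing: convert the received voice input to text.
        text = self._speech_to_text(audio)
        # 2. Generate responsive content based on the converted text.
        content = self._generate_content(text)
        # 3. Provide output, based on that content, for presentation.
        return AgentOutput(text=content, audible=True)


def fake_speech_to_text(audio: bytes) -> str:
    # Assumption: a stand-in that treats the bytes as UTF-8 text
    # instead of performing real speech recognition.
    return audio.decode("utf-8")


def echo_content(text: str) -> str:
    # Assumption: a trivial generator that echoes the recognized text.
    return f"You said: {text}"


agent = LocalAgent(fake_speech_to_text, echo_content)
result = agent.handle_voice_input(b"what time is it")
```

In this sketch the conversion and content-generation steps are supplied as callables, reflecting that the local agent may be implemented on the device itself, on one or more remote computing devices, or on a combination of both.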