This description relates to voice application platforms.
Voice application platforms provide services to voice assistants and voice assistant devices to enable them to listen to and respond to end users' speech. The responses can be spoken or presented as items of content such as text, images, audio, and video. In some cases the responses involve actions such as turning off an appliance.
Voice assistants, such as Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant, are accessed from servers by proprietary voice assistant devices such as Amazon Echo and Apple HomePod, or sometimes by generic workstations and mobile devices.
Voice assistant devices typically have microphones, speakers, processors, memory, communication facilities, and other hardware and software. A voice assistant device can detect and process human speech to derive information representing an end user's request, express that information as a request message (which is sometimes called an intent or contains an intent) in accordance with a predefined protocol, and communicate the request message through a communication network to a server.
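As a rough illustration of the device-side step described above, the following Python sketch derives an intent from an utterance and expresses it as a request message. It assumes the speech has already been transcribed to text, and it uses JSON as a stand-in for the predefined protocol. The names RequestMessage, derive_request, the field names, and the intent name TurnOffAppliance are hypothetical and do not belong to any particular voice assistant framework.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RequestMessage:
    """Hypothetical request message; field names are illustrative only."""
    device_id: str
    intent: str
    slots: dict = field(default_factory=dict)

    def to_wire(self) -> str:
        # Serialize to JSON, standing in for the framework's predefined protocol.
        return json.dumps(asdict(self), sort_keys=True)


def derive_request(device_id: str, transcript: str) -> RequestMessage:
    """Map an already-transcribed utterance to an intent with a toy keyword rule."""
    words = transcript.lower().split()
    if "turn" in words and "off" in words:
        # Treat the last word as the appliance name (a simplifying assumption).
        return RequestMessage(device_id, "TurnOffAppliance", {"appliance": words[-1]})
    # Anything else is passed along as an unrecognized request.
    return RequestMessage(device_id, "Unknown", {"utterance": transcript})


message = derive_request("device-001", "Turn off the lamp")
wire = message.to_wire()  # string that would be sent over the network to the server
```

A real device would rely on speech recognition and natural language understanding rather than keyword matching; the sketch only shows the shape of the data that travels to the server.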
At the server, a voice application receives and processes the request message and determines an appropriate response. The response is incorporated into a response message expressed in accordance with a predefined protocol. The response message is sent through the communication network to the voice assistant device. The voice assistant interprets the response message and speaks or presents (or takes actions specified by) the response. The work of the voice application is supported by an infrastructure of operating systems and other processes running on the server.
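The server-side round trip can be sketched in the same spirit. The Python below is a minimal illustration under stated assumptions: messages are JSON objects, and the keys intent, slots, speech, and actions, along with the functions handle_request and present, are hypothetical names chosen for this example rather than part of any actual framework's protocol.

```python
import json


def handle_request(wire_request: str) -> str:
    """Hypothetical voice application: parse a request message, build a response message."""
    request = json.loads(wire_request)
    intent = request.get("intent")
    slots = request.get("slots", {})

    if intent == "TurnOffAppliance":
        appliance = slots.get("appliance", "appliance")
        response = {
            "speech": f"Turning off the {appliance}.",
            "actions": [{"type": "power_off", "target": appliance}],
        }
    else:
        # Fallback response for intents this application does not handle.
        response = {"speech": "Sorry, I did not understand that.", "actions": []}
    return json.dumps(response)


def present(wire_response: str) -> list:
    """Device side: interpret the response message and list the steps to perform."""
    response = json.loads(wire_response)
    steps = [("speak", response["speech"])]
    steps += [("act", a["type"], a["target"]) for a in response["actions"]]
    return steps
```

For example, passing a request whose intent is TurnOffAppliance with the slot appliance set to "lamp" through handle_request and then present yields one step that speaks the confirmation and one that performs the power-off action.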
The services that the server provides to client voice assistant devices to enable their interactions with end users are sometimes called voice assistant services; these are sometimes also called, or include, skills, actions, or voice applications.
Interaction between an end user and a voice assistant can include a series of requests and responses. In some cases, requests are questions posed by end users and the responses are answers to the questions.
Typically, the server, the voice assistant devices, the voice assistants, the voice assistant services, the predefined protocols, and basic voice applications are designed together as part of a proprietary voice assistant framework. To enable third parties (such as brands that want to engage with the end users through the voice assistants) to create their own voice applications, the frameworks provide proprietary APIs.