As robotics and Internet of Things (IOT) applications grow and become more pervasive, human-machine interaction necessarily grows as well. Increasingly, this interaction involves spoken exchanges between a human user and an artificially intelligent device, for example, oral interaction with an intelligent personal assistant running on a smart speaker device. Generally, such an interaction involves capturing the user's audio signal locally, sending the audio signal to a cloud computing resource, using a machine learning technique to parse and identify words and phrases in the audio signal, using a machine learning technique to build a response to the identified sequence of words, and transmitting the response back to the user's local environment, where it is rendered for the user. In some cases, to allow users to add their own concepts to the response system, hooks can be programmed for application-specific responses.
The response determined in this way can, in some cases, take the form of a sequence of words or actions to be sent back to the local environment. Actions can include, for example, controlling IOT devices or an autonomous system. Where the response is a sequence of words, it can be delivered to the user, often via computer-generated speech. In this case, the cloud computing resource can convert the words to an audio file using a computer-generated speech technique and send the audio file to the device local to the user, which then plays the audio file for the user.
These applications are generally limited in that they involve only audio or text interactions and interfaces, or IOT action responses.