Mobile devices provide a variety of functions—Internet access, camera image capture and image storage, general data storage, contact management, and the like. Many mobile devices include software for responding to an utterance of a user of the device. Some utterances can include instructions to the device to call a phone number, text a phone number, operate an application, or search for information on the mobile device or the Internet. The devices employ speech-to-text, or automated speech recognition (ASR), processes to recognize a voice input from the user. Such applications are generally referred to as “assistants.”
Assistant applications can generally relate certain utterances to commands and arguments. For example, using speech recognition techniques, an assistant can convert the utterance “show me some pictures of Julia” to text. Then, for example, by use of a text command model, the assistant can interpret the text “show me some pictures of Julia” to invoke a command to search for images stored on the device or, alternatively, stored in a cloud account associated with the user. The word “Julia” is resolved to a label tag, and thus the search is directed to images tagged with the label “Julia.” A variety of other, more complex operations can also be facilitated.
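
By way of illustration only, the mapping from recognized text to a command and its arguments might be sketched as follows. This sketch assumes a simple pattern-based text command model; the text above does not specify how such a model is implemented, and the names Command, interpret, search_images, and the pattern table are hypothetical and introduced solely for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Command:
    """A recognized command plus its arguments (hypothetical structure)."""
    name: str
    arguments: dict = field(default_factory=dict)


# Hypothetical pattern table standing in for a text command model.
# Each entry pairs a regular expression with the command it invokes.
_PATTERNS = [
    (re.compile(r"^show me (?:some )?(?:pictures|photos|images) of (?P<label>.+)$",
                re.IGNORECASE),
     "search_images"),
    (re.compile(r"^call (?P<number>[\d\- ]+)$", re.IGNORECASE),
     "call_number"),
]


def interpret(text):
    """Map recognized text to a Command, or return None if nothing matches."""
    # Drop surrounding whitespace and trailing punctuation left by transcription.
    normalized = text.strip().rstrip(".,!?")
    for pattern, name in _PATTERNS:
        match = pattern.match(normalized)
        if match:
            args = {key: value.strip() for key, value in match.groupdict().items()}
            return Command(name, args)
    return None


def search_images(library, label):
    """Return paths of images whose tags include the label (case-insensitive).

    `library` is an assumed in-memory mapping of image path -> list of tags,
    standing in for on-device or cloud image storage.
    """
    wanted = label.casefold()
    return [path for path, tags in library.items()
            if wanted in {tag.casefold() for tag in tags}]


if __name__ == "__main__":
    library = {"beach.jpg": ["Julia", "beach"], "office.jpg": ["Sam"]}
    command = interpret("show me some pictures of Julia")
    if command is not None and command.name == "search_images":
        print(search_images(library, command.arguments["label"]))
```

In this sketch the word “Julia” is captured as the label argument and then matched against image tags, mirroring the resolution of the word to a label tag described above. A practical assistant would likely use a more robust language model than fixed patterns.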