Some electronic devices, such as smartphones, tablet computers, and televisions, include or are configured to utilize speech recognition capabilities that enable users to access functionality of the device via speech input. Audio input including speech is processed by an automatic speech recognition (ASR) system, which converts the input audio to recognized text. The recognized text may then be interpreted, for example by a natural language understanding (NLU) engine, and the resulting interpretation used to perform one or more actions that control some aspect of the device. For example, an NLU result may be provided to a virtual agent or virtual assistant application executing on the device, which interprets the NLU result to assist a user in performing functions such as searching for content on a network (e.g., the Internet) and interfacing with other applications.
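
The flow described above (audio input, then ASR, then NLU, then a virtual assistant acting on the NLU result) may be illustrated by the following minimal sketch. It is not drawn from any particular implementation: the names `NLUResult`, `recognize`, `interpret`, `virtual_assistant`, and `handle_speech` are hypothetical, the "audio" is assumed to be UTF-8 text standing in for a real signal, and the keyword matching is only a placeholder for an actual NLU engine.

```python
from dataclasses import dataclass, field


@dataclass
class NLUResult:
    """Hypothetical container for an NLU interpretation."""
    intent: str
    slots: dict = field(default_factory=dict)


def recognize(audio: bytes) -> str:
    """Stand-in for an ASR system.

    Assumption: the audio is UTF-8 text; a real ASR system would
    decode an acoustic signal into recognized text instead.
    """
    return audio.decode("utf-8").strip().lower()


def interpret(text: str) -> NLUResult:
    """Toy keyword-based stand-in for an NLU engine."""
    if text.startswith("search for "):
        return NLUResult("web_search", {"query": text[len("search for "):]})
    if text.startswith("open "):
        return NLUResult("launch_app", {"app": text[len("open "):]})
    return NLUResult("unknown", {"text": text})


def virtual_assistant(result: NLUResult) -> str:
    """Map an NLU result to a description of the action to perform."""
    if result.intent == "web_search":
        return f"Searching the network for '{result.slots['query']}'"
    if result.intent == "launch_app":
        return f"Launching application '{result.slots['app']}'"
    return "Sorry, I did not understand that."


def handle_speech(audio: bytes) -> str:
    """Run the full pipeline: ASR, then NLU, then the assistant."""
    return virtual_assistant(interpret(recognize(audio)))
```

Under these assumptions, calling `handle_speech(b"Search for jazz concerts")` passes the recognized text through the toy NLU step and yields a web search action, while unrecognized input falls through to the fallback response.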