Speech processing systems include various modules and components for receiving spoken input from a user and determining what the user meant. In some implementations, a speech processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. ASR modules typically use an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which words or subword units (e.g., phonemes) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance based on lexical features of the language in which the utterance is spoken.
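The division of labor between the two models can be illustrated with a minimal sketch. It assumes the acoustic model has already produced candidate transcriptions with log-likelihood scores, and it stands in a toy bigram table for the language model. The candidate strings, all scores, and the names `ACOUSTIC_HYPOTHESES`, `BIGRAM_LOGPROB`, `language_model_score`, and `best_transcription` are hypothetical and chosen only for illustration; they do not describe any particular ASR implementation.

```python
# Hypothetical acoustic-model output: candidate transcriptions mapped to
# acoustic log-likelihoods (values invented for illustration).
ACOUSTIC_HYPOTHESES = {
    "recognize speech": -4.1,
    "wreck a nice beach": -4.0,
}

# Hypothetical bigram log-probabilities standing in for a language model.
BIGRAM_LOGPROB = {
    ("<s>", "recognize"): -2.0,
    ("recognize", "speech"): -1.0,
    ("<s>", "wreck"): -5.0,
    ("wreck", "a"): -2.0,
    ("a", "nice"): -2.5,
    ("nice", "beach"): -3.0,
}
UNSEEN_LOGPROB = -10.0  # assumed penalty for word pairs missing from the table


def language_model_score(text):
    """Sum the bigram log-probabilities over the words of a candidate."""
    words = ["<s>"] + text.split()
    return sum(
        BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
        for pair in zip(words, words[1:])
    )


def best_transcription(hypotheses, lm_weight=1.0):
    """Return the candidate with the highest combined acoustic + LM score."""
    def combined(item):
        text, acoustic_score = item
        return acoustic_score + lm_weight * language_model_score(text)

    return max(hypotheses.items(), key=combined)[0]


if __name__ == "__main__":
    print(best_transcription(ACOUSTIC_HYPOTHESES))
```

In this toy data the acoustic scores alone slightly favor the second candidate, while the combined score favors the first, which is the kind of disambiguation the language model is described as providing.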
Speech processing systems may also include a natural language understanding (“NLU”) module that receives textual input, such as a transcription of a user utterance, and determines the meaning of the text in a way that can be acted upon, such as by a computer application. For example, an NLU module may be used to determine the meaning of text generated by an ASR module using a statistical language model. The NLU module can then determine the user's intent from the ASR output and provide the intent to a downstream process that performs a task responsive to the determined intent of the user (e.g., generate a command to initiate a phone call, initiate playback of requested music, provide requested information, etc.).
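The hand-off from transcription to intent to downstream task can be sketched as follows. This is a simplified illustration only: simple keyword rules stand in for the NLU processing, and the intent names (`InitiateCall`, `PlayMusic`), slot names, and the functions `determine_intent` and `handle` are hypothetical assumptions rather than part of any described system.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A determined intent plus any extracted slot values."""
    name: str
    slots: dict = field(default_factory=dict)


def determine_intent(transcription):
    """Map a transcription to an intent using toy keyword rules.

    A real NLU module would typically use trained models; these rules
    are only a stand-in for illustration.
    """
    words = transcription.lower().split()
    if words[:1] == ["call"] and len(words) > 1:
        return Intent("InitiateCall", {"contact": " ".join(words[1:])})
    if words[:1] == ["play"] and len(words) > 1:
        return Intent("PlayMusic", {"query": " ".join(words[1:])})
    return Intent("Unknown")


# Hypothetical downstream processes, keyed by intent name.
HANDLERS = {
    "InitiateCall": lambda intent: f"dialing {intent.slots['contact']}",
    "PlayMusic": lambda intent: f"playing {intent.slots['query']}",
}


def handle(transcription):
    """Determine the intent and pass it to the matching downstream handler."""
    intent = determine_intent(transcription)
    handler = HANDLERS.get(intent.name)
    return handler(intent) if handler else "no action"


if __name__ == "__main__":
    print(handle("Call Alice"))
    print(handle("play some jazz"))
```

Keeping the handlers in a table keyed by intent name means a new intent can be supported by adding an entry, without changing the dispatch logic.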
Because the set of intents that may be recognized is subject to change (e.g., due to new NLU modeling, new system commands, etc.), a need exists to provide suggested commands (e.g., intents) during automatic speech recognition operations, such as search, rather than waiting for the automatic speech recognition to complete.