Speech processing systems include various modules and components for receiving spoken input from a user and determining what the user meant. In some implementations, a speech processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. ASR modules typically use an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which words or subword units (e.g., phonemes) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance based on lexical features of the language in which the utterance is spoken.
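The division of labor between the two models can be illustrated with a minimal sketch. It assumes the acoustic model has already produced a short list of candidate transcriptions with log-probability scores, and it uses a toy bigram language model to rescore them. The hypotheses, the scores, the probabilities, and the names `ACOUSTIC_HYPOTHESES`, `BIGRAM_PROBS`, `language_model_log_prob`, and `best_transcription` are all hypothetical and chosen only for illustration; production ASR systems use far larger models and more elaborate decoding.

```python
import math

# Hypothetical acoustic-model output: (candidate transcription, acoustic log-probability).
ACOUSTIC_HYPOTHESES = [
    ("recognize speech", -4.1),
    ("wreck a nice beach", -3.9),
]

# Toy bigram language model with made-up probabilities; "<s>" marks the sentence start.
BIGRAM_PROBS = {
    ("<s>", "recognize"): 0.01,
    ("recognize", "speech"): 0.2,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.05,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.02,
}
UNSEEN_PROB = 1e-6  # fallback probability for word pairs the toy model has not seen


def language_model_log_prob(text):
    """Sum the log-probabilities of each adjacent word pair in the text."""
    words = ["<s>"] + text.split()
    return sum(
        math.log(BIGRAM_PROBS.get(pair, UNSEEN_PROB))
        for pair in zip(words, words[1:])
    )


def best_transcription(hypotheses, lm_weight=1.0):
    """Pick the hypothesis with the highest combined acoustic and language score."""
    return max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * language_model_log_prob(h[0]),
    )[0]


if __name__ == "__main__":
    print(best_transcription(ACOUSTIC_HYPOTHESES))
```

In this toy setup, setting `lm_weight` to 0 leaves only the acoustic scores in play, which shows how the choice of transcription can change once the language model is taken into account.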
Speech processing systems may also include a natural language understanding (“NLU”) module that receives textual input, such as a transcription of a user utterance, and determines the meaning of the text in a way that can be acted upon, such as by a computer application. For example, an NLU module may be used to determine the meaning of text generated by an ASR module using a statistical language model. The NLU module can then determine the user's intent from the ASR output and provide the intent to a downstream process that performs a task responsive to the determined intent of the user (e.g., generate a command to initiate a phone call, initiate playback of requested music, provide requested information, etc.).
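The hand-off from transcription to intent to downstream action might be sketched as follows. This is a deliberately simplified illustration that substitutes keyword patterns for the statistical models an NLU module would typically use; the intent names, the patterns, and the functions `determine_intent` and `perform_task` are hypothetical and not drawn from any particular system.

```python
import re

# Hypothetical intent patterns; a real NLU module would typically use trained models.
INTENT_PATTERNS = {
    "initiate_call": re.compile(r"^(?:call|phone|dial)\s+(?P<contact>.+)$"),
    "play_music": re.compile(r"^play\s+(?P<title>.+)$"),
}


def determine_intent(transcription):
    """Map an ASR transcription to an intent name plus any extracted details."""
    text = transcription.lower().strip()
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return {"intent": name, "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}


def perform_task(result):
    """Stand-in for a downstream process that acts on the determined intent."""
    if result["intent"] == "initiate_call":
        return "CALL " + result["slots"]["contact"]
    if result["intent"] == "play_music":
        return "PLAY " + result["slots"]["title"]
    return "NO_ACTION"


if __name__ == "__main__":
    print(perform_task(determine_intent("Call John Smith")))
```

Keeping `determine_intent` separate from `perform_task` mirrors the separation described above: the NLU step produces a structured intent, and a distinct downstream process decides how to act on it.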
Some speech processing systems are configured to engage in multi-turn dialog interactions with users. For example, a user may wish to initiate a certain process or task, but may not provide all necessary information. In this case, the speech processing system can prompt the user for the missing information. As another example, a user may wish to receive information from the system. The speech processing system can provide the requested information and allow the user to initiate subsequent processes based on the provided information.
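The prompting behavior in the first example can be sketched as a simple loop that asks for each missing piece of information in turn. The task name `book_flight`, its required fields, the prompt wording, and the functions `first_missing_slot` and `run_dialog` are assumptions made for illustration only; user replies are passed in as a list to simulate successive dialog turns.

```python
# Hypothetical task definition: the information a task needs before it can run.
REQUIRED_SLOTS = {"book_flight": ["origin", "destination", "date"]}

# Hypothetical prompts the system issues for each missing item.
PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "What day do you want to travel?",
}


def first_missing_slot(task, slots):
    """Return the first required item not yet provided, or None if complete."""
    for name in REQUIRED_SLOTS[task]:
        if name not in slots:
            return name
    return None


def run_dialog(task, initial_slots, user_replies):
    """Prompt for each missing item, consuming one simulated user reply per turn."""
    slots = dict(initial_slots)
    replies = iter(user_replies)
    prompts_issued = []
    while (missing := first_missing_slot(task, slots)) is not None:
        prompts_issued.append(PROMPTS[missing])
        slots[missing] = next(replies)
    return slots, prompts_issued


if __name__ == "__main__":
    filled, prompts = run_dialog(
        "book_flight", {"destination": "Boston"}, ["Seattle", "Friday"]
    )
    print(prompts)
    print(filled)
```

Here the user's first utterance supplied only a destination, so the sketch issues two follow-up prompts before the task has everything it needs.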