Spoken language processing systems include various modules and components for receiving speech input from a user, determining what the user said, and determining what the user meant. In some implementations, a spoken language processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. Spoken language processing systems may also include a natural language understanding (“NLU”) module that receives input, such as a transcription of a user utterance generated by the ASR module, and determines the meaning of the input in a way that can be acted upon, such as by a computer application.
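As a rough illustration of the module arrangement described above, the sketch below wires an ASR stage that returns several scored transcriptions to an NLU stage that consumes the best one. It is a minimal sketch under stated assumptions: the class names `Hypothesis`, `ASRModule`, `NLUModule`, `FakeASR`, and `EchoNLU`, the `process_utterance` function, and the score field are all hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Hypothesis:
    """One candidate transcription produced by an ASR module (hypothetical type)."""
    text: str
    score: float  # assumed confidence value; higher means more likely


class ASRModule(Protocol):
    def transcribe(self, audio: bytes) -> List[Hypothesis]:
        """Return one or more likely transcriptions of the utterance."""
        ...


class NLUModule(Protocol):
    def interpret(self, text: str) -> dict:
        """Return a representation of the meaning that an application can act on."""
        ...


def process_utterance(audio: bytes, asr: ASRModule, nlu: NLUModule) -> dict:
    """Run ASR on the audio, then pass the highest-scoring transcription to NLU."""
    hypotheses = asr.transcribe(audio)
    if not hypotheses:
        raise ValueError("ASR produced no transcription")
    best = max(hypotheses, key=lambda h: h.score)
    return nlu.interpret(best.text)


# Hypothetical stand-in modules, for illustration only.
class FakeASR:
    def transcribe(self, audio: bytes) -> List[Hypothesis]:
        # Pretend the recognizer produced two candidate transcriptions.
        return [Hypothesis("play yesterday", 0.82), Hypothesis("play yes today", 0.11)]


class EchoNLU:
    def interpret(self, text: str) -> dict:
        # A trivial "understanding" step that just echoes the chosen transcription.
        return {"text": text}


print(process_utterance(b"\x00\x01", FakeASR(), EchoNLU()))
```

Passing only the top hypothesis keeps the sketch simple; a system could instead hand the NLU module the whole list of likely transcriptions and let it choose among them.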
An NLU module can identify particular words or phrases (e.g., named entities) in the transcription. Based on those named entities, the NLU module can identify the user's intent and generate an output that an application may use to respond or otherwise perform an action related to that intent. For example, a user of a mobile phone may issue a spoken command to play a particular song. The ASR module can process and transcribe audio of the spoken command, the NLU module can determine the user's intent from the transcription (e.g., that the user wants to initiate playback of a song), and an application can then initiate playback of the song.
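The song-playback example can be sketched with a toy, rule-based NLU step: a regular expression picks out a candidate title, a small dictionary lookup confirms it as a named entity, and an application-side handler acts on the resulting intent. This is only an illustrative assumption; the catalog `KNOWN_SONGS`, the intent label `PlaySong`, and the `interpret`/`handle` functions are hypothetical, and practical NLU modules generally rely on trained models rather than hand-written patterns.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical song catalog standing in for a real named-entity dictionary.
KNOWN_SONGS = {"yesterday", "hey jude", "let it be"}


@dataclass
class NLUResult:
    intent: Optional[str]  # None when no intent was recognized
    entities: Dict[str, str] = field(default_factory=dict)


def interpret(transcription: str) -> NLUResult:
    """Toy NLU: detect a 'play <song>' command and tag the song title as an entity."""
    text = transcription.lower().strip()
    match = re.match(r"play (?:the song )?(.+)$", text)
    if match:
        title = match.group(1)
        if title in KNOWN_SONGS:  # accept only titles found in the catalog
            return NLUResult(intent="PlaySong", entities={"song": title})
    return NLUResult(intent=None)


def handle(result: NLUResult) -> str:
    """Application-side handler that acts on the NLU output."""
    if result.intent == "PlaySong":
        return f"Starting playback: {result.entities['song']}"
    return "Sorry, I didn't understand."


print(handle(interpret("Play Hey Jude")))  # prints "Starting playback: hey jude"
```

Keeping `interpret` and `handle` separate mirrors the division of labor in the text: the NLU step only produces a structured intent, and the application decides what action to take.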