Speech processing systems include various modules and components for receiving spoken input from a user and determining what the user meant. In some implementations, a speech processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions (e.g., recognition results generated by the ASR module) of the utterance. ASR modules typically use an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which words or subword units (e.g., phonemes) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance based on lexical features of the language in which the utterance is spoken.
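The interaction between the acoustic model and the language model described above can be sketched as a simple reranking step: each hypothesis carries an acoustic score, a language model score is added (here with an assumed interpolation weight), and the highest-scoring transcription wins. This is an illustrative sketch only; the hypothesis texts, scores, and the `lm_weight` value are invented for the example and do not come from the source.

```python
# Hedged sketch: choosing among ASR hypotheses by combining an acoustic
# model score with a language model score. All values are illustrative.

def rank_hypotheses(hypotheses, lm_weight=0.8):
    """Rank hypotheses by combined log-probability (higher is better).

    lm_weight is an assumed interpolation weight balancing the two models.
    """
    scored = [
        (h["text"], h["acoustic_logp"] + lm_weight * h["lm_logp"])
        for h in hypotheses
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Two acoustically similar hypotheses; the language model score favors
# the lexically plausible one, so it ranks first after combination.
hypotheses = [
    {"text": "recognize speech", "acoustic_logp": -4.1, "lm_logp": -2.0},
    {"text": "wreck a nice beach", "acoustic_logp": -3.9, "lm_logp": -6.5},
]
best_text, best_score = rank_hypotheses(hypotheses)[0]
```

Note that the acoustically preferred hypothesis ("wreck a nice beach", -3.9 vs. -4.1) loses after the language model contribution is added, which mirrors the division of labor the paragraph describes.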
Speech processing systems can use the transcriptions generated by the ASR module in various ways. For example, some speech processing systems simply fill in a form field or otherwise save the transcription of the user utterance. As another example, some speech processing systems may include modules or components that accept a transcription of a user utterance and determine the meaning of the utterance in a way that can be acted upon, such as by a computer application. One example of such a module is a natural language understanding (“NLU”) module. To facilitate the interpretation of the utterance, the NLU module can classify elements of textual input into various pre-defined categories using a process known as named entity recognition. Examples of categories or types of named entities include the names of persons, organizations, locations, expressions of times, quantities, monetary values, and the like.
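The named entity recognition step described above can be illustrated with a deliberately simple sketch that labels tokens using a toy name list and a pair of patterns. The category names, example tokens, and matching rules here are assumptions for illustration only; a production NLU module would use trained statistical models rather than hand-written rules.

```python
import re

# Illustrative-only named entity tagger. The gazetteer entries and regex
# patterns are invented examples, not part of any real NLU system.
GAZETTEER = {
    "PERSON": {"alice", "bob"},
    "LOCATION": {"seattle", "paris"},
}
PATTERNS = {
    "TIME": re.compile(r"^\d{1,2}:\d{2}$"),      # e.g. "7:30"
    "MONEY": re.compile(r"^\$\d+(\.\d{2})?$"),   # e.g. "$20" or "$20.00"
}

def tag_entities(tokens):
    """Assign a pre-defined category to each token, or None if unrecognized."""
    tags = []
    for token in tokens:
        label = None
        for category, names in GAZETTEER.items():
            if token.lower() in names:
                label = category
        for category, pattern in PATTERNS.items():
            if pattern.match(token):
                label = category
        tags.append((token, label))
    return tags

tags = tag_entities(["Remind", "Alice", "at", "7:30", "about", "$20"])
```

Here "Alice" is classified as a person, "7:30" as an expression of time, and "$20" as a monetary value, matching the categories enumerated in the paragraph; tokens outside the pre-defined categories are left unlabeled.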