Natural language processing systems include various modules and components for receiving input from a user (e.g., audio or text) and determining what the user meant. In some implementations, a natural language processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. Automatic speech recognition modules typically include an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which subword units (e.g., phonemes or triphones) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine the most likely transcription of the utterance based on the hypotheses generated using the acoustic model and lexical features of the language in which the utterance is spoken.
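By way of illustration only, the interplay between the acoustic model and the language model described above may be sketched as a rescoring step. The sketch below is a simplification under stated assumptions: the hypotheses are shown as whole word sequences rather than subword units, and the scores, the `best_transcription` function, and the `lm_weight` parameter are hypothetical names and values chosen for the example, not part of any particular ASR system.

```python
# Hypothetical acoustic-model output: candidate word sequences paired with
# log-likelihoods (higher means a better acoustic match). Values are made up.
acoustic_hypotheses = {
    ("recognize", "speech"): -4.0,
    ("wreck", "a", "nice", "beach"): -3.8,
}

# Hypothetical language-model log-probabilities for the same word sequences.
language_model_scores = {
    ("recognize", "speech"): -2.0,
    ("wreck", "a", "nice", "beach"): -6.5,
}

# Assumed penalty for a word sequence the language model has no score for.
UNKNOWN_SEQUENCE_SCORE = -1e9


def best_transcription(acoustic, language, lm_weight=1.0):
    """Return the hypothesis with the highest combined log-score."""
    def combined(words):
        lm_score = language.get(words, UNKNOWN_SEQUENCE_SCORE)
        return acoustic[words] + lm_weight * lm_score

    return max(acoustic, key=combined)


best = best_transcription(acoustic_hypotheses, language_model_scores)
acoustic_only = best_transcription(
    acoustic_hypotheses, language_model_scores, lm_weight=0.0
)
```

In this sketch, the acoustic scores alone slightly favor the second hypothesis, while adding the language model scores shifts the choice to the first, which mirrors the role the language model plays in selecting the most likely transcription.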
Automatic speech recognition systems may use different types of language models to obtain different benefits. For example, a grammar-only language model includes a number of pre-defined combinations of words, such as sentences. User utterances that correspond to one of the sentences in the grammar-only language model may be quickly and accurately recognized due to the limited search space. As another example, a statistical language model may include a large vocabulary, and the words may be recognized in any combination. User utterances that include sequences of words not known ahead of time may still be accurately recognized due to the expansive search space.
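The contrast between the two types of language models may be illustrated with a minimal sketch. It assumes a toy three-sentence corpus, represents the grammar-only model as a fixed set of allowed sentences, and stands in for the statistical model with a bigram model using add-one smoothing. The names `SENTENCES`, `grammar_accepts`, and `BigramModel` are hypothetical and introduced only for this example.

```python
from collections import Counter

# Toy corpus (assumed for illustration) shared by both models.
SENTENCES = [("play", "music"), ("stop", "music"), ("call", "home")]

# Grammar-only model: a fixed set of predefined word combinations.
GRAMMAR = set(SENTENCES)


def grammar_accepts(words):
    """Return True only if the utterance is one of the predefined sentences."""
    return tuple(words) in GRAMMAR


class BigramModel:
    """Statistical model: bigram probabilities with add-one smoothing."""

    def __init__(self, corpus):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence)
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def probability(self, words):
        """Probability of a word sequence; nonzero even for unseen sequences."""
        tokens = ["<s>"] + list(words)
        vocab_size = len(self.vocab)
        p = 1.0
        for prev, cur in zip(tokens, tokens[1:]):
            count = self.bigram_counts[(prev, cur)]
            p *= (count + 1) / (self.context_counts[prev] + vocab_size)
        return p


model = BigramModel(SENTENCES)
seen = ["call", "home"]      # one of the predefined sentences
unseen = ["call", "music"]   # known words in a combination not defined ahead of time
```

Here `grammar_accepts(unseen)` is false because the combination was never enumerated, whereas `model.probability(unseen)` is small but nonzero, so the statistical model can still rank it against other candidates at the cost of a much larger search space.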