Language modeling is used in many language processing applications, such as automatic speech recognition (ASR), natural language understanding (NLU), information retrieval, and machine translation. Language modeling may involve using labeled or annotated language data to train one or more language models to capture properties of a language. For example, a language model may be trained to capture the likelihood that a particular sequence of language segments (e.g., a sequence of phonemes, a sequence of syllables, a sequence of words, a sequence of phrases, etc.) occurs in the language.
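
As a non-limiting illustration of capturing the likelihood of a sequence of language segments, the following minimal sketch estimates word-sequence log-probabilities with a bigram model and add-one smoothing. The choice of a bigram model, the smoothing method, the toy corpus, and the function names train_bigram_model and sequence_log_prob are assumptions made for this example only; the text above does not prescribe any particular model type.

```python
from collections import Counter
import math


def train_bigram_model(sentences):
    """Count bigrams and their left-context words in whitespace-tokenized text."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    vocab = set()
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts, vocab


def sequence_log_prob(sentence, model):
    """Return the add-one-smoothed log-probability of a word sequence."""
    context_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab_size = len(vocab)
    log_prob = 0.0
    for prev, curr in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams from scoring zero.
        numerator = bigram_counts[(prev, curr)] + 1
        denominator = context_counts[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


# Toy training data, invented for illustration.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)

likely = sequence_log_prob("the cat sat", model)
unlikely = sequence_log_prob("sat cat the", model)
```

In this sketch, a word order seen in the training data ("the cat sat") receives a higher log-probability than a scrambled order of the same words ("sat cat the"). The same counting approach could in principle be applied to other segment types, such as phonemes or syllables, by changing the tokenization step.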