Natural language processing systems include various modules and components for receiving input from a user (e.g., audio, text, etc.) and determining what the user meant. In some implementations, a natural language processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. Automatic speech recognition modules typically include an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which subword units (e.g., phonemes or triphones) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine the most likely transcription of the utterance based on the hypotheses generated using the acoustic model and lexical features of the language in which the utterance is spoken.
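The division of labor between the two models can be illustrated with a minimal sketch. It assumes, purely for illustration, that the acoustic hypotheses have already been mapped to word sequences with log-probability scores, and that the language model is a toy bigram table; the names `BIGRAM_LOGPROB`, `UNSEEN_LOGPROB`, `lm_logprob`, and `pick_transcription`, the `lm_weight` parameter, and all of the scores are hypothetical and not taken from any particular system.

```python
# Toy bigram language model: log-probabilities of a word given the previous word.
# All values are made up for illustration.
BIGRAM_LOGPROB = {
    ("<s>", "recognize"): -1.0,
    ("recognize", "speech"): -0.5,
    ("<s>", "wreck"): -3.0,
    ("wreck", "a"): -1.0,
    ("a", "nice"): -1.0,
    ("nice", "beach"): -2.0,
}
UNSEEN_LOGPROB = -10.0  # assumed penalty for word pairs missing from the table


def lm_logprob(words):
    """Sum the bigram log-probabilities over a word sequence."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for word in words:
        total += BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
        prev = word
    return total


def pick_transcription(hypotheses, lm_weight=1.0):
    """Return the (words, acoustic_logprob) pair with the best combined score.

    The combined score is the acoustic log-probability plus a weighted
    language model log-probability (an assumed, commonly used combination).
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))


# Two acoustically similar hypotheses; the acoustic scores alone slightly
# favor the first one.
hypotheses = [
    (["wreck", "a", "nice", "beach"], -4.0),
    (["recognize", "speech"], -4.5),
]
best_words, best_acoustic = pick_transcription(hypotheses)
```

In this toy setup the language model score outweighs the small acoustic preference, so the second hypothesis is selected as the transcription.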
Automatic speech recognition systems may implement speech recognition models in different ways to obtain different benefits. For example, a language model may be implemented as a finite state transducer (“FST”). An FST-based language model is a directed graph with nodes and directed arcs. The nodes correspond to decoding states, and the directed arcs correspond to weighted transitions from one state to another (e.g., recognizing an additional subword unit, word, n-gram, etc.). Each path through the graph corresponds to an utterance. Some language models include a large number of different states (e.g., millions), and may have a similar number of arcs (or more).
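The graph structure described above can be sketched in a few lines. The example below is a simplified illustration rather than a real decoder: the `ToyFst` class, its methods, and the arc weights are hypothetical, the arcs carry a single word label instead of separate input and output labels, and the weights are assumed to be non-negative costs (e.g., negative log-probabilities), so that the lowest-cost path can be found with Dijkstra's algorithm.

```python
import heapq
from collections import defaultdict


class ToyFst:
    """A tiny weighted graph: nodes are decoding states, arcs are transitions."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        # state -> list of (label, weight, next_state)
        self.arcs = defaultdict(list)

    def add_arc(self, src, label, weight, dst):
        self.arcs[src].append((label, weight, dst))

    def best_path(self):
        """Return (cost, labels) for the lowest-cost path to a final state.

        Uses Dijkstra's algorithm, which assumes non-negative arc weights.
        Returns None if no final state is reachable.
        """
        counter = 0  # tie-breaker so heap entries never compare states or labels
        heap = [(0.0, counter, self.start, ())]
        visited = set()
        while heap:
            cost, _, state, labels = heapq.heappop(heap)
            if state in visited:
                continue
            visited.add(state)
            if state in self.finals:
                return cost, list(labels)
            for label, weight, dst in self.arcs[state]:
                if dst not in visited:
                    counter += 1
                    heapq.heappush(
                        heap, (cost + weight, counter, dst, labels + (label,))
                    )
        return None


# Four states; state 3 is final. Each path from 0 to 3 spells one utterance.
fst = ToyFst(start=0, finals=[3])
fst.add_arc(0, "play", 0.5, 1)
fst.add_arc(0, "pray", 2.0, 2)
fst.add_arc(1, "music", 0.25, 3)
fst.add_arc(1, "muse", 1.5, 3)
fst.add_arc(2, "music", 0.25, 3)

cost, words = fst.best_path()
```

With these made-up weights, the lowest-cost path is the one labeled "play music". A production language model with millions of states and arcs would typically rely on a dedicated FST library and compact storage rather than Python dictionaries.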