Computing devices can use models representing data relationships and patterns, such as functions, algorithms, systems, and the like, to process input and produce output that corresponds to the input in some way. In some implementations, a statistical model is used to generate a probability or likelihood that the input corresponds to a particular label, value, or the like. For example, computing tasks such as automatic speech recognition (“ASR”) and natural language understanding (“NLU”) use various statistical models to determine the probability that input, such as audio input of a user utterance or textual input generated from such audio input, corresponds to a particular word, sentence, actionable command, or some other type of information.
Conditional random fields (“CRFs”) are statistical models widely used in NLU applications, such as named entity recognition (“NER”), in which words and phrases are labeled as particular entities (e.g., people, places, organizations, etc.). CRFs typically include a set of states and a set of corresponding parameter weights. When used for NER, the states of a CRF model represent the possible entity labels that may be applied to the input. The parameter weights correspond to information extracted from the input, known as “features” or “feature vectors.” In some cases, a CRF model may include hundreds of thousands or even millions of weights. In order to reduce the size of large CRF models for storage or network transmission, the components of the CRFs can be compressed using various general-purpose techniques, such as Lempel-Ziv coding (e.g., as used in “gzip”) or Burrows-Wheeler transform coding (e.g., as used in “bzip”).
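As a rough illustration of the general-purpose compression mentioned above, the following Python sketch packs a list of weights into bytes and compresses it with the standard gzip and bz2 modules (bz2 implements bzip2, a Burrows-Wheeler-based format). The weight values, the 80% sparsity, the 64-bit float layout, and the helper names make_weights, pack_weights, and unpack_weights are all assumptions made for this example; they do not describe any particular CRF model or toolkit.

```python
import bz2
import gzip
import random
import struct


def make_weights(n, seed=0):
    """Hypothetical stand-in for CRF parameter weights.

    Assumption for illustration only: about 80% of weights are zero and
    the rest are small values rounded to three decimal places.
    """
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < 0.8 else round(rng.gauss(0.0, 1.0), 3)
        for _ in range(n)
    ]


def pack_weights(weights):
    """Serialize weights as little-endian 64-bit floats (8 bytes each)."""
    return struct.pack(f"<{len(weights)}d", *weights)


def unpack_weights(raw):
    """Inverse of pack_weights."""
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


weights = make_weights(100_000)
raw = pack_weights(weights)

# Lempel-Ziv-based (DEFLATE) and Burrows-Wheeler-based compression.
gz = gzip.compress(raw)
bz = bz2.compress(raw)

print("raw bytes:  ", len(raw))
print("gzip bytes: ", len(gz))
print("bzip2 bytes:", len(bz))

# Both schemes are lossless: decompressing recovers the exact weights.
restored_gz = unpack_weights(gzip.decompress(gz))
restored_bz = unpack_weights(bz2.decompress(bz))
```

Both compressors operate on the serialized bytes without any knowledge of the model structure, which is why they can be applied to any component of a stored model. The sizes printed depend heavily on the assumed sparsity and rounding of the weights.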