Language models and acoustic models may be used to facilitate speech recognition. For example, an acoustic model may be used to identify phonemes or other subword units present in an utterance. A language model may then be used to convert the phonemes or other subword units identified by the acoustic model into words, phrases, and the like. Language models may be generated by analyzing a large corpus of text to determine the frequency with which a sequence of n words (or “n-gram”) appears in the text. The probability for an n-gram in the language model may be computed as the conditional probability of the final word of the n-gram appearing in the corpus given that the preceding words of the n-gram have been found. A speech recognizer may use these probabilities to recognize audio inputs. For example, a speech recognizer may receive an audio input that may correspond to two or more possible word sequences. The language model may be used to determine the probability of each word sequence that may correspond to the audio input, and the audio input may be recognized as the word sequence with the highest probability.
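The n-gram estimate and the selection of the most probable word sequence described above may be sketched as follows. This is a minimal illustration only, assuming a bigram model (n = 2), unsmoothed counts, and a tiny made-up corpus; the function names, the `<s>` and `</s>` boundary tokens, and the candidate word sequences are hypothetical and are not taken from any particular recognizer.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary tokens mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def bigram_probability(prev, word, context_counts, bigram_counts):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sequence_probability(sentence, context_counts, bigram_counts):
    """Multiply the conditional probabilities of each word in the sequence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_probability(prev, word, context_counts, bigram_counts)
    return prob

# Made-up corpus, for illustration only.
corpus = [
    "recognize speech",
    "recognize speech",
    "recognize speech today",
    "wreck a nice beach",
]
context_counts, bigram_counts = train_bigram_counts(corpus)

# Two word sequences that one audio input might correspond to.
candidates = ["recognize speech", "wreck a nice beach"]
best = max(
    candidates,
    key=lambda s: sequence_probability(s, context_counts, bigram_counts),
)
print(best)
```

In this toy corpus, "recognize speech" receives the higher probability and would be selected. A practical language model would typically also apply smoothing and backoff so that unseen n-grams do not receive a probability of zero; that is omitted here for brevity.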
In training models for speech recognition, a maximum likelihood criterion may be applied. For example, language models may be trained to maximize the likelihood that the model assigns to the training text. One drawback of this approach, among others, is that a model trained under a maximum likelihood criterion may not minimize the probability of word errors in speech recognition.
Additionally, in some current approaches, a language model may be pruned so that fewer n-grams are used in recognizing speech. In one current approach, a language model is pruned by removing all n-grams whose probabilities are lower than a threshold. In another current approach, a language model is pruned based on relative entropy, so that a pruned language model has a relatively similar distribution of probabilities to a base language model. One drawback of these approaches, among others, is that for a given target size of a language model, these approaches may sacrifice too much accuracy and increase the probability of word errors in speech recognition to unacceptable levels.
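The threshold-based pruning approach described above may be sketched as follows. This is an illustrative sketch only: the model is assumed to be a plain mapping from n-grams to conditional probabilities, and the function name, the probability values, and the threshold are hypothetical. The relative-entropy approach is not shown; it would instead rank n-grams by how much their removal changes the model's probability distribution.

```python
def prune_by_threshold(ngram_probs, threshold):
    """Remove every n-gram whose probability is lower than `threshold`."""
    return {
        ngram: prob
        for ngram, prob in ngram_probs.items()
        if prob >= threshold
    }

# Hypothetical bigram probabilities P(second word | first word).
base_model = {
    ("recognize", "speech"): 0.60,
    ("recognize", "beach"): 0.02,
    ("nice", "beach"): 0.30,
    ("nice", "speech"): 0.005,
}

pruned_model = prune_by_threshold(base_model, threshold=0.01)
print(len(base_model), len(pruned_model))
```

With these made-up values, only the bigram ("nice", "speech") falls below the threshold and is removed. A recognizer using the pruned model would have to fall back on some lower-order estimate for that bigram, which is one way pruning may cost accuracy.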