Speech recognition generally involves two kinds of pattern recognition. The first kind of pattern recognition uses an acoustic model to identify sounds and sequences of sounds that may be words or parts of words. The second kind of pattern recognition uses a language model to identify sequences of words. The language model provides a linguistically based score representing the probability of a word given a word history. In an n-gram model, the word history is the preceding n−1 words. Both models typically are probabilistic and are generated from a training set of valid utterances. Beyond this similarity, however, the two models typically are designed, implemented, and treated as independent of each other, except that they are used in an interleaved fashion to recognize words in an utterance. Such techniques are described generally in F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
There are several different classes of language models. One class of language models is exponential language models, such as “model M” described in “Performance Prediction for Exponential Language Models,” by Stanley Chen, in the proceedings of NAACL-HLT, 2009. In an exponential language model, word n-gram probabilities are modeled with a log-linear model, and word-class information is used in the definition of the features. Assuming an n-gram model on words w, and a parameter λ for each n-gram occurring in the training data, subject to length and frequency restrictions, the basic exponential language model has the form:
$$
P(w_i \mid w_{i-n+1} \ldots w_{i-1}) = \frac{\exp\left(\lambda_{w_{i-n+1} \ldots w_{i-1} w_i} + \ldots + \lambda_{w_{i-1} w_i} + \lambda_{w_i}\right)}{\sum_{w'} \exp\left(\lambda_{w_{i-n+1} \ldots w_{i-1} w'} + \ldots + \lambda_{w_{i-1} w'} + \lambda_{w'}\right)}.
$$
In this model, the presence of an n-gram sequence is a feature, and there is a lambda for each feature.
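As a concrete illustration, the probability above can be computed by summing the lambdas of every n-gram feature that matches the history and candidate word, then normalizing over the vocabulary. The following sketch uses an invented toy vocabulary and feature weights; the feature set, lambda values, and function names are hypothetical, not from model M itself:

```python
import math
from collections import defaultdict

# Hypothetical lambdas: one weight per n-gram feature. Suffixes of the
# full n-gram (down to the unigram) are also features, as in the
# numerator of the equation above. Unseen features contribute 0.
lambdas = defaultdict(float, {
    ("the", "cat"): 0.9,   # bigram feature for λ_{w_{i-1} w_i}
    ("cat",): 0.2,         # unigram feature for λ_{w_i}
    ("the", "dog"): 0.4,
    ("dog",): 0.1,
})

VOCAB = ["cat", "dog", "the"]

def feature_sum(history, word):
    """Sum the lambdas of every suffix of `history` followed by `word`."""
    total = lambdas[(word,)]
    for k in range(1, len(history) + 1):
        total += lambdas[tuple(history[-k:]) + (word,)]
    return total

def prob(history, word):
    """Log-linear probability: softmax of feature sums over the vocabulary."""
    denom = sum(math.exp(feature_sum(history, w)) for w in VOCAB)
    return math.exp(feature_sum(history, word)) / denom
```

With these toy weights, “cat” accumulates a larger feature sum than “dog” after the history (“the”), so it receives a higher probability once the softmax normalization is applied.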
Typically, one or more fixed language models are built and then used unchanged across an entire data set, for both training and classification. When a fixed set of language models is used, interpolation may be performed over one or more of the large language models in the set. To extend such a model with information specific to a user, it is common to interpolate a user-specific n-gram language model with a generic n-gram language model. However, this solution is computationally inefficient for large numbers of users.
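The interpolation described above can be sketched as a weighted mixture of the two models' probabilities. The bigram tables, the weight α, and the function name below are hypothetical and for illustration only:

```python
# Hypothetical bigram probabilities P(word | previous word) for a
# generic model and for one user's personal model.
generic = {("call", "home"): 0.05, ("call", "alice"): 0.01}
user    = {("call", "home"): 0.02, ("call", "alice"): 0.30}

def interpolate(prev_word, word, alpha=0.3):
    """Linear interpolation: alpha * P_user + (1 - alpha) * P_generic.

    One such mixture must be built and evaluated per user, which is why
    this approach scales poorly to large numbers of users.
    """
    p_user = user.get((prev_word, word), 0.0)
    p_generic = generic.get((prev_word, word), 0.0)
    return alpha * p_user + (1 - alpha) * p_generic
```

For example, with α = 0.3 the user's frequent bigram (“call”, “alice”) is boosted well above its generic-model probability, while bigrams absent from the user model fall back mostly on the generic estimate.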