Statistical language modeling can be used in many prediction and recognition applications, such as speech or handwriting recognition and text input prediction. An effective language model is desirable to constrain the underlying pattern analysis, guide the search through various text hypotheses, and/or contribute to the determination of the final outcome. Conventionally, the paradigm for statistical language modeling has been to convey the probability of occurrence, in the language, of every possible string of n words. Given a vocabulary of interest for the expected domain of use, this can be achieved through a word n-gram model, which can be trained to provide the probability of a word given the n−1 previous word(s). Training a word n-gram model typically involves large machine-readable text databases comprising representative documents from the expected domain. However, because such databases are finite, many n-word strings are seen only infrequently, yielding unreliable parameter values for all but the smallest values of n. Further, in some applications, it can be cumbersome or impractical to gather a large enough amount of training data. In other applications, the size of the resulting model may exceed what can reasonably be deployed.
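To illustrate how a word n-gram model provides the probability of a word given its n−1 predecessors, the sketch below estimates those probabilities by relative frequency (maximum-likelihood counts) over a toy token list. The helper names `train_ngram` and `prob` and the toy corpus are hypothetical, and the plain relative-frequency estimate is an assumption chosen for brevity. It applies no smoothing, which is why rare or unseen n-word strings receive unreliable (here, zero) probabilities.

```python
from collections import Counter


def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-word histories (hypothetical helper, no smoothing)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    positions = range(len(tokens) - n + 1)
    # Each n-gram is a tuple of n consecutive words.
    ngram_counts = Counter(tuple(tokens[i:i + n]) for i in positions)
    # Count a history only where it is followed by a word, so that the
    # conditional probabilities for each seen history sum to 1.
    history_counts = Counter(tuple(tokens[i:i + n - 1]) for i in positions)
    return ngram_counts, history_counts


def prob(word, history, ngram_counts, history_counts):
    """Maximum-likelihood estimate of P(word | history); 0.0 if the history was never seen."""
    history = tuple(history)
    denom = history_counts[history]
    if denom == 0:
        return 0.0
    return ngram_counts[history + (word,)] / denom


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate".split()
    ngrams, histories = train_ngram(corpus, n=2)
    print(prob("cat", ["the"], ngrams, histories))  # 2 of the 3 bigrams after "the"
    print(prob("dog", ["the"], ngrams, histories))  # unseen bigram
```

With this toy corpus, the plausible but unseen bigram "the dog" gets probability zero, a small-scale instance of the sparsity problem noted above; increasing n makes such gaps far more common for a fixed amount of training text.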