The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
Language models are used in a variety of applications, including noisy-channel applications such as natural language processing, spell checking, and the like. In natural language applications, a speech recognizer typically works by combining acoustic evidence (the channel model) with expectations about what the user is likely to say (the language model). One common form of language model is the tri-gram model.
In general, an n-gram is a subsequence of n tokens (words). A tri-gram is a subsequence of 3 tokens. For example, from the phrase “to be or not to be”, 8 tri-grams can be generated: “$ $ to”, “$ to be”, “to be or”, “be or not”, “or not to”, “not to be”, “to be $”, and “be $ $”, where the input string is padded on each side with two special tokens denoted “$”. Statistics can be applied to such n-grams to estimate the likelihood that a user intended a particular input.
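The tri-gram example above can be reproduced with a short sketch. The function name `padded_ngrams`, whitespace tokenization, and the unsmoothed relative-frequency estimate are illustrative assumptions, not details taken from this disclosure.

```python
from collections import Counter


def padded_ngrams(text, n=3, pad="$"):
    """Return all n-grams of a whitespace-tokenized string.

    Assumption: the string is padded with n - 1 copies of `pad`
    on each side, matching the "$ $ to" ... "be $ $" example.
    """
    tokens = text.split()
    padded = [pad] * (n - 1) + tokens + [pad] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


trigrams = padded_ngrams("to be or not to be")

# Count each tri-gram and each two-token context that precedes a third token.
trigram_counts = Counter(trigrams)
context_counts = Counter(t[:2] for t in trigrams)


def trigram_probability(w1, w2, w3):
    """Unsmoothed relative-frequency estimate of P(w3 | w1, w2)."""
    context = context_counts[(w1, w2)]
    if context == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / context


p_or = trigram_probability("to", "be", "or")
```

In this toy input the context “to be” occurs twice, followed once by “or” and once by “$”, so the estimate for “or” is one half. A practical model would be trained on far more text and would typically apply smoothing to unseen tri-grams.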
Though a billion words of text used to be considered a large training set, language models for speech recognition are now routinely trained on ten billion words of text. In general, large language models work well (meaning they have low entropy); however, memory capacity is often limited, especially in mobile devices such as cell phones, personal digital assistants (PDAs), electronic planners, and the like. One technique for addressing the memory limitation involves trimming the language model by removing infrequently used words and uncommon variants. However, removing such terms reduces the overall effectiveness of the language model, leading to more semantic errors because input cannot be matched to words that are absent from the trimmed model.
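As a rough illustration of the trimming approach described above, the following sketch drops words that fall below a count threshold. The function name `trim_vocabulary`, the `min_count` cutoff, and the `<unk>` placeholder token are hypothetical choices made for this example only.

```python
from collections import Counter


def trim_vocabulary(tokens, min_count=2, unk="<unk>"):
    """Keep only words seen at least `min_count` times.

    Assumption: removed words are mapped to a single placeholder
    token, which is one simple way to model the information lost.
    """
    counts = Counter(tokens)
    kept = {word for word, count in counts.items() if count >= min_count}
    trimmed = [word if word in kept else unk for word in tokens]
    return trimmed, kept


tokens = "to be or not to be".split()
trimmed, vocab = trim_vocabulary(tokens)
```

With this toy input only “to” and “be” survive the cutoff, so “or” and “not” can no longer be matched, which is the loss of effectiveness noted above.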