1. Field of Invention
The present invention relates generally to natural language processing and more specifically to natural language processing applications that use a language model.
2. Description of Related Art
In machine translation, a language model is often used to estimate the correctness of a translated text in a target language. In statistical machine translation, the language model typically assigns a high probability to a grammatically correct sentence and a low probability to a grammatically incorrect sentence. For example, a language model will assign a high probability to the sequence, “Mary has a little lamb” and a low probability to the sequence “Mary little lamb a has.”
Typically, language models include a stochastic generative process that produces structures (e.g., sentences) that are either well-formed or ill-formed. The stochastic generative process typically generates the structures in a sequence of steps, where each step is associated with a probability. The probability of the entire structure is calculated by multiplying the probabilities of all of the steps.
For example, a sentence can be generated by choosing a first word, choosing subsequent words, and choosing a last word. The first word in the sentence is associated with a probability p(w1|BOS), where BOS denotes the beginning of the sentence. Each subsequent word is associated with a probability p(wi|wi-1). The last word in the sentence is associated with a probability p(EOS|wn), where EOS denotes the end of the sentence. Here, p(x|y) is a conditional probability, i.e., how likely it is that word x follows word y. The probabilities are typically estimated from a large corpus in the target language.
Continuing the example, the sentence “Mary has a little lamb” receives the probability p(mary|BOS)×p(has|mary)×p(a|has)×p(little|a)×p(lamb|little)×p(EOS|lamb). Because the bigram conditional probabilities reflect that the word pairs in the sequence “Mary has a little lamb” are more likely than those in the sequence “Mary little lamb a has,” the machine translator will select the sequence “Mary has a little lamb.”
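The bigram scoring described above can be sketched as follows. The probability values below are illustrative placeholders, not estimates from a real corpus, and the floor value stands in for the back-off estimates discussed later:

```python
# Sketch of a bigram language model scoring a sentence.
# The probabilities are illustrative placeholders, not values
# estimated from a real corpus.

# p(next_word | previous_word), with BOS/EOS sentence markers
bigram_prob = {
    ("BOS", "mary"): 0.1,
    ("mary", "has"): 0.3,
    ("has", "a"): 0.2,
    ("a", "little"): 0.05,
    ("little", "lamb"): 0.04,
    ("lamb", "EOS"): 0.5,
}

def sentence_probability(words, probs, floor=1e-9):
    """Multiply the conditional probability of each generation step.

    Unseen bigrams fall back to a small floor probability, a crude
    stand-in for the back-off estimates a real model would use.
    """
    sequence = ["BOS"] + words + ["EOS"]
    p = 1.0
    for prev, word in zip(sequence, sequence[1:]):
        p *= probs.get((prev, word), floor)
    return p

well_formed = sentence_probability("mary has a little lamb".split(), bigram_prob)
ill_formed = sentence_probability("mary little lamb a has".split(), bigram_prob)
```

Because every bigram in the well-formed sequence has been assigned a nonzero probability while the scrambled sequence falls through to the floor, the well-formed sentence receives the higher score.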
Another example of a language model is a dependency-based language model, where a sentence is generated using a tree structure. FIG. 1 is an exemplary depiction of a tree structure 100 used to generate a sentence. From an invisible START symbol 105, the language model generates the head word 110 of the tree, “has,” with probability p(has|START). The language model then generates the left modifier 115 (i.e., “Mary”) and the right modifier 120 (i.e., “lamb”) with probabilities p_left(mary|has) and p_right(lamb|has), respectively. Finally, the language model generates the left modifiers 125 (i.e., “a” and “little”) of “lamb” with probabilities p_left(a|lamb)×p_left(little|lamb).
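The dependency-based generation of FIG. 1 can be sketched in the same way. The tree shape follows the figure; the probability values are illustrative only:

```python
# Sketch of the dependency-based language model of FIG. 1.
# Each word is generated conditioned on its head word, and the
# sentence probability is the product of the head and modifier
# probabilities. All probability values are illustrative.

p_head = {("START", "has"): 0.2}
p_left = {("has", "mary"): 0.3, ("lamb", "a"): 0.25, ("lamb", "little"): 0.1}
p_right = {("has", "lamb"): 0.15}

def tree_probability():
    # p(has|START) * p_left(mary|has) * p_right(lamb|has)
    #   * p_left(a|lamb) * p_left(little|lamb)
    return (p_head[("START", "has")]
            * p_left[("has", "mary")]
            * p_right[("has", "lamb")]
            * p_left[("lamb", "a")]
            * p_left[("lamb", "little")])
```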
To be used in machine translation, the language model is stored in a random access memory (RAM). For a bi-gram language model, such as those described herein, four pieces of information are generally stored in RAM. First, the history of the word, i.e., the preceding context on which the decision to use the word was made, is stored. Next, the word (i.e., the event) that was generated based on the history is stored. Third, the conditional probability of generating the word given the history is stored; the conditional probability is expressed as a floating point value. Finally, back-off values, which enable estimation of probabilities for events that have not been seen in the training data, are stored.
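The four pieces of information above might be grouped per bigram entry roughly as follows. The field names and example values are illustrative, not a real model storage format:

```python
# Sketch of the four pieces of information stored per bigram entry.
# Field names and values are illustrative, not a real model format.
from dataclasses import dataclass

@dataclass
class BigramEntry:
    history: str    # preceding context on which the word was chosen
    event: str      # the word generated given that history
    prob: float     # conditional probability p(event | history)
    backoff: float  # back-off weight for unseen continuations

entry = BigramEntry(history="mary", event="has", prob=0.3, backoff=0.4)
```

With billions of such entries, the strings and floating point values dominate memory consumption, which motivates the storage problem described next.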
If a language model becomes too large (e.g., tens of billions or trillions of words), the amount of RAM required to store the language model becomes prohibitive.