The exemplary embodiment relates to the field of machine translation. It finds particular application in connection with learning language models using weights and confidences for use in a machine translation system.
Statistical machine translation (SMT) systems are used for automatic translation of source text in a source language to target text in a target language. They are often based on a standard noisy-channel model which models the probability that a target string, such as a sentence, will be found, given a source string:
$$p(t \mid s) \propto p(s \mid t)\,p(t) \qquad (1)$$
where s and t are source and target language sentences, respectively. The conditional distribution of the source s given the target t (p(s|t)) is referred to as a translation model and the prior probability distribution over target sentences p(t) is called a language model. Language models are widely used components in most statistical machine translation systems, where they play a valuable role in promoting output fluency.
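As an illustration of how Equation (1) is used to choose between candidate translations, the following minimal sketch scores each candidate target string by the sum of a translation-model log-probability and a language-model log-probability. The function names and the probability values are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
import math


def noisy_channel_score(log_p_s_given_t, log_p_t):
    """Unnormalized log p(t|s) = log p(s|t) + log p(t), following Equation (1)."""
    return log_p_s_given_t + log_p_t


def best_translation(candidates):
    """Return the candidate target string with the highest noisy-channel score.

    `candidates` maps each target string to a pair (log p(s|t), log p(t)).
    """
    return max(candidates, key=lambda t: noisy_channel_score(*candidates[t]))


# Toy probabilities, invented for illustration only.
candidates = {
    "the house is small": (math.log(0.20), math.log(0.010)),
    "small the house is": (math.log(0.25), math.log(0.0001)),
}

print(best_translation(candidates))
```

In this toy case the second candidate has the slightly higher translation-model probability, but the language model assigns it a much lower prior, so the fluent word order is selected. This is the sense in which the language model promotes output fluency.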
Statistical translation systems often generalize the translation model by log-linear models that allow for the addition of more features. The language model remains, however, the main vehicle by which fluency is promoted. Generative models and discriminative models have been proposed for the language model. N-gram generative language models are commonly used to equate fluency to a likelihood estimate based on n-th order Markov assumptions:
$$p(t) = \prod_{i=1}^{|t|} p(t_i \mid t_{i-1}, \ldots, t_{i-N+1}) \qquad (2)$$
which estimates the probability of observing a target sentence given prior observations. However, such language models often rely on word surface forms only, and are unable to benefit from available linguistic knowledge sources. Moreover, these language models tend to suffer from poor estimates for rare features (e.g., features that are relevant but which rarely occur).
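The behavior described above can be seen in a minimal sketch of an N-gram model estimated by unsmoothed relative frequencies. The sketch assumes whitespace-tokenized sentences and a hypothetical start-of-sentence padding symbol `<s>`; the corpus is a toy example, and practical systems apply smoothing rather than the raw estimate shown here.

```python
from collections import Counter

BOS = "<s>"  # hypothetical sentence-start padding symbol


def train_ngram_counts(sentences, n):
    """Count n-grams and their (n-1)-word histories in tokenized sentences."""
    ngrams, histories = Counter(), Counter()
    for tokens in sentences:
        padded = [BOS] * (n - 1) + list(tokens)
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            ngrams[history + (padded[i],)] += 1
            histories[history] += 1
    return ngrams, histories


def sentence_probability(tokens, ngrams, histories, n):
    """Unsmoothed maximum-likelihood estimate of p(t), following Equation (2)."""
    padded = [BOS] * (n - 1) + list(tokens)
    prob = 1.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])
        if histories[history] == 0:
            return 0.0  # history never observed: no estimate is available
        prob *= ngrams[history + (padded[i],)] / histories[history]
    return prob


# Toy corpus, invented for illustration only.
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "eats"],
]
ngrams, histories = train_ngram_counts(corpus, n=2)

seen = sentence_probability(["the", "cat", "sleeps"], ngrams, histories, n=2)
unseen = sentence_probability(["the", "dog", "eats"], ngrams, histories, n=2)
print(seen, unseen)
```

With this corpus, "the cat sleeps" receives probability 1/3, while the equally well-formed "the dog eats" receives probability zero because the bigram "dog eats" never occurs in the training data. The model sees only surface forms, so it cannot use the fact that "dog" and "cat" behave alike grammatically.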
Accordingly, it would be desirable to have a mechanism for learning language models that makes use of existing linguistic knowledge and is able to take rare features into account.