1. Technical Field
The invention is related to statistical word translation, and in particular, to various techniques for learning probabilistic models for use in machine translation of words, phrases or sentences from one language to another language, or to alternate words, phrases or sentences in the same language.
2. Related Art
Word alignment is an important step in typical approaches to statistical machine translation. In machine translation, it is generally assumed that there is a pairwise mapping between the words of a source sentence in a first language and the words of a target sentence in a second language. This mapping is typically generated using probabilistic word alignment modeling. A number of classical approaches to word alignment use Hidden Markov Model (HMM) based alignment models.
Although HMM-based word alignment approaches generally provide good translation performance, one weakness of conventional HMM-based approaches is the use of coarse transition models, which generally assume that word transition probabilities depend only on the jump width from the previous model state to the next model state. Several translation schemes have attempted to improve transition models in HMM-based word alignment by extending word transition models to be word-class dependent. Related schemes have modeled the self-transition probability separately from other transition probabilities to address cases where a particular word has no analog in the language to which a phrase is being translated. Further adaptations of such schemes include using a word-to-phrase HMM in which a source-word-dependent phrase length model is used to improve translation results.
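For illustration only, the coarse jump-width transition model described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular prior scheme: the function names `estimate_jump_probs` and `transition_prob`, the representation of an alignment as a list of source positions, and the uniform fallback are all hypothetical choices made for this example.

```python
from collections import Counter


def estimate_jump_probs(alignments):
    """Estimate p(jump) by relative frequency.

    `alignments` is a list of sequences; each sequence gives, for successive
    target words, the index of the aligned source word (an assumed format).
    """
    counts = Counter()
    for positions in alignments:
        for prev_pos, next_pos in zip(positions, positions[1:]):
            counts[next_pos - prev_pos] += 1  # jump width only, no word identity
    total = sum(counts.values())
    return {jump: c / total for jump, c in counts.items()}


def transition_prob(jump_probs, prev_pos, next_pos, source_len):
    """p(next_pos | prev_pos, source_len), normalized over valid positions."""
    weights = [jump_probs.get(i - prev_pos, 0.0) for i in range(source_len)]
    norm = sum(weights)
    if norm == 0.0:
        # Assumed fallback: uniform when no observed jump fits the sentence.
        return 1.0 / source_len
    return weights[next_pos] / norm


# Toy alignments (illustrative data, not taken from any corpus).
toy_alignments = [[0, 1, 2, 1], [0, 1, 3]]
jump_probs = estimate_jump_probs(toy_alignments)
p_forward = transition_prob(jump_probs, prev_pos=1, next_pos=2, source_len=4)
p_backward = transition_prob(jump_probs, prev_pos=1, next_pos=0, source_len=4)
```

Because the probability is a function of the position difference alone, every source word at a given position receives the same forward and backward jump probabilities, which is the coarseness noted above.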
Unfortunately, these types of translation schemes generally model only the probability of state occupancy (self-transition) rather than a full set of transition probabilities. As such, important knowledge of jumping from a particular source word to another position, e.g., jumping forward (monotonic alignment) or jumping backward (non-monotonic alignment), is not modeled. Further, these types of translation schemes do not adequately address the problem of data sparsity in detailed word transition modeling.