1. Technical Field
The present invention relates to machine translation systems and methods and more particularly to performing machine translation in continuous space between languages.
2. Description of the Related Art
In phrase-based statistical machine translation (SMT) systems, estimates of conditional phrase translation probabilities are a major source of translation knowledge. State-of-the-art SMT systems use maximum-likelihood estimation from relative frequencies to obtain these conditional probabilities. Phrase pair extraction is based on an automatically word-aligned corpus of bilingual sentence pairs. The alignment records which source language words are linked to which target language words. These links indicate either that the linked words are translations of each other, or that they are parts of phrases that are translations of each other. In phrase-based SMT systems, every possible phrase pair up to a pre-defined phrase length is extracted, subject to the following constraints: 1) a phrase pair must contain at least one pair of linked words, and 2) a phrase pair must not contain any word that is linked to a word outside the phrase pair.
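The two extraction constraints above can be illustrated with a minimal Python sketch. It assumes an alignment represented as a set of (source index, target index) links; the function name `extract_phrase_pairs`, the parameter `max_len`, and the toy sentence pair are hypothetical and chosen only for illustration. The sketch enumerates spans by brute force and is not meant to reflect how any particular SMT toolkit implements extraction.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    links: Set[Tuple[int, int]],
    max_len: int,
) -> List[Tuple[str, str]]:
    """Enumerate phrase pairs that satisfy the two constraints.

    links holds (source_index, target_index) word alignments.
    max_len is the pre-defined maximum phrase length (in words).
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(j1 + max_len, len(tgt))):
                    inside = 0
                    consistent = True
                    for i, j in links:
                        in_src = i1 <= i <= i2
                        in_tgt = j1 <= j <= j2
                        if in_src and in_tgt:
                            # Link lies fully inside the candidate pair.
                            inside += 1
                        elif in_src or in_tgt:
                            # Constraint 2: a word inside the pair is
                            # linked to a word outside it.
                            consistent = False
                            break
                    # Constraint 1: at least one link inside the pair.
                    if consistent and inside >= 1:
                        pairs.append(
                            (" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
                        )
    return pairs


# Hypothetical toy example: a two-word sentence pair aligned word by word.
example = extract_phrase_pairs(
    ["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)}, max_len=2
)
```

For this toy input, the candidate pair ("das", "the house") is rejected because "house" is linked to "Haus", which lies outside the pair.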
In the state-of-the-art method, the phrase translation probabilities are estimated simply by marginalizing the counts (C) of phrase instances. For example,
$$p(x \mid y) = \frac{C(x, y)}{\sum_{x'} C(x', y)}.$$

This method is used to estimate the conditional probabilities of both target phrases, given source phrases, and source phrases, given target phrases. In spite of its success, the state-of-the-art phrase pair conditional probability estimation method suffers from several major drawbacks. These drawbacks include: 1) overtraining, 2) lack of generalization, 3) lack of adaptation and 4) lack of discrimination.
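As a rough sketch of the relative-frequency estimate described above, the following Python snippet normalizes each count C(x, y) by the total count of all phrases x′ paired with the same y. The function name `relative_frequency` and the toy counts are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping, Tuple


def relative_frequency(
    pair_counts: Mapping[Tuple[str, str], int]
) -> Dict[Tuple[str, str], float]:
    """Return p(x | y) = C(x, y) / sum over x' of C(x', y)."""
    totals = defaultdict(int)
    for (_, y), count in pair_counts.items():
        totals[y] += count  # marginal count of the conditioning phrase y
    return {(x, y): count / totals[y] for (x, y), count in pair_counts.items()}


# Hypothetical counts of extracted (x, y) phrase pairs.
counts = Counter({("house", "Haus"): 3, ("home", "Haus"): 1, ("cat", "Katze"): 1})
probs = relative_frequency(counts)
```

Swapping the roles of x and y in the count keys gives the estimate in the other translation direction.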
The overtraining problem (1) arises because the empirical distributions, which are estimated as described above, overfit the training corpus and suffer from data sparseness. For example, phrase pairs that occur only once in the corpus are assigned a conditional probability of 1, higher than the probabilities of pairs for which much more evidence exists. Because overlapping phrase pairs are in direct competition during decoding, such overestimated probabilities have the potential to significantly degrade translation quality.
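A small worked example, using hypothetical counts, makes the effect concrete. Suppose the pair (x_1, y_1) occurs once and y_1 is never paired with any other phrase, while (x_2, y_2) occurs 90 times and a competing pair (x_3, y_2) occurs 10 times. The relative-frequency estimate then gives:

```latex
p(x_1 \mid y_1) = \frac{C(x_1, y_1)}{\sum_{x'} C(x', y_1)} = \frac{1}{1} = 1,
\qquad
p(x_2 \mid y_2) = \frac{C(x_2, y_2)}{\sum_{x'} C(x', y_2)} = \frac{90}{90 + 10} = 0.9
```

The pair seen once therefore receives a higher probability than the pair seen 90 times.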
The generalization problem (2) arises because decoding with the state-of-the-art model does not propose phrase translations that are not observed in the parallel training corpus. Typically, the phrase translation table contains millions of entries, and phrases of up to tens of words. Additionally, the current methods fail to model semantic similarities between words and between sentence pairs. For example, the sentences “The cat walks in the bedroom” and “A dog runs in the room” are quite similar in structure, but state-of-the-art models are unaware of this similarity and cannot exploit it.
The adaptation issue (3), that is, adaptation to a new domain, speaker, genre or language, has not been addressed at all in machine translation so far, because a phrase translation table has a huge number of parameters. The typical practice is to collect a large amount of data (sentence pairs) for the target domain to build an SMT system, rather than adapting an existing system to the target domain/application. This is because it is very difficult to adapt an existing SMT system using a relatively small amount of target domain/application data.
Regarding the discrimination problem (4), probabilities of the phrase translation pairs are estimated based on empirical counts. However, discriminatively estimating phrase pair probabilities can and should improve the overall system performance.