Automatic translation technology refers to software technology that automatically converts one language into another. Research into the technology began for military purposes in the U.S. in the mid-20th century. Today, the technology is actively researched worldwide by many laboratories and private companies in order to broaden access to information and to innovate human interfaces.
In its initial stage, automatic translation technology was developed based on bilingual dictionaries manually prepared by specialists and on rules for converting one language into another. However, since the beginning of the 21st century, when computing power developed rapidly, statistical translation technology that automatically learns a translation algorithm from large amounts of data has been actively developed.
A statistical machine translation (SMT) system statistically models the translation process and learns translation knowledge, translation probabilities, and generation probabilities for the target language from large parallel corpora. Based on these, it generates the target sentence most appropriate to an input source sentence.
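One common way to write this down, given here only as an illustrative sketch and not as a formulation stated in this document, is the noisy-channel decomposition. The symbols are assumptions: f denotes the source sentence, e a candidate target sentence, P(f | e) the translation probability, and P(e) the generation (language model) probability of the target language.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) can be dropped because it does not depend on e and therefore does not affect which target sentence attains the maximum.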
Recent statistical machine translation systems may be broadly classified into a phrase-based SMT (hereinafter referred to as PBSMT) type and a syntax (grammar)-based SMT (hereinafter referred to as SBSMT) type.
The PBSMT translates a consecutive word string (hereinafter referred to as a phrase) as one unit instead of translating word by word. It learns phrase-to-phrase translation knowledge and translation probabilities, and during decoding it generates the phrase combination having the highest probability.
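The decoding step described above can be illustrated with a minimal sketch. It assumes a tiny hand-written phrase table (`PHRASE_TABLE`, with made-up German-to-English entries and probabilities), keeps phrases in source order, and omits the language model and any reordering; it is not the decoder of any particular PBSMT system.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
# All entries and probabilities are invented for illustration.
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}
MAX_PHRASE_LEN = 3  # longest source phrase considered (an assumption)


def decode_monotone(source_words):
    """Return (translation, probability) of the best monotone phrase combination."""
    n = len(source_words)
    # best[i] holds (log probability, target phrases) covering the first i words.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_PHRASE_LEN), end):
            if best[start] is None:
                continue  # the first `start` words cannot be covered
            phrase = tuple(source_words[start:end])
            for target, prob in PHRASE_TABLE.get(phrase, []):
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [target])
    if best[n] is None:
        return None  # no phrase segmentation covers the whole sentence
    log_prob, phrases = best[n]
    return " ".join(phrases), math.exp(log_prob)
```

For the input `"das haus ist klein".split()`, the sketch selects the two-word phrase entry for "das haus" because its probability (0.9) is higher than the product of the two single-word entries (0.7 × 0.8), and it returns "the house is small" with a probability of about 0.54.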
The most representative models of the PBSMT are those of Koehn et al., 2003 and Och and Ney, 2004a. These models are simple, readily change short-distance word order, and naturally handle translations expressed with several words. However, they do not readily change long-distance word order, which causes a serious problem for language pairs whose word orders differ significantly, such as English-to-Korean translation. The reason is that the translation model of the PBSMT does not explicitly model the grammar-to-grammar conversions that determine word order in a sentence, and considers only some of all possible permutations of phrases.
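The effect of considering only some phrase permutations can be sketched as follows. The sketch assumes a distortion limit, which is one common way phrase-based decoders restrict reordering; this document does not specify the mechanism, and the function name `allowed_orders` and the jump definition are illustrative assumptions.

```python
from itertools import permutations


def allowed_orders(num_phrases, distortion_limit):
    """Return the phrase orders in which every jump stays within the limit."""
    orders = []
    for order in permutations(range(num_phrases)):
        prev = -1  # index of the previously translated source phrase
        ok = True
        for pos in order:
            # Jump size: distance from the phrase that would come next
            # in source order (prev + 1) to the phrase actually chosen.
            if abs(pos - (prev + 1)) > distortion_limit:
                ok = False
                break
            prev = pos
        if ok:
            orders.append(order)
    return orders
```

With four phrases, a limit of 0 leaves only the original source order, whereas moving the last phrase to the front requires an initial jump of 3. A small limit therefore rules out exactly the long-distance movements that a language pair such as English and Korean needs.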
Therefore, in recent years, methods that model grammar-based syntax conversion, called the SBSMT, have been the primary subject of research. To learn syntax conversion knowledge, the SBSMT learns tree-to-tree or tree-to-string conversion knowledge and probabilities from the syntax trees of the two languages in parallel corpora. Compared with the PBSMT, the SBSMT changes long-distance word order more easily and translates non-consecutive phrases more easily. However, because the SBSMT depends heavily on parser performance and its translation knowledge is constrained to grammatical phrase units, the amount of usable translation knowledge is much smaller. As a result, when no applicable translation knowledge exists, consecutive word strings are translated by simple word-to-word translation or are translated unnaturally, in a way that does not match the neighboring words. Representative methods include Galley et al., 2004, 2006, Lavie et al., 2008, Yamada and Knight, Gildea et al., and the like.
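A tree-to-string conversion of the kind described above can be sketched in a few lines. The rule table `RULES`, the lexicon `LEXICON`, and the tuple-based tree encoding are all hypothetical and hand-written for illustration; a real SBSMT system would learn such rules and their probabilities from parsed parallel corpora.

```python
# Hypothetical tree-to-string rules:
# (parent label, child labels) -> order of the children on the target side.
RULES = {
    ("S", ("NP", "VP")): (0, 1),  # subject stays first
    ("VP", ("V", "NP")): (1, 0),  # verb-object becomes object-verb
}

# Hypothetical word-level lexicon (English -> Korean).
LEXICON = {"I": "나는", "eat": "먹는다", "apples": "사과를"}


def translate(node):
    """Translate a (label, children) tree into a list of target words."""
    label, children = node
    if isinstance(children, str):
        # Leaf node: (label, word). Unknown words are passed through unchanged.
        return [LEXICON.get(children, children)]
    child_labels = tuple(child[0] for child in children)
    # Without a matching rule, fall back to the source order.
    order = RULES.get((label, child_labels), tuple(range(len(children))))
    words = []
    for index in order:
        words.extend(translate(children[index]))
    return words


# Parse tree for "I eat apples", written by hand for this example.
tree = ("S", [("NP", "I"), ("VP", [("V", "eat"), ("NP", "apples")])])
translation = " ".join(translate(tree))
```

The fallback branch mirrors the limitation noted above: when no rule matches a subtree, the sketch degrades to word-by-word translation in source order.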
As described above, in the statistical machine translation technology of the related art, the PBSMT type improves the fluency of consecutive word translation but fails to change long-distance word order, and may thereby generate a completely different sentence. In the SBSMT type, the word order of the generated target sentence is correct, but simple word-to-word translation is performed because of a shortage of translation knowledge, and as a result the translation is not natural.