The present invention relates to automated language translation systems. More particularly, the present invention relates to extracting transfer mappings automatically from bilingual corpora, the mappings associating words and/or logical forms of a first language with words and/or logical forms of a second language.
Machine translation systems receive a textual input in a first language, translate it into a second language, and provide a textual output in the second language. Many machine translation systems now use a knowledge base of examples or mappings in order to translate from the first language to the second language. The mappings are obtained by training the system, which includes parsing sentences, or portions thereof, in parallel sentence-aligned corpora in order to extract the transfer rules or examples. These systems typically obtain a predicate-argument or dependency structure for each source sentence and each target sentence. The source and target structures are then aligned, and lexical and structural translation correspondences are extracted from the resulting alignment. The transfer mappings represent these correspondences or associations.
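By way of illustration only, the general extraction procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the method of any particular system: the `Node` structure, the `align` and `extract_mappings` functions, the toy bilingual `lexicon`, and the English/Spanish example are all hypothetical, and the alignment step is reduced to a simple lexicon lookup, whereas practical systems use considerably more elaborate alignment procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node of a toy dependency structure (hypothetical representation)."""
    lemma: str
    children: tuple = ()  # tuple of (relation, Node) pairs


def iter_nodes(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for _, child in node.children:
        yield from iter_nodes(child)


def align(source, target, lexicon):
    """Pair source and target nodes whose lemmas are linked by the lexicon.

    Assumption: a dictionary lookup stands in for a real alignment procedure.
    """
    targets = {n.lemma: n for n in iter_nodes(target)}
    pairs = []
    for s in iter_nodes(source):
        t = targets.get(lexicon.get(s.lemma))
        if t is not None:
            pairs.append((s, t))
    return pairs


def extract_mappings(pairs):
    """Collect lexical and structural correspondences from aligned nodes."""
    aligned = {id(s): t for s, t in pairs}
    mappings = set()
    for s, t in pairs:
        # Lexical correspondence: source word -> target word.
        mappings.add((s.lemma, t.lemma))
        # Structural correspondence: an aligned head-child link on both sides.
        for rel, child in s.children:
            t_child = aligned.get(id(child))
            if t_child is None:
                continue
            for t_rel, candidate in t.children:
                if candidate is t_child:
                    mappings.add(((s.lemma, rel, child.lemma),
                                  (t.lemma, t_rel, t_child.lemma)))
    return mappings


# Toy sentence pair: "change information" / "cambiar información".
source = Node("change", (("obj", Node("information")),))
target = Node("cambiar", (("obj", Node("información")),))
lexicon = {"change": "cambiar", "information": "información"}

mappings = extract_mappings(align(source, target, lexicon))
for mapping in sorted(mappings, key=str):
    print(mapping)
```

In this sketch, the two word-level pairs correspond to lexical correspondences and the head-object pair corresponds to a structural correspondence. A larger mapping of the latter kind carries more context but applies to fewer inputs.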
Translation systems that automatically extract transfer mappings (rules or examples) from bilingual corpora have been hampered by the difficulty of achieving accurate alignment and acquiring high-quality mappings. For instance, the alignment and transfer-mapping acquisition procedure must acquire mappings with very high precision and be robust against errors in parsing, in sentence-level alignment, and in the alignment procedure itself. It can also be desirable that the acquisition procedure produce transfer mappings that provide sufficient context for a fluent translation from the first language to the second language to be obtained during translation. However, as the size or specificity of the logical forms of the mappings increases, the general applicability of the trained system may decrease.
There is thus a need to improve upon machine translation systems. Systems or methods that address one, several or all of the aforementioned problems would be very beneficial.