Technical Field
The present invention relates generally to data-driven machine translation, and more specifically, to systems and methods for lexicon extraction from non-parallel data.
Description of the Related Art
The rapid growth of the Internet has produced massive amounts of multilingual information available through different information channels. The number of non-English pages is expanding rapidly. According to recent reports, 49.4% of the websites on the Internet are written in languages other than English, and this share is still increasing because English websites are growing much more slowly than websites in many other languages, such as Spanish, Chinese, or Arabic. In this multilingual environment, one challenging but desirable task is to integrate information across different languages.