In recent years, technical documents and business documents that contain technical terms and company-specific terms have increasingly been translated and provided in multiple languages, for example in global companies and in communities where people with different mother tongues gather. To translate documents that contain such terms accurately, it is necessary to prepare a parallel-translation dictionary that contains parallel translations for those terms.
As a method for creating a parallel-translation dictionary that contains parallel translations for technical terms and the like, a method is known in which parallel-translation words across multiple languages are extracted from a multi-language document group, that is, a set of documents in multiple languages that deal with corresponding subject matter. In this kind of creation method, for example, a large-scale seed dictionary prepared in advance is used, a word vector is obtained for each word from its context and syntax, and pairs of words whose word vectors are close to each other across languages are extracted as parallel-translation words (for example, see Non-Patent Document 1).
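The seed-dictionary approach described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions and not the method of Non-Patent Document 1: word vectors are built only from co-occurrence counts within a small window (syntax is ignored), the seed dictionary is assumed to be a one-to-one mapping, and closeness is measured by cosine similarity. All function names and the window parameter are hypothetical.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Count the words that co-occur with `word` within `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def project(vector, seed_dict):
    """Map source-language context words into the target language.

    Context words missing from the seed dictionary are dropped.
    """
    projected = Counter()
    for w, c in vector.items():
        if w in seed_dict:
            projected[seed_dict[w]] += c
    return projected


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_translation(src_word, src_sents, tgt_sents, candidates, seed_dict):
    """Return the target-language candidate whose vector is closest to src_word's."""
    src_vec = project(context_vector(src_word, src_sents), seed_dict)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents)), c) for c in candidates]
    return max(scored)[1]
```

In this sketch the quality of the result depends directly on how many context words the seed dictionary covers, since uncovered context words contribute nothing to the comparison.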
Meanwhile, as another method for extracting parallel-translation words across multiple languages using a multi-language document group, a method is known in which parallel-translation words are extracted based on the topic (semantic classification) of words (for example, see Non-Patent Document 2). This kind of extraction method relies on the idea that each word in a document has a latent topic, and that words having the same topic tend to appear in the same document. That is, the topics of words are modeled by taking into account only the frequency with which each word appears in a document while ignoring the order in which the words are arranged, and parallel-translation words are extracted from pairs of words that have the same topic across multiple languages.
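The topic-based idea described above can be illustrated with a minimal Python sketch. It does not train a topic model such as the one in Non-Patent Document 2. As a loudly simplified stand-in, it assumes that the documents are aligned one-to-one across the two languages and treats each aligned document pair as a single topic, so that the "topic distribution" of a word is just its normalized frequency over documents. Word order is ignored, as in the description. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def topic_profiles(docs):
    """Return, for each word, its normalized frequency over document indices.

    Only bag-of-words counts are used; the order of words is ignored.
    Each document index stands in for one topic (a simplifying assumption).
    """
    counts = defaultdict(Counter)
    for d, tokens in enumerate(docs):
        for tok, c in Counter(tokens).items():
            counts[tok][d] += c
    profiles = {}
    for w, per_doc in counts.items():
        total = sum(per_doc.values())
        profiles[w] = {d: c / total for d, c in per_doc.items()}
    return profiles


def similarity(p, q):
    """Cosine similarity between two sparse topic profiles."""
    dot = sum(v * q.get(d, 0.0) for d, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def rank_candidates(src_word, src_docs, tgt_docs):
    """Rank target-language words by profile similarity to src_word."""
    src = topic_profiles(src_docs)
    tgt = topic_profiles(tgt_docs)
    scored = [(t, similarity(src[src_word], prof)) for t, prof in tgt.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because only document-level frequencies are compared, words that always appear in the same documents receive identical profiles in this sketch and cannot be told apart, which is one reason a real system would need a larger document group and a proper topic model.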
Non-Patent Document 1: Andrade, Daniel, Matsuzaki, Takuya, & Tsujii, Jun'ichi, “Effective Use of Dependency Structure for Bilingual Lexicon Creation,” in Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: 12th International Conference, CICLing 2011, Tokyo, Japan, Feb. 20-26, 2011, Proceedings, Part II (pp. 80-92). Berlin, Heidelberg: Springer Berlin Heidelberg.
Non-Patent Document 2: Liu, Xiaodong, Duh, Kevin, & Matsumoto, Yuji, “Multilingual Topic Models for Bilingual Dictionary Extraction,” ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 3, June 2015, Article No. 11.