Machine translation refers to the use of computers to translate text from one language into another. Software that implements machine translation is called a machine translation system. With the development and spread of computers and the Internet, cultural exchange among people has become increasingly frequent. Language barriers persist in this new era, however, and there is an urgent need for machine translation.
Machine translation methods can be divided into rule-based methods and corpus-based methods. Corpus-based methods can be further classified into two categories: statistics-based and example-based methods. For a rule-based machine translation system, a large number of translation rules are defined by humans. These rules are then implemented as computer programs to perform translation. Rule-based machine translation is characterized by high translation quality, but also by high cost, low rule coverage, ambiguity, etc. As computers have become more powerful, statistics-based machine translation has developed dramatically since the 1990s and has gradually become a core area of machine translation research. A statistics-based machine translation system is trained on a large-scale bilingual corpus to obtain translation sub-models (including translation rule tables, language models, reordering models, and other discriminative models or formulas). The preferred translation text may then be determined based on the scores of these sub-models. Currently, statistical machine translation methods can be divided into word-based, phrase-based, hierarchical phrase-based, and syntax-based methods. Statistics-based machine translation is currently the most common approach to machine translation.
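The selection of a translation from sub-model scores can be illustrated with a minimal sketch. It assumes a weighted (log-linear) combination of sub-model scores, which is a common formulation in statistical machine translation but is not stated in the text above. The weights, feature names, candidate labels, and score values are all hypothetical and chosen only for illustration.

```python
# Hypothetical weights for each sub-model (illustrative values, not from any real system).
WEIGHTS = {"translation": 1.0, "language": 0.5, "reordering": 0.3}


def combined_score(features, weights=WEIGHTS):
    """Return the weighted sum of sub-model log-scores for one candidate."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights=WEIGHTS):
    """Return the text of the candidate with the highest combined score."""
    best = max(candidates, key=lambda c: combined_score(c["features"], weights))
    return best["text"]


# Two hypothetical candidate translations of the same original segment,
# each with made-up log-probability scores from three sub-models.
candidates = [
    {
        "text": "candidate A",
        "features": {"translation": -1.2, "language": -2.0, "reordering": -0.5},
    },
    {
        "text": "candidate B",
        "features": {"translation": -0.9, "language": -3.5, "reordering": -0.4},
    },
]

print(best_candidate(candidates))
```

In this sketch, candidate B has the better translation-model score, but candidate A is selected because its language-model score is much better once the weights are applied. None of the scores in this sketch capture the meaning of the segment, which is the limitation discussed next.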
However, existing statistics-based machine translation methods do not operate at the level of natural language semantics when generating candidate translations for each original segment. This results in semantic deviations between an original segment and its candidate translations, so that the translation fails to convey the same meaning as the original, thereby severely reducing the quality of machine translation. For example, suppose the original segment contains the word “apple” from “the apple product”, where “apple” refers to the company Apple Inc. If the word is translated as the food “apple”, a semantic deviation occurs, degrading the overall quality of the translation of the original text.
In summary, statistical machine translation using existing techniques may cause semantic inconsistency between original segments and their translations.