The present discussion is generally related to automatic translation of text from one language to another. More particularly, the present discussion is related to translation training data used during the translation of text.
Consistency is one of the primary quality measurements of any translation of text from one language to another, whether the translation is performed manually or automatically. This is especially true in certain applications such as technical discussions, where inconsistent translations of terminology can cause confusion. Indeed, consistency in the translation of terminology is important both to the readability of localized materials and to the quality of any example-based or statistical machine translation. Such machine translation systems utilize parallel data corpora in both a source language (the language in which the text to be translated is written) and a target language (the language into which the text is to be translated) to find examples of translations and to select translations using statistical methods. The quality of these systems thus depends on the quality of the training data from which translations are created. Inconsistencies in terminology translations could lead to lower quality translations.
However, terminology can be translated differently, depending on a given context. As an example, the English term “file name” can have multiple Japanese variations, including “”, “”, and “”. While multiple translations for some terms are inevitable, given the different contexts in which particular terms can be used, if inconsistent terms are used in the same context, then readers of translated texts can become confused.
Terminology translation inconsistency may derive from different sources. One potential cause of inconsistency can be a lack of standardized terminology data. If particular terminology is not standard in either the source language or the target language, multiple translations of the terminology will probably yield inconsistent results. Another potential cause of inconsistency can be human error. Regardless of the cause, inconsistencies in acquired training data present problems unless the inconsistencies are recognized and addressed to ensure consistent translation of terminology by machine translators.
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.