Field of the Invention
This invention relates to translation apparatus, systems, and methods, and more particularly to computer-aided translation.
Description of the Related Art
Over the past few decades, a great deal of time and money has gone into efforts to enable computers to translate text automatically from one language to another (commonly referred to as machine translation or MT). These efforts have failed because computers do not have the capacity to perform the abstraction that the art of translation requires.
As developers came to understand the limitations of computers in the art of translation, their development efforts shifted to creating Computer Aided Translation (CAT) systems, which rely primarily on Translation Memory (TM). TMs, when used in CAT systems, eliminate the retranslation of text segments that have already been translated. These systems simply compare source text to a body of previously translated text (a corpus). Using various algorithms, the computer generates percentage scores that indicate the similarity between the source text and previously translated segments in the corpus.
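As a rough illustration of the comparison described above, the following Python sketch scores a source segment against a small translation memory and returns the best stored translation when the score clears a cutoff. It is a minimal sketch and not the method of any particular CAT product: the names `similarity` and `best_match`, the 75% threshold, the sample memory, and the use of the standard library's `difflib.SequenceMatcher` as the scoring algorithm are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
# The entries are sample data for illustration only.
MEMORY = {
    "The cat ate the tuna.": "Le chat a mangé le thon.",
    "Close the door.": "Fermez la porte.",
}


def similarity(a: str, b: str) -> float:
    """Return a percentage score (0-100) for how alike two segments are."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(source, memory, threshold=75.0):
    """Return (score, stored_source, translation) for the closest entry,
    or None if the memory is empty or no entry reaches the threshold."""
    best = None
    for stored, translation in memory.items():
        score = similarity(source, stored)
        if best is None or score > best[0]:
            best = (score, stored, translation)
    if best is None or best[0] < threshold:
        return None
    return best
```

Under these assumptions, an exact repeat of a stored segment scores 100 and could be inserted automatically, while anything below the assumed cutoff is not offered to the translator at all.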
CAT systems that use TMs are exceptionally useful where the same sentence (segment) is encountered repeatedly. In such an environment, translation can occur very rapidly and, in the rare cases where the similarity score is 100%, automatically. However, the primary weakness of this methodology is that it does not engage in any meaningful analysis beyond simple comparison. Further, it does not attempt to make any significant comparisons at a sub-sentence level. This is a substantial weakness in situations where the text varies greatly but the concept varies only slightly. For example, the sentences “The cat ate the tuna” and “The cat is eating the cheese,” while very similar in concept, would generate such a low similarity score that the previously translated text would likely not be presented to the translator.
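To make the example concrete, the sketch below computes a word-level score for the two sentences. The scoring function (`difflib.SequenceMatcher` applied to word tokens) and the 75% cutoff are assumptions chosen for illustration; actual TM tools use their own algorithms and thresholds.

```python
import re
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Percentage similarity computed over word tokens rather than characters."""
    tokens_a = re.findall(r"[a-z']+", a.lower())
    tokens_b = re.findall(r"[a-z']+", b.lower())
    return 100.0 * SequenceMatcher(None, tokens_a, tokens_b).ratio()


ASSUMED_THRESHOLD = 75.0  # hypothetical cutoff for showing a match

score = word_similarity("The cat ate the tuna", "The cat is eating the cheese")
shown_to_translator = score >= ASSUMED_THRESHOLD
print(f"{score:.1f}% similar; shown: {shown_to_translator}")
```

Only three of the eleven word tokens line up ("the cat" and the second "the"), so this particular measure gives a score of roughly 55%, which falls below the assumed cutoff even though the two sentences share most of their structure and meaning.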
Further, CAT systems that rely on TMs fail where the source words do not change but the context changes significantly. Consider the difference between “the China case came apart easily” and “the china case came apart easily.” The word “case” may have two completely different meanings even though the words of the two sentences are identical apart from capitalization. For example, in the first sentence “case” may refer to an argument regarding the country of China, while in the second sentence “case” may refer to a cabinet where dinnerware is stored. Thus, a TM-based CAT methodology might automatically insert an inappropriate translation or mislead a translator.
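This failure mode can be sketched with an exact-match lookup that normalizes case before comparing. The normalization step is an assumption about how a TM might be configured, not a description of any specific product, and the class name, function names, and placeholder translation below are hypothetical.

```python
def normalize(segment: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(segment.lower().split())


class TranslationMemory:
    """Toy exact-match memory keyed on the normalized source segment."""

    def __init__(self):
        self._entries = {}

    def store(self, source: str, translation: str) -> None:
        self._entries[normalize(source)] = translation

    def lookup(self, source: str):
        """Return the stored translation for a 100% match, else None."""
        return self._entries.get(normalize(source))


tm = TranslationMemory()
tm.store(
    "the China case came apart easily",
    "<translation using the 'legal argument' sense of case>",
)

# Different meaning, same words: the memory still reports an exact match
# and hands back the translation written for the other sense.
reused = tm.lookup("the china case came apart easily")
```

Because the lookup sees only the surface words, the translation prepared for the legal sense is returned for the sentence about dinnerware, with nothing to warn the translator that the context has changed.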