Many human translators already use a translation memory (TM) to increase their productivity. A TM contains a database and a sentence pair retrieval module. The database consists of a large number of bilingual sentence pairs, each consisting of a source-language sentence and its translation into the target language; sometimes the database also includes information about which documents the sentence pairs were extracted from. These sentence pairs often come from earlier translations carried out by translators working for the same organization or company, or by the same individual translator; they may also come from the client for whom the translation is being performed. Suppose, for instance, that a translator wants to translate the English sentence “The cat ate the mouse” into French with the help of the TM. FIG. 1 (prior art) shows this situation. The translator requests that the TM's retrieval module search for the source-language sentence to be translated. In the example, the TM is successful in finding an exact match to “The cat ate the mouse” and displays to the translator the information that this sentence was previously translated as “Le chat a mangé la souris”. The translator may then decide to follow this suggestion and translate the sentence the same way as in the example retrieved from the TM's database, or decide on another translation. Even if he or she chooses to produce a different translation, having a previously translated example to look at often helps productivity (for instance, the retrieved example may remind the translator how to translate some rare words in the input sentence).
Some TMs may display information even if an exact match to the source-language sentence is unavailable, by showing one or more “fuzzy matches” to this input sentence. This situation is also displayed in FIG. 1 (prior art). Here, the TM has displayed a source-language sentence that is quite similar to the input sentence to be translated, “The cat chased the mouse”. Information about how sentences that are similar to the input sentence were translated in the past can also be very useful to translators, providing them with much of the vocabulary and syntax they need to translate the input sentence.
To support the capabilities of the conventional TM shown in FIG. 1 (prior art), a numerical measure of similarity between the input sentence and the retrieved source-language sentences is employed. One such measure is the number of words or characters that must be deleted, substituted or inserted to transform the input sentence into a source-language sentence stored in the TM. For instance, if we compare the input sentence “The cat ate the mouse” with the sentences shown to be stored in the TM at the top of FIG. 1, we see that it takes 2 substitutions to turn this sentence into “The bird ate the seeds”, 2 substitutions to turn it into “The dog ate the meat”, 2 substitutions to turn it into “The snake ate the bird”, but only 1 substitution to turn it into “The cat chased the mouse”. Thus, if the sentence retrieval module uses the number of words that must be deleted, substituted or inserted to go from the input sentence to a sentence in the TM as its source-language similarity measure, it will display “The cat chased the mouse” in preference to the other non-identical sentences shown when the user requests a display of fuzzy matches. Different conventional TMs use different kinds of similarity measures between sentences in the source language; for instance, the similarity measure may incorporate syntactic or structural information.
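The word-level measure described above can be sketched as follows. This is only an illustrative sketch of one such measure (a word-level edit distance), not the implementation of any particular TM; the names `word_edit_distance`, `fuzzy_matches`, `TM_SOURCES` and the `max_distance` threshold are assumptions introduced for the example. Only the source-language side of each stored pair is listed, since the measure involves source-language words only.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Minimum number of word deletions, substitutions or insertions
    needed to turn sentence a into sentence b."""
    src, tgt = a.split(), b.split()
    # prev[j] = distance between the words of src processed so far
    # (initially none) and the first j words of tgt.
    prev = list(range(len(tgt) + 1))
    for i, s in enumerate(src, start=1):
        curr = [i]  # turning i source words into nothing takes i deletions
        for j, t in enumerate(tgt, start=1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # delete s
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # substitute s by t, or match
        prev = curr
    return prev[-1]


# Source-language sentences named in the FIG. 1 example (hypothetical store).
TM_SOURCES = [
    "The bird ate the seeds",
    "The dog ate the meat",
    "The snake ate the bird",
    "The cat chased the mouse",
    "The cat ate the mouse",
]


def fuzzy_matches(query: str, sources: list[str], max_distance: int = 1):
    """Return (distance, sentence) pairs within max_distance, closest first.
    max_distance is an assumed cut-off, not something the text specifies."""
    scored = ((word_edit_distance(query, s), s) for s in sources)
    return sorted((d, s) for d, s in scored if d <= max_distance)


if __name__ == "__main__":
    for distance, sentence in fuzzy_matches("The cat ate the mouse", TM_SOURCES):
        print(distance, sentence)
```

With the sentences of the example, the exact match has distance 0 and “The cat chased the mouse” has distance 1, while the other three stored sentences have distance 2 and fall outside the assumed threshold.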
Thus, conventional TMs contain sentence pairs, each consisting of a source-language sentence and its target-language translation. When the user enters a new input source-language sentence, the system retrieves sentence pairs whose source-language member is identical or similar to this input sentence, using a numerical measure involving words in the source language only. No numerical measure relating words in the source language to words in the target language is employed.
Some conventional TMs have an additional capability, shown in FIG. 2 (prior art). Such TMs contain rules enabling them to recognize and even translate certain specialized entities such as dates and numbers. For instance, suppose that at some point a translator translated the sentence “The cat ate the mouse on March 1st” as “Le chat a mangé la souris le premier mars” and stored this sentence pair in one of these more advanced conventional TMs. Because of the rules enabling it to recognize dates in English and in French, the system stores the sentence pair as an instance of the general pattern “The cat ate the mouse on DATE∥Le chat a mangé la souris DATE”. Later, when a translator asks for help with the new input sentence “The cat ate the mouse on September 12th”, the TM recognizes this as an instance of the general pattern “The cat ate the mouse on DATE” whose DATE component has the value “September 12th”. It uses a specialized rule to translate this value into “le 12 septembre” and then uses this string of words to replace the symbol “DATE” in the target-language side of the matching pattern. Thus, it will display to the user the suggested translation “Le chat a mangé la souris le 12 septembre”. Again, note that no numerical measure relating words in the source language to words in the target language is employed.
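The DATE pattern mechanism described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the rule set of any actual TM: the regular expression, the month table and the names `DATE_RE`, `translate_date`, `generalize`, `PATTERNS` and `suggest` are all hypothetical, and only dates of the form “Month Nth” are handled.

```python
import re

# English month names mapped to their French equivalents.
MONTHS_FR = {
    "January": "janvier", "February": "février", "March": "mars",
    "April": "avril", "May": "mai", "June": "juin",
    "July": "juillet", "August": "août", "September": "septembre",
    "October": "octobre", "November": "novembre", "December": "décembre",
}

# Assumed recognition rule: a month name followed by an ordinal day,
# e.g. "March 1st" or "September 12th".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS_FR) + r") (\d{1,2})(?:st|nd|rd|th)\b"
)


def translate_date(month: str, day: int) -> str:
    """Specialized rule: render an English date in French
    ("premier" is used for the first day of the month)."""
    day_fr = "premier" if day == 1 else str(day)
    return f"le {day_fr} {MONTHS_FR[month]}"


def generalize(source: str):
    """Replace the first recognized date with the symbol DATE.
    Returns (pattern, translated date), or (source, None) if no date is found."""
    m = DATE_RE.search(source)
    if m is None:
        return source, None
    pattern = source[:m.start()] + "DATE" + source[m.end():]
    return pattern, translate_date(m.group(1), int(m.group(2)))


# Stored general pattern from the example: source side -> target side.
PATTERNS = {
    "The cat ate the mouse on DATE": "Le chat a mangé la souris DATE",
}


def suggest(source: str):
    """Suggest a translation if the input is an instance of a stored pattern."""
    pattern, date_fr = generalize(source)
    target = PATTERNS.get(pattern)
    if target is None or date_fr is None:
        return None
    return target.replace("DATE", date_fr)


if __name__ == "__main__":
    print(suggest("The cat ate the mouse on September 12th"))
```

As in the text, the lookup is a pattern match on the source side followed by a rule-based substitution; no numerical measure relating source-language words to target-language words is involved.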
Neither of these embodiments uses a numerical measure of the strength of association between the words in the input source-language sentence and the words in the retrieved target-language sentence.
It is an object of the invention to use numerical measures of the strength of association between words in the input source-language sentence and words in the retrieved target-language sentence.
It is a further object of the invention to provide a translation alignment means between the input source language sentence and the retrieved target language sentence as part of an enhanced translation memory.