1. Technical Field
This invention relates generally to processing content, and more particularly, to ensuring an exact translation match to source content including context to simplify and otherwise facilitate translation and other processing functions associated with the content.
2. Related Art
As information becomes more accessible on a global basis, especially given the advent and rapid adoption of the Internet and the World Wide Web, the role of translation has shifted away from simple transcription of source text into a target language. Translators today must ensure the timely and accurate deployment of the translated content to designated sites and customers. As such, the increased need for content translation has prompted numerous companies to develop tools that automate and aid parts of the translation process. Given that translators seek to translate content as quickly as possible, translation can be made more efficient with greater flexibility in software functionality and the ability to save previous translations for future use. Therefore, tools have been created to save translations, including blocks and/or segments of translations, in computer memory (“translation memory” or “TM”).
Translation memories, also known as translation databases, are collections of entries where a source text is associated with its corresponding translation in one or more target languages. Translation memory includes a database that stores source and target language pairs of text segments that can be retrieved for use with present texts and texts to be translated in the future. Typically, TMs are used in translation tools: when the translator “opens” a segment, the application searches the database for equivalent source text. The result is a list of matches, usually ranked by a score expressing the percentage of similarity between the source text in the document and the source text in the TM. The translator or a different TM system provides the target text segments that are paired with the lookup segments so that the end product is a quality translation.
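The lookup described above can be illustrated with a minimal sketch. It assumes a small in-memory list of source/target pairs and uses the similarity ratio from Python's standard `difflib` module as the score; the sample entries, the `lookup` function, and the `min_score` threshold are hypothetical and are not drawn from any particular TM product, which may score similarity quite differently.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: (source, target) pairs.
TM = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("Click the Cancel button.", "Cliquez sur le bouton Annuler."),
    ("Restart the application.", "Redémarrez l'application."),
]

def lookup(segment, memory, min_score=50):
    """Return (score, source, target) tuples, best match first.

    The score is a percentage of similarity between the opened segment
    and each stored source text. min_score is an assumed cut-off below
    which candidates are not shown to the translator.
    """
    matches = []
    for source, target in memory:
        score = round(SequenceMatcher(None, segment, source).ratio() * 100)
        if score >= min_score:
            matches.append((score, source, target))
    # Rank the candidates by descending similarity.
    matches.sort(key=lambda m: m[0], reverse=True)
    return matches

for score, source, target in lookup("Click the Save button.", TM):
    print(f"{score:3d}%  {source}  ->  {target}")
```

In this sketch the identical stored sentence is ranked first with a score of 100, and the similar "Cancel" sentence follows with a lower score.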
There are many computer-assisted translation (“CAT”) tools available to assist the translator, such as bilingual and multilingual dictionaries, grammar and spell checkers, and terminology software, but TM goes one step further by making use of these other CAT tools while at the same time matching up the original source document stored in its database with the updated or revised document through exact and fuzzy matching. An exact match (100% match) is a match where there is no difference (or no difference that cannot be handled automatically by the tool) between the source text in the document and the source text in the TM. A fuzzy match (less than 100% match) is a match where the source text in the document is very similar to, but not exactly the same as, the source text in the TM. Duplicated exact matches are also often treated as fuzzy matches. A TM system is used as a translator's aid, storing a human translator's text in a database for future use. For instance, TM can be utilized when a translator translates the original text, using translation memory to store the paired source and target segments. The translator could then reuse the stored texts to translate the revised or updated version of the text. Only the segments of the new text that do not match the old one would have to be translated. The alternative would be to use a manual translation system or a different CAT system to translate the original text. The TM system could then be used by a translator to translate the revision or update by aligning the texts produced by a translator or other CAT system and storing them in the TM database for present and future work. The translator could then proceed to translate only the unmatched segments of the new text, using TM as described above.
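As a rough sketch of the reuse workflow just described, the following splits the segments of a revised document into those that can be filled from stored pairs and those still awaiting a translator. The `pretranslate` and `normalize` helpers and the sample data are hypothetical. Whitespace normalization stands in, by assumption, for "differences that can be handled automatically by the tool", and fuzzy matching is deliberately left out to keep the example short.

```python
def normalize(text):
    """Collapse runs of whitespace; an assumed example of a difference
    a tool could handle automatically."""
    return " ".join(text.split())

def pretranslate(new_segments, memory):
    """Split a revised document into reused and pending segments.

    memory maps stored source segments to their target segments.
    Returns (reused, pending): reused is a list of (source, target)
    pairs taken from the memory; pending lists the segments that have
    no exact match and still need translating.
    """
    index = {normalize(src): tgt for src, tgt in memory.items()}
    reused, pending = [], []
    for segment in new_segments:
        target = index.get(normalize(segment))
        if target is not None:
            reused.append((segment, target))
        else:
            pending.append(segment)
    return reused, pending

# Hypothetical pairs stored while translating the original text.
memory_v1 = {
    "Install the product.": "Installez le produit.",
    "Restart the computer.": "Redémarrez l'ordinateur.",
}

# Revised text: one segment unchanged apart from spacing, one new.
revised = ["Install  the product.", "Enter your license key."]

reused, pending = pretranslate(revised, memory_v1)
print(reused)   # segments filled from the memory
print(pending)  # segments left for the translator
```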
There are many advantages to using TMs: translation can go much faster, unnecessary re-typing of existing translations can be avoided, and/or a translator can change only certain parts of the text. TMs also allow better control of the quality of the translation. In the related art, TM was employed to speed the translation step in large batch projects. For example, a software company may release version 1 of its software product and need to translate the accompanying documentation. The documentation is broken into sentences and translated, with all sentence pairs captured in TM. Two years later the company releases version 2 of its software. The documentation has changed significantly, but a significant portion remains similar to the original documentation. This time, as translators translate the documentation, their work is reduced through leveraging exact and fuzzy matches from the TM. As this example illustrates, TM is typically used as an aid in a pipeline process. In the related art, however, there are also some limitations on the utilization of TM.
Automatically leveraging translations using exact matches (without validating them) can generate incorrect translations, since there is no verification that the context in which the new segment is used corresponds to the context in which the original one was used: this is the difference between true reuse and recycling. In the related art, TM systems are recycling systems. With Web content, and now with many types of content, it is common for a document to be translated, then have minor changes made to it, and then need to be translated again. For example, a web document listing the advantages of a product might be translated, but then a new advantage might be added and the document would therefore need to be translated again. In the related art, TM would reduce the effort of translating the document a second time. Exact matches for most sentences would exist where the source text was identical to one or more entries in the TM. The translator must then make sure that the right exact match is chosen for each sentence by evaluating the appropriateness of each match against contextual information. However, the related art does not provide for a determination of content context. In addition, within the related art, there is no automated process for accurately choosing the best exact match for a given segment or validating whether a given exact match is an appropriate match for the context to which it is being applied. As such, a translator is required to validate matches. Because under the related art a segment may be translated differently under different circumstances or contexts, a translator needs to validate, and possibly perform an action for, every sentence even when just a few words have changed, which is grossly inefficient.
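The limitation described above can be made concrete with a short sketch of a context-blind exact lookup. The entries, the `exact_matches` and `needs_review` helpers, and the English-to-French examples are illustrative assumptions only. The point is that a TM keyed on source text alone may hold several different targets for the same source and has no basis for choosing among them.

```python
# Hypothetical TM entries. The same source text was translated
# differently depending on where it originally appeared.
ENTRIES = [
    ("Open", "Ouvrir"),   # e.g. originally a menu command (verb)
    ("Open", "Ouvert"),   # e.g. originally a status label (adjective)
    ("Close the file.", "Fermez le fichier."),
]

def exact_matches(segment, entries):
    """Return every stored target whose source equals the segment.

    No context is stored or consulted, so all candidates look
    equally valid to the tool.
    """
    return [target for source, target in entries if source == segment]

def needs_review(segment, entries):
    """True when the exact matches disagree with one another."""
    return len(set(exact_matches(segment, entries))) > 1

for segment in ("Open", "Close the file."):
    print(segment, exact_matches(segment, ENTRIES),
          "ambiguous" if needs_review(segment, ENTRIES) else "single target")
```

Even where only a single target is found, nothing in such a lookup confirms that the stored translation suits the place where the segment now appears, which is why the translator must still validate each match.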
In view of the foregoing, there is a need in the art for an automated process which accurately validates whether a given exact match is an appropriate match for the context to which it is being applied.