1. Field of the Invention
The invention relates generally to machine manipulation of natural languages and more specifically to the use of such machine manipulation to aid translators.
2. Description of the Prior Art
In the early days of computers, many experts assumed that the computer would soon completely automate translation from one natural language into another. Such machine translation was the subject of much research, but the problem proved far more difficult than originally believed, and translation from one natural language into another has remained the province of human translators. At present, translators generally work either completely independently or under contract to a translation service bureau, which provides an interface between those seeking translations and the translators.
With the advent of the low-cost personal computer (PC), attention turned to the development of translator's tools for the PC. These tools do not aim to completely automate translation but instead to make the human translator's work easier and therefore faster and more accurate. A state-of-the-art set of such translator's tools is the Translator's Workbench, sold in the United States by MCB Systems, San Diego, Calif. As described in a 1991 brochure published by MCB Systems, the Translator's Workbench includes the following features:
- A concept-oriented terminology data base which may be used in a network environment;
- A text analysis program which determines which words in the text to be translated are in the terminology data base and makes a project dictionary of those words;
- A translation editor which highlights words in the text which are in the project dictionary. When the cursor is moved to a highlighted word, the translation from the terminology data base is displayed at the bottom of the screen, from where it can be "pasted" into the translation;
- A split text display in the translation editor which shows the translator both a portion of the original and a corresponding portion of his translation. The two sides of the display are coordinated by means of a user-specified paragraph character which is common to the original and the translation.
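The text analysis program and the paragraph-coordinated split display described above can be illustrated in a few lines. The following is a minimal Python sketch under stated assumptions, not a description of how the Translator's Workbench is actually implemented: the terminology data base is modeled as a plain dict from source term to translation, terms are matched case-insensitively on word boundaries, and the function names make_project_dictionary and paired_paragraphs, as well as the default pilcrow separator, are hypothetical.

```python
import re
from itertools import zip_longest


def make_project_dictionary(text, terminology_db):
    """Keep only the terminology entries whose source term occurs in text.

    Assumption: terminology_db is a dict {source_term: translation}.
    """
    project = {}
    for term, translation in terminology_db.items():
        # Whole-word, case-insensitive match; multi-word terms also work
        # because re.escape keeps the term's characters literal.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            project[term] = translation
    return project


def paired_paragraphs(original, translation, para_char="\u00b6"):
    """Pair the i-th paragraph of the original with the i-th of the translation.

    Assumption: para_char is the user-specified paragraph character
    shared by both texts (a pilcrow by default, chosen for illustration).
    """
    left = [p.strip() for p in original.split(para_char)]
    right = [p.strip() for p in translation.split(para_char)]
    # If one side has fewer paragraphs, its missing counterparts appear
    # as empty strings rather than being silently dropped.
    return list(zip_longest(left, right, fillvalue=""))
```

For example, with a hypothetical data base {"bearing": "Lager", "gear box": "Getriebe", "shaft": "Welle"}, the sentence "Check the Gear box before replacing the bearing." yields a project dictionary containing only "bearing" and "gear box". Pairing paragraphs by position assumes both texts contain the same paragraph marks in the same order; a single missing mark shifts every later pair.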
While translator's tools like the Translator's Workbench are indeed useful, they nevertheless do not help a translator with two classes of tasks: that of finding the right term from among a set of possible terms and that of comparing a text with any other text which is purportedly a translation of that text.
The task of finding the right term is fundamental to translation. Dictionaries, or electronic equivalents of dictionaries such as the terminology data base of the Translator's Workbench, are only a starting point. To begin with, dictionaries are often out of date. Further, they often do not contain highly specialized vocabulary. Finally, even where they do have an entry for a term, they generally offer a number of alternative translations without much guidance as to how to use them. Even when a translator has overcome all of these problems, the problem of consistency remains. For example, a translation may be one of a series for a given client; within the series, terms should be used consistently.
Thus, in many cases, the only way to find the right term has been by hand: either the translator or a specialist in the translation service bureau compares texts from the same domain as the text being translated with their translations. The text's domain may be defined by a particular subject matter or even by a particular client. Sometimes the texts to be compared are available electronically and the translator or specialist can use text-searching tools on them; more often, they are not, and the only way to do the work is by comparing paper copies. Having made the comparisons, the translator or specialist makes a glossary which is specific to the domain. If tools such as the Translator's Workbench are available, the glossary is put into the terminology data base or an equivalent thereof; otherwise, it may simply be put into a text file or even onto paper. Clearly, the quality of a translation depends to a great degree on the quality of the glossaries available to the translator. That is particularly the case when a translation is undertaken by more than one translator; in such a situation, the glossaries are the best way of maintaining consistency among the translators.
Among the situations where a text must be compared with its translation are checking a finished translation for completeness and finding out how a revision of an already-translated text relates to the existing translation. In both cases, the only practical way to carry out the comparison is to examine the original and the translation page by page. Such comparisons are often done in haste, and they often fail to detect material missing from the translation or material which should be changed in the translation of the revision.
What is needed, and what is provided by the techniques described in the following, is apparatus and methods which reduce the effort involved in comparing a text with its translation and consequently make it easier to produce domain-specific glossaries and to detect differences between a text and its translation.