1. Field of the Invention
The invention generally relates to translating expressions from one natural language into another natural language, and in particular to assisting a translator in finding the right translation for any phrase.
2. Description of the Related Art
Any translator is evaluated according to two criteria: translation speed and translation quality. One difficulty affecting both of these criteria is the appearance of a word or group of words which makes the translator hesitate. Finding the suitable translation may lead to a time-consuming manual search, with no guarantee of the result.
Several techniques have been developed for assisting a translator. One of these techniques involves the use of contextual dictionary look-up. Contextual dictionaries provide the translation of a word according to its context. This technique is strongly limited in the extent to which translations are possible, i.e. by looking up a contextual dictionary, the translator is provided with only a small number of proposed translations.
Further, multi-lingual terminology databases exist which are based on translations of pre-accepted terms. This technique is strongly restricted to the prestored set of terms, and the translator is not assisted in translating expressions which are not part of the set of pre-accepted terms.
A further technique is based on the use of translation memory which stores already translated sentences. When a sentence has to be translated, the system queries the database and automatically proposes a translation. However, this system requires matching complete sentences, even if the matching can be fuzzy, so that this technique is again strongly restricted in its applicability.
Another translation technique has been proposed by M. Nagao, “A Framework of a Mechanical Translation between Japanese and English by Analogy Principle”, Artificial and Human Intelligence (A. Elithorn and R. Banerji, eds.), Elsevier Science Publishers, 1984, pgs. 173-180. This technique involves aligning and linguistically parsing sentences for machine translation. The parse trees from each pair of sentences are also aligned. One drawback of this technique is that such machine translation systems require performing an overall parse of the translated sentences. Another drawback is that subtrees need to be aligned, resulting in a considerably high computational overhead.
The present invention has been made in consideration of the above situation and has as its primary object to assist a translator to achieve an improved quality of the resulting document.
It is another object of the present invention to contribute to a controlled translation to prevent expensive manual search for unknown expressions, thereby providing functionality in addition to that of using translation memory and terminology databases.
It is still another object of the present invention to provide the translator with an easy-to-use, efficient and reliable tool which is capable of promptly replying to the translator's request for assistance.
A further object of the present invention is to be compatible with existing technology and software tools.
These and other objects of the present invention will become apparent hereinafter.
To achieve these objects, according to a first aspect, the invention provides a method for translating a word phrase from a first natural language to a second natural language. The word phrase is a group of two or more associated words. The method comprises the steps of inputting a text written in the first language; extracting a word phrase from said text; and querying a database for the extracted word phrase using a phrase index of said database. The phrase index indexes text fragments by word phrases. The text fragments represent a primary grammatical unit including at least one clause. The database contains pairs of text fragments, with each pair including a text fragment in the first language and a corresponding text fragment in the second language. A translation of said extracted phrase is then obtained based on one of the pairs of text fragments revealed during the step of querying the database.
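By way of illustration only, the querying step of the first aspect can be sketched as follows. The sketch assumes a toy in-memory database; the names `PAIRS`, `extract_phrases`, `build_phrase_index` and `query` are hypothetical and do not appear in the specification. The phrase extractor is a deliberately simple stand-in that treats every run of two or three consecutive words as a word phrase, whereas an actual embodiment would use a linguistic extractor, for example for noun phrases.

```python
from collections import defaultdict

# Hypothetical database content: pairs of text fragments (here sentences),
# each pair holding a source-language and a target-language fragment.
PAIRS = [
    ("The hard disk is full.", "Le disque dur est plein."),
    ("Replace the hard disk.", "Remplacez le disque dur."),
    ("Check the power supply.", "Vérifiez l'alimentation électrique."),
]


def extract_phrases(text, max_len=3):
    """Stand-in extractor (assumption): all runs of 2..max_len words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    phrases = set()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def build_phrase_index(pairs):
    """Map each word phrase to the ids of the pairs whose source side holds it."""
    index = defaultdict(list)
    for pair_id, (source, _target) in enumerate(pairs):
        for phrase in sorted(extract_phrases(source)):
            index[phrase].append(pair_id)
    return index


def query(index, pairs, phrase):
    """Return every fragment pair indexed under the given word phrase."""
    return [pairs[i] for i in index.get(phrase.lower(), [])]


if __name__ == "__main__":
    index = build_phrase_index(PAIRS)
    for source, target in query(index, PAIRS, "hard disk"):
        print(source, "|", target)
```

In this sketch the index returns whole fragment pairs, so locating the translated phrase inside the target fragment remains with the translator.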
According to a second aspect of the present invention, there is provided a computer-readable storage medium storing instructions for translating a word phrase from a first natural language to a second natural language by performing the steps according to the first aspect.
According to a third aspect of the present invention, there is provided a system for translating an input text from a natural source language to a natural target language. The system comprises storage means for storing a database containing a plurality of pairs of text fragments. The text fragments represent a primary grammatical unit including at least one clause. Each pair includes a text fragment in the source language and a corresponding text fragment in the target language. Each text fragment contains at least one word phrase. The word phrase is a group of two or more associated words. The system further comprises a phrase extractor for extracting a word phrase from a text fragment of said input text, and database retrieval means for retrieving, from said database, pairs of text fragments that contain the extracted word phrase, using a phrase index of said database. The phrase index indexes text fragments by word phrases. The system furthermore comprises user interface means for allowing a user to select one of said retrieved pairs of text fragments to obtain a translation of the extracted word phrase.
According to a fourth aspect, the invention provides a method for generating a text fragment database for use in translating a word phrase from a first natural language into a second natural language. The word phrase is a group of two or more associated words. The method comprises the steps of inputting a first document containing a text written in the first language; inputting a second document containing said text written in the second language; aligning corresponding text fragments of the first and second documents; extracting word phrases from the text fragments of the first document; and generating index information on the extracted word phrases and the aligned text fragments holding the word phrases. The text fragments represent a primary grammatical unit including at least one clause.
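The database generation steps of the fourth aspect may be sketched, purely as an example, as shown below. The sketch rests on two loudly simplifying assumptions that are not taken from the specification: text fragments are sentences split at terminal punctuation and aligned one-to-one by position, and word phrases are approximated by pairs of adjacent words. All function names are hypothetical.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Split a document into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align(doc_first, doc_second):
    """Positional one-to-one alignment (assumption, not a real aligner)."""
    first, second = split_sentences(doc_first), split_sentences(doc_second)
    if len(first) != len(second):
        raise ValueError("positional alignment needs equal sentence counts")
    return list(zip(first, second))


def word_bigrams(sentence):
    """Stand-in phrase extractor (assumption): all pairs of adjacent words."""
    words = re.findall(r"\w+", sentence.lower())
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}


def generate_database(doc_first, doc_second):
    """Return the aligned fragment pairs and the phrase index over them."""
    pairs = align(doc_first, doc_second)
    index = defaultdict(set)
    for pair_id, (source, _target) in enumerate(pairs):
        for phrase in word_bigrams(source):
            index[phrase].add(pair_id)
    return pairs, index


if __name__ == "__main__":
    pairs, index = generate_database(
        "Open the front cover. Remove the paper tray.",
        "Ouvrez le capot avant. Retirez le bac à papier.",
    )
    print(sorted(index["paper tray"]), pairs[1])
```

Only the first-language fragments are analysed for word phrases here, in keeping with the steps recited above; the second-language fragments are stored unparsed alongside them.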
According to another aspect of the present invention, in the methods and systems according to the first to fourth aspects, the word phrases preferably are noun phrases. Alternatively, the word phrases may also be verb phrases. In another alternative, the word phrases may be predicates involving at least one verb and one noun or adjective used as a noun.
According to still another aspect of the present invention, the primary grammatical units are sentences.
It is still another aspect of the present invention that, once pairs of text fragments have been retrieved from the database, these retrieved pairs of text fragments are presented to the translator. Alternatively, the translator is provided with proposed translations of the extracted word phrase, based on the retrieved pairs of text fragments. In either case, the translator approves a translation, and the approved translation is then used as translation of the extracted word phrase.
According to still another aspect of the invention, in the systems and methods according to the above aspects, the step of querying the database for the extracted phrase includes the step of querying the database for sub-phrases, i.e. for all word phrases partly matching the extracted phrase.
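One possible reading of the sub-phrase query is sketched below. It assumes that a sub-phrase is any contiguous run of two or more words of the extracted phrase, and that longer matches are tried first; neither assumption is stated in the specification, and the names `sub_phrases` and `query_with_sub_phrases` are hypothetical. The phrase index is modelled as a plain dictionary from word phrase to fragment pair ids.

```python
def sub_phrases(phrase):
    """List the phrase and its contiguous sub-phrases of 2+ words, longest first."""
    words = phrase.lower().split()
    result = []
    for n in range(len(words), 1, -1):
        for i in range(len(words) - n + 1):
            result.append(" ".join(words[i:i + n]))
    return result


def query_with_sub_phrases(index, phrase):
    """Look up the phrase and each sub-phrase; keep only those with hits."""
    hits = []
    for candidate in sub_phrases(phrase):
        pair_ids = index.get(candidate)
        if pair_ids:
            hits.append((candidate, sorted(pair_ids)))
    return hits


if __name__ == "__main__":
    # Hypothetical index: the full phrase is absent, two sub-phrases are present.
    index = {"laser printer": {0, 2}, "printer cartridge": {1}}
    for candidate, pair_ids in query_with_sub_phrases(index, "laser printer cartridge"):
        print(candidate, pair_ids)
```

Trying the longest candidates first means that an exact match for the whole extracted phrase, where one exists, is reported ahead of any partial match.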
Finally, the present invention according to any of the above aspects may involve the step of obtaining a translation by querying a terminology base in addition to the phrase-indexed text fragment database.
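As a minimal sketch of such a combined query, the example below consults a terminology base and a phrase index for the same word phrase and returns both results side by side. The specification does not prescribe how the two sources are combined, so this arrangement, the dictionary-based data structures and the name `lookup` are assumptions made for illustration.

```python
def lookup(phrase, term_base, phrase_index, pairs):
    """Query the terminology base and the phrase-indexed fragment database."""
    key = phrase.lower()
    return {
        # Pre-accepted term translation, or None if the term is not stored.
        "term": term_base.get(key),
        # Fragment pairs containing the phrase, for the translator to inspect.
        "examples": [pairs[i] for i in phrase_index.get(key, [])],
    }


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    pairs = [("Replace the hard disk.", "Remplacez le disque dur.")]
    phrase_index = {"hard disk": [0]}
    term_base = {"power supply": "alimentation électrique"}
    print(lookup("hard disk", term_base, phrase_index, pairs))
    print(lookup("power supply", term_base, phrase_index, pairs))
```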
By using the approach of the present invention, the database is phrase-indexed. Extracted word phrases directly index whole text fragments. In preferred embodiments, noun phrases are used to index a sentence database. The extracted noun phrases directly index whole sentences, thereby leaving the recognition of the corresponding sub-units in the translated sentences to the translator. Therefore, no overall parse of the translated sentences is performed and no alignment of subtrees is necessary.
The invention is further advantageous in that it makes use of already translated material and presents to the translator, in the preferred embodiment, sentences containing the respective noun phrase both in the source and target language. By using a phrase-indexed sentence database, both translation speed and translation quality are improved.