The present invention relates to a computer-based translation system. More particularly, the present invention relates to a computer-based translation system in which a person having full understanding and competency in a language of an original source text to be translated provides and/or confirms the meaning of the text without requiring knowledge of any other language, so that data concerning the meaning of the original language text may be used to automatically translate the original text. The invention also relates to a computer-based translation system which automatically translates meaning code data representing a source text in order to obtain a translation in a particular language. The invention further relates to a method of translating in which meaning data concerning a source text is provided by a person familiar with the language of the source text without requiring any knowledge of another language, and then translating the meaning data into a translation destination text automatically without requiring any other understanding of the source language.
In the field of automated translation systems, two approaches have traditionally been taken. In the first approach, artificial intelligence has been used to provide a best guess of the meaning of the source language in order to be able to generate automatically a translation of the source text. Such automated systems recognize parts of speech in the source language and this grammatical information is used in order to reconstruct in the destination language a suitable translation. When a word in the source language has two meanings, the most probable meaning based on the context is used in order to provide the translation. The context is determined by the presence of other words. The output from such systems is a translated text which to date has been of dubious quality and reliability.
In the second type of translation system, the automated system provides an aid to translators in which the source text is automatically parsed grammatically and each possible translation for each word in the sentence may be selected by the translator in order to obtain efficiently the translation text. The translator must be knowledgeable as to the meaning of the original language as well as the destination language in order to be competent to confirm that the parsing of the source text is accurate and to select the correct translations for each word in the sentence, and thus produce an accurate translation.
In the prior art, two attempts to provide a different type of translation system are worth noting. In U.S. Pat. No. 5,587,903 to Yale, a sentence input by a user is translated into Esperanto using his or her native language. This is similar to the second type of translation system, except that the user is translating from his or her native language into Esperanto, and the translation includes databases containing relational and/or grammatical information about the Esperanto text. The result obtained is to map the thought of the sentences translated in a form recognizable by a machine. In "Technical translation as information transfer across language boundaries" by P. C. Ganeshsundaram, Journal of Information Science 2 (1980), pp. 91-100, a framework for pre-editing a text in the source language to define parts of speech of the words is disclosed. In this pre-editing, no translation or determining of the meaning of the words is carried out. For basic technical texts, it is proposed that the pre-edited text can be accurately machine translated using literal translations of the pre-edited words into one of many target languages.
It is a general object of the present invention to provide a translation system in which the burden of defining the exact meaning of a text to be translated is carried out by a person knowledgeable of the language and of the meaning of a text to be translated without requiring any knowledge of the language into which the text is to be translated. Data representing the exact meaning is stored in order to facilitate automated translation into one or more destination languages. For example, the author of a text who wants his or her text to be readily translated into other languages may use a text editor according to the invention in order to provide the necessary meaning data in order that translation can be done automatically without requiring any further linguistic data.
It is a further object of the present invention to provide an automatic translation text generator which creates a translation text from the product of a unilingual meaning editor.
According to a broad aspect of the invention, there is provided a translation system including a universal meaning editor having an input text storage means, an editor for allowing a user to ascribe a particular meaning to each word or group of words in the input text, and a language database means for containing all known meanings for each word or group of words in the source language. The translation system allows the user to select which of the known meanings for each word or group of words in the source text is the correct meaning for each word or group of words, and the result of the user selection is used to generate meaning code data representing the meaning of the original language text.
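The selection step described above can be illustrated with a minimal sketch. It assumes a toy in-memory language database; the names `Meaning`, `LANGUAGE_DB`, `candidate_meanings` and `generate_meaning_codes`, and the code format `M0001`, are hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    code: str         # hypothetical identification code for one meaning
    description: str  # description shown to the user in the source language


# Hypothetical language database: the known meanings of each source word.
LANGUAGE_DB = {
    "bank": [
        Meaning("M0001", "financial institution"),
        Meaning("M0002", "sloping land beside a river"),
    ],
    "river": [Meaning("M0003", "large natural stream of water")],
}


def candidate_meanings(word):
    """Return the known meanings the editor would offer for a word."""
    return LANGUAGE_DB.get(word.lower(), [])


def generate_meaning_codes(words, choices):
    """Build meaning code data from the user's selections.

    `choices` maps a word position to the index of the meaning the user
    picked; words with a single known meaning need no selection.
    """
    codes = []
    for position, word in enumerate(words):
        options = candidate_meanings(word)
        if not options:
            raise KeyError(f"no known meaning for {word!r}")
        if len(options) == 1:
            codes.append(options[0].code)
        else:
            codes.append(options[choices[position]].code)
    return codes
```

In this sketch the user only ever sees descriptions written in the source language, which reflects the requirement that no knowledge of a destination language is needed.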
Preferably, the meaning code data is sufficiently explicit to permit automatic translation into a plurality of languages. In one preferred embodiment, the meaning code data also includes layout information so that the automatic translation generator may produce output text respecting the same format as the input text, and in particular the automatic translation generator may scale the size of the translated text in order to generate a text object having dimensions specified in the original source text format. It is also a preferred feature of the present invention that the automatic translation generator be a plug-in software module for use with existing text editors or HTML display software, otherwise known as web browsers.
Also preferably, the unilingual meaning editor is configured so that when it encounters a word not found in its language database, the editor may allow the user to input a suitable synonym. The editor may record the unknown word and create a pointer to the closest synonym. Additionally, it may be preferable that the unilingual meaning editor permits the user to reconstruct the original sentence in the case that a word or group of words is not found in the language database. The reconstructed sentence may thus be a sentence more likely to be easily defined using the existing language database. Also preferably, it may be desirable to permit the unilingual meaning editor to leave a word or group of words untranslated in order to appear in the translation text in italics, quotation marks or some other special script identifying the words as being original foreign words (such as "Katakana" in Japanese). Words which are not found in the language database may also be automatically reported by telecommunication means to the language database creator for the purposes of revision of the language database and implemented in a future release. In the latter case, the language database used may be an on-line database or may be updated at regular intervals by telecommunication means.
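One way the handling of unknown words might look is sketched below, assuming simple in-memory structures. `TERM_DB`, `SYNONYM_POINTERS`, `UNKNOWN_WORD_REPORT` and `resolve_word` are hypothetical names, and the report list merely stands in for the telecommunication step.

```python
# Hypothetical term database: known word -> meaning code.
TERM_DB = {"car": "M0100", "quick": "M0200"}
# Unknown word -> synonym supplied by the user (the "pointer").
SYNONYM_POINTERS = {}
# Unknown words queued for reporting to the database maintainer.
UNKNOWN_WORD_REPORT = []


def resolve_word(word, synonym=None):
    """Resolve a word to a meaning code, or mark it as left untranslated."""
    key = word.lower()
    if key in TERM_DB:
        return ("code", TERM_DB[key])
    # The word is unknown: queue it once for a future database revision.
    if key not in UNKNOWN_WORD_REPORT:
        UNKNOWN_WORD_REPORT.append(key)
    # If the user supplied a known synonym, record a pointer to it.
    if synonym is not None and synonym.lower() in TERM_DB:
        SYNONYM_POINTERS[key] = synonym.lower()
        return ("code", TERM_DB[synonym.lower()])
    # Otherwise keep the original word so it can be rendered in italics,
    # quotation marks or a special script in the translation text.
    return ("untranslated", word)
```

For instance, `resolve_word("automobile", synonym="car")` resolves to the code for "car" and records the pointer, while `resolve_word("sushi")` leaves the word untranslated.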
In some cases, it is desirable to create a language database which requires a degree of specification by the user of meaning relevant to a particular set of languages, which set is less than a complete set of languages, in order to simplify the degree of detail required in order to ascertain the exact meaning of the input text. For example, a language database for the English language may be created in order to ascertain the meaning in the English language for the purposes of producing a translation into any romance language. A separate English language database may be created in order to ascertain the meaning for the purposes of translating the meaning code data into Japanese, Chinese and Korean.
If the unilingual meaning editor has been used in order to ascertain the meaning of an input text for romance languages, the meaning editor used for ascertaining the meaning of a text for automatic translation into the oriental languages may be provided with the meaning code data from the romance languages in order to reduce the time that the user must spend in ascribing the necessary meaning to generate meaning code data which can be used for translation of the input text into oriental languages. As can be appreciated, the automatic translation generator according to the invention would be capable of performing a slightly less than perfect translation into one language when basing the translation on meaning code data not intended to encompass the one language.
The invention also provides within the context of an automatic translation generator a user-controlled editor for carrying out refinements or stylistic changes to the translation text in which any potential ambiguity which may appear in the translation text may be eliminated by providing the user with some or all of the meaning code data associated with the text being revised, and preferably in the language of the translated text so that the reviser of the translated text need not have any knowledge of the source language. It is also preferred that the meaning code data include a complete specification of the original input text so that a reverse translation back into the original input language always provides an exact replica of the original text.
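The exact-replica property can be illustrated with a small sketch, under the assumption that the meaning code keeps the original surface form of every word and separator alongside its identification code. The functions `encode` and `reverse_translate` and the entry layout are hypothetical.

```python
import re


def encode(text, term_db):
    """Pair every word and every separator with its original surface form."""
    # Each character is either a word character or not, so the two
    # alternatives together cover the whole text without loss.
    tokens = re.findall(r"\w+|\W+", text)
    return [{"surface": tok, "code": term_db.get(tok.lower())} for tok in tokens]


def reverse_translate(meaning_code):
    """Rebuild the original text from the stored surface forms."""
    return "".join(entry["surface"] for entry in meaning_code)
```

Because punctuation, spacing and line breaks are stored as separator entries, joining the surface forms reproduces the input character for character.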
According to the invention, there is provided a translation system for translating an input text into a meaning code using input from a user requiring knowledge of a language of the input text in which the meaning code is to be converted by a machine translation system to an output text in at least one different language. The system comprises parser means for recognizing sentences and words within the sentences of the input text. The parser means locates the words in a term database. A meaning editor means is provided for obtaining from a meaning database a plurality of meaning descriptions in the language of the input text for each of at least some of the words having plural meanings in the term database, for receiving from the user a confirmation of which of the plurality of meaning descriptions is appropriate for each of the words, and for receiving from the user an indication of a part of speech of the words in each of the sentences. Meaning code generator means are also included for receiving data from the meaning editor means and for generating a meaning code corresponding to the input text. The meaning code comprises an identification code corresponding to a meaning for each word found in the input text and sufficient grammatical information to allow for the meaning code to be accurately machine translated in any one of the at least one different language. Preferably, the meaning editor means further comprise input means for allowing a user to provide input in response to at least one of the words in a given sentence not being found in the term database. Also preferably, the system includes means for adding a new entry in the term database and for linking the new entry to at least one meaning description in the meaning database, so that a quality of the term database can be developed with use of the system.
Similarly, the system preferably includes means for editing the term database to change links between entries in the term database and meaning descriptions in the meaning database. The meaning editor means for receiving from the user an indication of a part of speech of the words in each of the sentences may prompt the user to provide grammatical information which is not required by the language of the input text and is useful in providing an accurate translation into at least one of the at least one different language, and preferably the meaning code is complete to allow for translation into at least two different languages.
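The role of the parser means can be sketched as follows, assuming a toy term database and naive regular-expression splitting. `TERM_DB`, `parse` and `words_needing_confirmation` are hypothetical names, and a real parser would need far more robust sentence and word recognition.

```python
import re

# Hypothetical term database: word -> identification codes of its meanings.
TERM_DB = {
    "the": ["M1"],
    "bank": ["M2", "M3"],
    "closed": ["M4", "M5"],
    "early": ["M6"],
}


def parse(text):
    """Split the text into sentences, then each sentence into words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]


def words_needing_confirmation(text):
    """List the words the meaning editor would have to ask the user about."""
    result = []
    for sentence_index, words in enumerate(parse(text)):
        for word_index, word in enumerate(words):
            codes = TERM_DB.get(word.lower())
            if codes is None:
                # Not in the term database: the user must provide input.
                result.append((sentence_index, word_index, word, "unknown"))
            elif len(codes) > 1:
                # Plural meanings: the user must confirm the right one.
                result.append((sentence_index, word_index, word, "ambiguous"))
    return result
```

Words with exactly one entry pass through without prompting, so the user's effort is confined to ambiguous and unknown words.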
The invention further provides a translation system for translating the meaning code into an output text, the meaning code comprising an identification code corresponding to a meaning for each word found in the input source text and sufficient grammatical information. This system comprises a meaning code to destination language database means for providing a translated term corresponding to each identification code in the meaning code, and sentence builder means for compiling the output text using each translated term and grammatical information for each sentence structure contained in the meaning code.
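A minimal sketch of the destination-side lookup and sentence builder follows. It assumes that each meaning code entry carries an identification code and a part of speech, and it uses a deliberately simplistic word-order rule. `FRENCH_TERMS`, `FRENCH_ORDER` and `build_sentence` are hypothetical and are not the invention's actual data format.

```python
# Hypothetical destination database: identification code -> translated term.
FRENCH_TERMS = {"M0010": "le", "M0011": "chat", "M0012": "noir", "M0013": "dort"}

# Toy ordering rule (an assumption): the adjective follows the noun.
FRENCH_ORDER = ["determiner", "noun", "adjective", "verb"]


def build_sentence(meaning_code, terms, order):
    """Compile an output sentence from translated terms and grammar roles."""
    by_role = {}
    for entry in meaning_code:
        by_role.setdefault(entry["pos"], []).append(terms[entry["id"]])
    words = [word for role in order for word in by_role.get(role, [])]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


# Meaning code for "The black cat sleeps", in source-language word order.
meaning_code = [
    {"id": "M0010", "pos": "determiner"},
    {"id": "M0012", "pos": "adjective"},
    {"id": "M0011", "pos": "noun"},
    {"id": "M0013", "pos": "verb"},
]
```

Calling `build_sentence(meaning_code, FRENCH_TERMS, FRENCH_ORDER)` looks up each code and reorders the terms by role. Agreement, inflection and more complex sentence structures are left out of this sketch.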
The invention also provides a method of human-assisted machine translating an input text in one language to obtain an output text in at least one different language, the method comprising the steps of:
defining a part of speech and a meaning of words in each sentence of the input text using an editor in the language of the input text, the meaning defined for each one of the words being derived from a predetermined meaning database having a set of meanings and corresponding meaning codes;
storing information including meaning codes derived from the previous step in a meaning code file;
providing a machine translator apparatus for machine translating the meaning code file to one of the at least one different language, the translator apparatus including a database of translated terms corresponding to the meaning codes; and
generating the output text from the meaning code file using the machine translator apparatus.
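The steps of the method can be sketched end to end as below. The sketch assumes that the meaning code file is a JSON document and that translation is a plain per-code lookup; the file layout, `MEANING_DB`, `SPANISH_TERMS` and all function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical predetermined meaning database: meaning code -> description.
MEANING_DB = {"M1": "domestic feline animal", "M2": "is in a state of sleep"}
# Hypothetical database of translated terms for one destination language.
SPANISH_TERMS = {"M1": "gato", "M2": "duerme"}


def define_meanings(selections):
    """Step 1: record the part of speech and meaning chosen for each word."""
    entries = []
    for word, part_of_speech, code in selections:
        if code not in MEANING_DB:
            raise ValueError(f"unknown meaning code {code!r} for {word!r}")
        entries.append({"word": word, "pos": part_of_speech, "code": code})
    return entries


def store_meaning_code_file(entries, path):
    """Step 2: store the meaning codes in a meaning code file."""
    Path(path).write_text(json.dumps(entries), encoding="utf-8")


def generate_output_text(path, translated_terms):
    """Steps 3 and 4: machine translate the file using the term database."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return " ".join(translated_terms[entry["code"]] for entry in entries)


def demo():
    """Run the steps on a two-word input using a temporary file."""
    selections = [("cat", "noun", "M1"), ("sleeps", "verb", "M2")]
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "text.meaning.json"
        store_meaning_code_file(define_meanings(selections), path)
        return generate_output_text(path, SPANISH_TERMS)
```

Only the first step involves the user. Once the meaning code file exists, output in another language would need only a different table of translated terms, plus the grammatical handling omitted here.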