1. Field of the Invention
The present invention relates to a machine translation system that assigns translations to words by placing each translated word beside the corresponding word of the original text.
2. Description of the Related Art
When a sentence written in one language is translated into another language, it can be difficult to see the correspondence between the original sentence and its translation. This is especially true when the input sentence is long enough to span several lines. In particular, when translating between English and Japanese, whose sentence structures are quite different, the translation of a word in the first line of the input sentence may appear in the third line of the translated sentence.
As long as a machine translation system operates in the conventional manner, in which editing is performed after the translation is completed, the difficulty of finding which words correspond to each other when comparing the input sentence with the resulting translation places a burden on the user that is greater than the burden associated with translation accuracy.
When fast-reading or skimming a text written in a foreign language, when the user has some knowledge of the language, or when the input sentences have relatively simple structures, it is sometimes more convenient to receive output in which translations are provided only for the words unfamiliar to the user, leaving the user to comprehend the rest of the text himself, than to receive a fully translated text that has the problem described above.
Prior art systems designed to meet this demand include the "Translation Support System" disclosed in Japanese Laid-Open Patent Publication No. 6-243162 and the "Machine Translation System" disclosed in Japanese Laid-Open Patent Publication No. 6-325081.
The translation support system disclosed in Japanese Laid-Open Patent Publication No. 6-243162 outputs the result in a format in which each translated word is placed beside the corresponding word of the original sentence, so that the relationship between the original word and its translation becomes clear. As illustrated in FIG. 12, this translation support system includes a dictionary memory 23, which holds both dictionary data for converting Japanese sentences, input from a keyboard 21 and held in a text memory 22, into Kanji (Chinese characters) and translated-word data for translating English text; a translation-not-required-word memory 24, which stores English words that need not be translated when translating the text retrieved from the text memory 22; and registers such as an original word register 25a and a translated word register 25b for holding the translated word retrieved from the dictionary memory 23.
The operation of this translation support system 26 will be described below.
For example, suppose that the English sentence "I will purchase the restaurant from my uncle" is input from the keyboard 21 and held in the text memory 22. When a translation command is entered from the keyboard 21, the CPU 27 controls the components so that the text data in the text memory 22 is held in the original word register 25a. The CPU also counts the number of letters in the original word and holds the count in the number-of-letters-in-original-word register 25c. The CPU then searches the translation-not-required-word memory 24 for the original word held in the original word register 25a, thereby determining whether a translation should be given for that word. For example, if the words "I", "will", "the", "from" and "my" are stored in the translation-not-required-word memory 24, translations must be given for the remaining words "purchase", "restaurant" and "uncle". The corresponding translations are searched for in the dictionary memory 23, and each translation found is retrieved and held in the translated word register 25b. In this example, "kounyuusuru", "resutoran" and "oji" are retrieved as the translations of "purchase", "restaurant" and "uncle", respectively.
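The word-selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited system: the sets NOT_REQUIRED and DICTIONARY are hypothetical stand-ins for the translation-not-required-word memory 24 and the dictionary memory 23, the entry for "book" is invented, and the function name annotate is made up for this example. Because the lookup ignores the surrounding words, it always returns the first-listed translation.

```python
# Hypothetical stand-in for the translation-not-required-word memory 24
# (the words listed in the example above, lower-cased for lookup).
NOT_REQUIRED = {"i", "will", "the", "from", "my"}

# Hypothetical stand-in for the dictionary memory 23; the first entry of
# each list is treated as the first-listed (highest-priority) translation.
DICTIONARY = {
    "purchase": ["kounyuusuru"],
    "restaurant": ["resutoran"],
    "uncle": ["oji"],
    "book": ["hon", "yoyakusuru"],
}


def annotate(sentence):
    """Return (original token, translation or None) pairs, ignoring context."""
    pairs = []
    for token in sentence.split():
        word = token.strip(".,").lower()  # normalised form used only for lookup
        translation = None
        if word not in NOT_REQUIRED:
            candidates = DICTIONARY.get(word)
            if candidates:
                translation = candidates[0]  # always the first-listed entry
        pairs.append((token, translation))
    return pairs
```

Applied to "I will book the restaurant for my uncle.", the same lookup returns "hon" for "book", which is the context-independent behavior criticized in (a) below.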
The number of letters in the translated word is also counted and stored in the number-of-letters-in-translated-word register 25d. Display processing is then performed in accordance with the numbers of letters in the original word and the translated word, and the result is displayed on the display 29 via the display memory 28. The translations are displayed word by word for easy recognition, with the English sentence to be translated shown in the upper line and the translated words shown in the lower line, as follows.
    I will purchase    the restaurant from my uncle.
           kounyuusuru     resutoran          oji
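The letter-count-based display processing can be sketched as follows, assuming a monospaced display in which each column is padded to the longer of the original word and its translation. The function name layout and this padding rule are assumptions made for illustration; the cited publication is described here only as performing display processing in accordance with the two letter counts.

```python
def layout(pairs):
    """Build the upper and lower display lines from (original, translation) pairs.

    Each column is as wide as the longer of the original word and its
    translation, playing the role of the letter counts held in registers
    25c and 25d. A translation of None leaves the lower column blank.
    """
    top, bottom = [], []
    for original, translation in pairs:
        translation = translation or ""
        width = max(len(original), len(translation))
        top.append(original.ljust(width))
        bottom.append(translation.ljust(width))
    # Columns are separated by one space; trailing padding is dropped.
    return " ".join(top).rstrip(), " ".join(bottom).rstrip()
```

Because both lines use the same column widths, each translation starts at the same character position as the word it translates.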
Next, the machine translation system disclosed in Japanese Laid-Open Patent Publication No. 6-325081 will be described with reference to FIG. 13.
The machine translation system disclosed in Japanese Laid-Open Patent Publication No. 6-325081 selects and presents only information on the object language (the language into which a text is translated), such as parts of speech and translations of words, thereby facilitating comprehension of the subject language (the language being translated). As illustrated in FIG. 13, this machine translation system includes input means 31 for inputting a sentence in a first language; morphological element analysis means for performing a morphological element analysis using an analysis dictionary 32; part-of-speech-presuming means 35 for determining the most suitable part of speech for each word in the first-language sentence using a part-of-speech allocation probability table 34 for the first language; output format determining means 36 for generating output in which the first-language text is matched with the second-language information; and output means 37 for outputting the result to an output device.
The operation of the machine translation system 38 will be described below.
For example, suppose the English sentence "I will book the restaurant for my uncle" is input to the system by the input means 31. A morphological element analysis is then performed on the sentence, dividing it into morphological elements that are looked up individually in the dictionary. Because a word in a language such as English can have more than one part of speech, several part-of-speech candidates may be matched with a single morphological element, as follows.
    I . . . pronoun
    will . . . auxiliary verb/noun
    book . . . noun/verb
    the . . . article
    restaurant . . . noun
    for . . . preposition/conjunction
    my . . . pronoun/exclamation
    uncle . . . noun
For the above sentence, the words "will", "book", "for" and "my" each have more than one part-of-speech candidate, so ambiguity remains. The part-of-speech-presuming means 35 therefore presumes allocation probabilities of the parts of speech, calculates from them the probability that each part of speech is correct, and selects the part of speech with the largest probability. In this prior art example, a trigram model, in which the allocation probability is presumed from at most the two immediately preceding words, is used as the part-of-speech-presuming means 35, and the relative probability with which each part of speech appears for the word itself is also taken into account. In particular, the word "book" is presumed to be a verb because the word immediately before it is an auxiliary verb. For the above input sentence, the part-of-speech-presuming means 35 presumes the following assignment, in which each presumed part of speech is marked with !:
    I . . . pronoun
    will . . . auxiliary verb!/noun
    book . . . noun/verb!
    the . . . article
    restaurant . . . noun
    for . . . preposition!/conjunction
    my . . . pronoun!/exclamation
    uncle . . . noun
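The trigram presumption can be illustrated with the following Python sketch. All probabilities are invented, unseen trigrams receive a fixed fallback value rather than proper smoothing, and multiplying the trigram probability by P(tag | word) is an assumption about how the "in-word" probability is combined. For brevity the sketch scores every candidate sequence exhaustively, which is practical only for short sentences; a practical tagger would normally use the Viterbi algorithm instead.

```python
import itertools
import math

# Candidate parts of speech per word with hypothetical P(tag | word) values.
LEXICON = {
    "I": {"PRON": 1.0},
    "will": {"AUX": 0.8, "NOUN": 0.2},
    "book": {"NOUN": 0.7, "VERB": 0.3},
    "the": {"ART": 1.0},
    "restaurant": {"NOUN": 1.0},
    "for": {"PREP": 0.9, "CONJ": 0.1},
    "my": {"PRON": 0.95, "INTJ": 0.05},
    "uncle": {"NOUN": 1.0},
}

# Hypothetical trigram probabilities P(tag | tag_-2, tag_-1).
TRIGRAM = {
    ("<s>", "<s>", "PRON"): 0.3,
    ("<s>", "PRON", "AUX"): 0.4,
    ("PRON", "AUX", "VERB"): 0.9,
    ("AUX", "VERB", "ART"): 0.5,
    ("VERB", "ART", "NOUN"): 0.9,
    ("ART", "NOUN", "PREP"): 0.4,
    ("NOUN", "PREP", "PRON"): 0.3,
    ("PREP", "PRON", "NOUN"): 0.8,
}
UNSEEN = 0.01  # crude fallback for trigrams not in the table


def log_score(words, tags):
    """Log of the product over i of P(t_i | t_i-2, t_i-1) * P(t_i | w_i)."""
    history = ("<s>", "<s>")
    total = 0.0
    for word, tag in zip(words, tags):
        total += math.log(TRIGRAM.get(history + (tag,), UNSEEN))
        total += math.log(LEXICON[word][tag])
        history = (history[1], tag)  # keep only the two preceding tags
    return total


def presume_tags(words):
    """Return the highest-scoring tag sequence by exhaustive search."""
    candidates = [list(LEXICON[w]) for w in words]
    return max(itertools.product(*candidates),
               key=lambda tags: log_score(words, tags))
```

With these values, the sequence selected for "I will book the restaurant for my uncle" tags "book" as VERB even though NOUN has the higher in-word probability, because the trigram (PRON, AUX, VERB) is far more probable than (PRON, AUX, NOUN).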
Information on the second language corresponding to the selected part of speech is then obtained from the dictionary, and the output format determining means 36 places it at the location of the corresponding first-language word. To suppress repeated output of second-language information that is already well known, the dictionary is provided with an output-inhibition column for marking such entries. For a word that has more than one usage, assigning a numerical priority to each usage makes it possible to inhibit the output of only a particular usage. For example, the second-language information for the word "book" in the above sentence is inhibited when the word is a noun ("hon"; a book) and output when it is a verb ("yoyakusuru"; to make a reservation). After this processing, the output for the sentence is as follows.
    I will book       the restaurant for my uncle.
           yoyakusuru     resutoran         oji
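Per-usage output inhibition can be sketched as follows. The data layout, in which each usage of a word is identified by its numeric priority and the inhibition marks are addressed as (word, priority) pairs, is an assumption for illustration; the cited publication is described only as providing an output-inhibition column and per-usage priorities in the dictionary. The function name second_language_info is hypothetical.

```python
# Hypothetical dictionary: priority number -> (part of speech, translation).
DICTIONARY = {
    "book": {1: ("NOUN", "hon"), 2: ("VERB", "yoyakusuru")},
    "restaurant": {1: ("NOUN", "resutoran")},
    "uncle": {1: ("NOUN", "oji")},
}

# Hypothetical output-inhibition marks, addressed by (word, priority) so that
# one usage of a word can be suppressed while its other usages are output.
INHIBITED = {("book", 1)}


def second_language_info(word, pos):
    """Return the translation for word used as pos, or None if absent or inhibited."""
    for priority, (usage_pos, translation) in sorted(DICTIONARY.get(word, {}).items()):
        if usage_pos == pos:
            return None if (word, priority) in INHIBITED else translation
    return None
```

Given the presumed parts of speech, "book" as a verb yields "yoyakusuru", while "book" as a noun yields nothing because its first usage is marked as inhibited.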
However, the above-mentioned prior art translation support system 26 and machine translation system 38 have the problems described in (a) and (b) below.
(a) Translation accuracy is poor.
In the translation support system 26 disclosed in Japanese Laid-Open Patent Publication No. 6-243162, a word is always given the same translation (the translation listed first or the translation with the highest priority), regardless of the words before and after it. For example, because the word "book" is most commonly a noun translated as "hon (book)", the wrong translation "hon" may be given for "book" in the above-mentioned sentence, as follows.
    I will book the restaurant for my uncle.
           hon
On the other hand, since the machine translation system 38 disclosed in Japanese Laid-Open Patent Publication No. 6-325081 takes into account the parts of speech of up to two immediately preceding words, as described above, it gives the correct translation of the word "book" in the above input sentence, namely "verb; yoyakusuru (to make a reservation)".
However, the analysis performed by the machine translation system 38 disclosed in Japanese Laid-Open Patent Publication No. 6-325081 determines only the part of speech. Because the system does not use semantic considerations or constraints to choose among translations having the same part of speech, it cannot ensure a correct translation. For example, the word "take" has a variety of meanings (translations), such as tenitoru (to hold in one's hand), ubau (to take away), tsureteiku (to take along), noru (to board), taberu (to eat), koudokusuru (to subscribe to), yousuru (to require time), ukeru (to receive) and toru (to take). Therefore, even if the part of speech is presumed to be a verb, the correct translation cannot be selected from among these. In the following input sentence, the most common translation of "take", "toru (to take)", is assigned. However, when the object of the verb is a person, as it is here, the correct translation is "tsureteiku (to take along)".
    I will take her child  to the zoo.
           toru     kodomo        doubutsuen
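The following sketch illustrates why a part of speech alone does not determine the translation of "take", and how a constraint on the semantic category of the object could narrow the choice. It is a hypothetical illustration, not the method of either cited system: the ordering of TAKE_VERB (with "toru" assumed to be the most common), the SEMANTIC_CATEGORY table and the single person-object constraint are all assumptions.

```python
# Verb translations of "take" listed in the text; placing "toru" first as the
# most common translation is an assumption.
TAKE_VERB = ["toru", "tenitoru", "ubau", "tsureteiku", "noru",
             "taberu", "koudokusuru", "yousuru", "ukeru"]

# Hypothetical semantic categories for a few object nouns.
SEMANTIC_CATEGORY = {"child": "person", "uncle": "person", "zoo": "place"}

# Hypothetical constraint: object category -> translation of "take".
TAKE_BY_OBJECT_CATEGORY = {"person": "tsureteiku"}


def translate_take_by_pos_only():
    """Knowing only that "take" is a verb leaves every candidate; use the first."""
    return TAKE_VERB[0]


def translate_take_with_object(object_head):
    """Use a matching object-category constraint if any, else the default."""
    category = SEMANTIC_CATEGORY.get(object_head)
    return TAKE_BY_OBJECT_CATEGORY.get(category, TAKE_VERB[0])
```

For "I will take her child to the zoo.", whose object head is "child", the second function returns "tsureteiku" where the part-of-speech-only selection returns "toru".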
Accordingly, a user who receives the above two input sentences with wrong translations will misunderstand their meaning. Of course, either of the systems of Japanese Laid-Open Patent Publications No. 6-243162 and No. 6-325081 could output more than one translation candidate. However, this places an additional burden on the user, who must then select the correct translation himself.
(b) Translations are not given only to the words that need them.
For a user, it is sufficient to have translations provided only for the words that are unfamiliar to him; conversely, a translation provided for a familiar word is often an obstacle. To limit the words for which translations are provided, the systems of Japanese Laid-Open Patent Publications No. 6-243162 and No. 6-325081 use preset "translation-not-required words", output-inhibition settings and translation priorities. However, these settings are fixed in the system, whereas familiarity with words, that is, whether or not the user knows a given word, differs greatly among users. Put simply, a word unknown to an elementary-level student of English (that is, a word for which a translation must be provided) may well already be known to an advanced-level student. Therefore, when the words for which translations are not required are fixed in the system regardless of who uses it, as in the systems disclosed in Japanese Laid-Open Patent Publications No. 6-243162 and No. 6-325081, translations cannot be provided for the words that each particular user actually needs.
As described above, when (a) the translation accuracy is poor or (b) too many translations are given for words that do not require them, fast reading or skimming cannot be performed efficiently.