(1) Field of the Invention
The present invention relates to a natural language generation system, and more specifically to a method and an apparatus for selecting the most suitable translation for a word in a sentence to be translated when automatic translation is performed from one language to another, for example, from Japanese to English.
(2) Description of the Prior Art
When a language to be translated (source language) is translated by an automatic translation apparatus (machine translation apparatus) into another language (target language), a sentence of the source language is analyzed and converted into an intermediate representation suitable for translation, and the intermediate representation is used as input to target language generation.
The intermediate representation represents the syntactic or semantic structure of the sentence, and each of its elements is usually a word of the source language or a concept symbol.
In target language generation, a word of the target language is assigned to each element of the intermediate representation, and a sentence is generated according to the grammar of the target language. In this generation step, when a plurality of equivalent words of the target language correspond to a single element of the intermediate representation, the problem arises of which equivalent word should be selected as the translation.
With respect to this problem, knowledge regarding the co-occurrence of words may be utilized. Co-occurrence means that two words appear in a sentence in a specific semantic relation to each other, and only certain combinations of words are permitted to co-occur. Consequently, by exploiting this restriction, a suitable word can be selected from among a plurality of equivalent words.
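By way of illustration only, selection by co-occurrence restriction can be sketched as follows. The table contents, the relation label "object", and the function name select_equivalent are hypothetical and are not taken from any cited reference; the candidate words "drink" and "take" are assumed equivalents for a single element of the intermediate representation.

```python
# Hypothetical co-occurrence table: (word, relation, partner) triples that are
# assumed to be acceptable combinations in the target language.
COOCCURRENCE = {
    ("drink", "object", "water"),
    ("take", "object", "medicine"),
}


def select_equivalent(candidates, relation, partner):
    """Return the first candidate that co-occurs with `partner` under
    `relation`, or None when no stored triple matches."""
    for word in candidates:
        if (word, relation, partner) in COOCCURRENCE:
            return word
    return None


# Assumed equivalents for one element of the intermediate representation.
candidates = ["drink", "take"]
print(select_equivalent(candidates, "object", "medicine"))
```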
Sentence generation based on this idea is disclosed, for example, in Japanese patent application laid-open No. 60-144869.
However, the method proposed in that reference has the following problems:
(1) Since the number of pairs of co-occurring words is enormous, a memory of large storage capacity is required to realize word selection on the basis of co-occurrence, and it is difficult to collect the co-occurrence data exhaustively.
(2) Similar co-occurrence relations frequently hold among synonyms, but co-occurrence data stored in the form of word pairs cannot be applied to a synonym of a stored word.
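The two problems can be illustrated with a minimal sketch, given under stated assumptions. The pair table, the example words, and the helper names cooccurs and max_pairs are hypothetical; the worst-case count simply assumes that every word of one class may pair with every word of the other.

```python
# Hypothetical co-occurrence data stored as plain word pairs.
PAIR_TABLE = {("take", "medicine"), ("drink", "water")}


def cooccurs(word, partner):
    """Exact-match lookup: True only if this literal pair is stored."""
    return (word, partner) in PAIR_TABLE


def max_pairs(n_words, n_partners):
    """Upper bound on the number of pairs to store, assuming any word of one
    class may co-occur with any word of the other class."""
    return n_words * n_partners


# Problem (2): "drug" is assumed here to be a synonym of "medicine", but the
# lookup fails because only the literal pair ("take", "medicine") is stored.
print(cooccurs("take", "medicine"))
print(cooccurs("take", "drug"))

# Problem (1): the upper bound grows with the product of the vocabulary sizes.
print(max_pairs(1000, 5000))
```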