1. Field of the Invention
The present invention relates to a natural language processing system and, more particularly, to a sentence structure generating device for use in a machine translation apparatus or the like.
2. Description of the Related Art
In the field of natural language processing systems which provide natural language outputs in response to natural language inputs, various types of systems have been proposed, such as a machine translation system for translating, for example, from Japanese into English, and an interactive response system arranged to receive a question in English or Japanese and output an answer in English or Japanese. In such a system, an input Japanese (or English) sentence is first analyzed and the conceptual structure (semantic structure) of the sentence is determined. The conceptual structure is in general expressed as a semantic network which comprises nodes representing individual concepts and arcs representing the relationships between the concepts. The Japanese (or English) conceptual structure obtained from the analysis of the Japanese (or English) sentence is transferred to a corresponding English (or Japanese) conceptual structure in order to compensate for the difference between the structures of the Japanese and English languages. The corresponding English (or Japanese) sentence is then generated on the basis of the conceptual structure of the English (or Japanese) language. The present invention pertains to this latter process, which is generally called "generation of a sentence structure".
Referring to FIG. 2, which is a block diagram showing the general arrangement of a conventional machine translation system, a technical field related to the present invention will be explained below in further detail. In FIG. 2, an input sentence is denoted by S21. A morpheme analysis device 21 divides the input sentence S21 into a plurality of morphemes. In general, the morpheme analysis device 21 is required for processing an agglutinative language, such as Japanese, in which no clear division is present between the words which constitute a sentence, but it is not needed for an inflectional language, such as English, in which clear divisions are present between words. The input sentence S21 is subjected to a morpheme analysis in the morpheme analysis device 21 and delivered as a word string S22. A sentence-structure analysis device 22 analyzes the grammatical structure of the word string S22.
The analysis of the sentence structure of the word string S22 results in a phrase structure tree S23. A semantic analysis device 23 performs an analysis at a semantic level on the basis of the phrase structure tree S23. The semantic analysis device 23 outputs a semantic structure S24 as the result of its semantic analysis. A transfer device 24 receives the semantic structure S24 of the input language (e.g., Japanese) analyzed by the semantic analysis device 23 and transfers it to the semantic structure S25 of the desired language (e.g., English). A sentence structure generating device 25 generates a phrase structure tree S26 from the semantic structure S25 of the desired language. A morpheme generating device 26 generates a translated sentence S27.
A conventional machine translation system has also been proposed in which, unlike the above-described translation apparatus, the generation system is not separated into the sentence structure generating device 25 and the morpheme generating device 26. Such a machine translation system generates the translated sentence S27 directly from the semantic structure S24 without generating the phrase structure tree S26.
FIG. 3 shows in schematic form the display screen of a display (CRT) device (not shown) which is connected to the conventional translation apparatus shown in FIG. 2. The illustrated display screen displays six windows. The six windows display the input sentence S21, the phrase structure tree S23 of the input sentence S21, the semantic structure S24 of the input sentence S21, the semantic structure S25 of the desired language, the phrase structure tree S26 of the desired language, and the output translated sentence S27, respectively, in accordance with the sequence explained with reference to FIG. 2.
The natural language sentence generating device according to the present invention relates to, for example, a sentence structure generating device such as the sentence structure generating device 25 of FIG. 2.
In the field of natural language sentence generating devices with which the present invention is associated, conventional sentence structure generating devices have been designed to generate translated sentences as explained in, for example, Nikkei Electronics, Dec. 17, 1984, "Machine Translation System for Multi Language Using Common Sense, Which Utilizes Concept Structure Independent of Language as Intermediate Structure", and Japanese Patent Laid-Open No. 63/136260. Such a conventional sentence structure generating device is arranged to search a dictionary by using node names (hereinafter referred to as "words") as key words while following a conceptual structure such as a semantic network, and then to activate sentence generation rules associated with the words on the basis of the result of the search, thereby generating a translated sentence.
The dictionary used in such a conventional sentence structure generating device stores generation symbols some of which indicate the groups of generation rules associated with the words of interest. The generation symbols serve as pointers indicating the groups of generation rules related to the words. The "generation rule" may be regarded as a production rule for providing a word string by examining each of the nodes and arcs present in a semantic network. As shown in FIG. 5, the generation rules may be carefully classified and prepared for each part of speech such as noun, intransitive verb, transitive verb, pronominal objective case and so on. Each group of generation rules includes a plurality of generation rules. The sequence of applying the generation rules is determined in advance so that it determines the order of words.
The following is an explanation of the process of generating the English sentence "He went to Kobe by bus." from the semantic network shown in FIG. 4 by means of the above-described conventional sentence structure generating system. In the semantic network of FIG. 4, the node represented by a double circle indicates that the word "go" is a predicate. The arcs having the arc names "AGENT", "GOAL", and "INST" represent deep case relations, namely an agentive case, a goal and an instrument, respectively. An arc "PAST" indicates past tense and an arc "ST" indicates a predicative word which serves as the primary word in the sentence.
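By way of illustration only, the semantic network of FIG. 4 might be held in memory as nodes connected by named arcs. The following is a minimal sketch in Python; the class name `Node`, its fields, and the function `build_fig4_network` are hypothetical and do not appear in the cited prior art. Since the description does not state what the arc "PAST" points to, it is modeled here, as an assumption, as an out arc with no target node.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str                                      # node name, used as the dictionary key word
    out_arcs: dict = field(default_factory=dict)   # arc name -> target Node (None for a marker arc)
    in_arcs: set = field(default_factory=set)      # names of arcs extending into this node

def build_fig4_network():
    """Build the network for "He went to Kobe by bus." and return the predicate node."""
    he, kobe, bus = Node("he"), Node("Kobe"), Node("bus")
    go = Node("go")
    # Deep case relations extend out of the predicate; PAST is assumed to be a marker arc.
    go.out_arcs = {"AGENT": he, "GOAL": kobe, "INST": bus, "PAST": None}
    go.in_arcs.add("ST")                           # ST marks "go" as the primary predicate
    for arc_name, target in go.out_arcs.items():
        if target is not None:
            target.in_arcs.add(arc_name)
    return go

go = build_fig4_network()
print(go.word, sorted(go.out_arcs), sorted(go.in_arcs))
```

In this representation, generation would begin at the node into which the arc "ST" extends.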
FIG. 5 shows an example of a dictionary provided with the generation rules used in the conventional generation system. In the dictionary shown in FIG. 5, "*" means that (1) if "*" is used in a condition field, it indicates that no condition is specified; (2) if "*" is used in an arc name field, it indicates that no arc name is specified; and (3) if "*" is used in a message field, it indicates that there is no message to be output.
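The meaning of "*" in the three fields can be sketched as follows. This is only an illustrative model: the record layout `GenerationRule` and the helper names are hypothetical, and the two example rows are assumed field values for the intransitive-verb rule (1) and the pronominal-subjective-case rule (5), not a reproduction of FIG. 5.

```python
from dataclasses import dataclass

WILDCARD = "*"

@dataclass(frozen=True)
class GenerationRule:
    condition: str   # "*" means no condition is specified
    action: str      # e.g. "out arc" or "output oneself"
    arc_name: str    # "*" means no arc name is specified
    message: str     # "*" means there is no message to be output

def condition_holds(rule, incoming_message):
    """A rule with an unspecified condition applies to any incoming message."""
    return rule.condition == WILDCARD or rule.condition == incoming_message

def outgoing_message(rule):
    """Return the message to send on, or None when the field holds "*"."""
    return None if rule.message == WILDCARD else rule.message

# Assumed field values, for illustration only.
vi_rule_1 = GenerationRule("*", "out arc", "AGENT", "SUBJ")
ps_rule_5 = GenerationRule("SUBJ", "output oneself", "*", "*")

print(condition_holds(vi_rule_1, None), outgoing_message(vi_rule_1))
print(condition_holds(ps_rule_5, "SUBJ"), outgoing_message(ps_rule_5))
```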
Referring to the semantic network of FIG. 4, the process of sentence generation starts with the node "go" to which the arc "ST" extends. A rule interpreter for interpreting the generation rules examines the generation rules associated with "go" one by one. In this case, the generation symbol of "go" is "VI", which means that the generation rules of an intransitive verb (VI) are applied in ascending order from the rule (1) shown in the table "INTRANSITIVE VERB (VI)" of FIG. 5. Thus, the first rule (1) associated with the intransitive verb is applied. The action which is assigned to the rule (1) is "out arc". The term "out arc" means an arc which extends out of the corresponding node. If the action is "out arc", the corresponding arc name indicates the type of the node pointed to by the arc. In the rule (1), the type is "AGENT". The message attached to the illustrated node is "SUBJ". Accordingly, the rule (1) regards the node pointed to by the out arc as the AGENT and generates the AGENT as a subject. The process then proceeds to a sub-network starting with the out arc AGENT. At this time, a message representing "SUBJ (subject)" is sent to the node "he".
To process the sub-network starting with AGENT, the process proceeds to the node "he". At this time, a flag indicating that the node "go" is being processed is set. Then, since the generation symbol of "he" is PS (pronominal subjective case), the portion "PRONOMINAL SUBJECTIVE CASE (PS)" of the dictionary of FIG. 5 is searched so that the generation rules of the pronominal subjective case (PS) are applied to the processing of the node "he". As described above, the message "SUBJ (subject)" has already been sent to the node "he", and this message is checked against the condition located in the condition field of each rule in the dictionary. In this case, the message "SUBJ (subject)" does not match the conditions of the rules (1) to (4) but matches the condition of the rule (5). Accordingly, as shown by "OUTPUT ONESELF" in the column "ACTION" of the generation rule (5), the word "he" itself is output.
The generation rule (5) of FIG. 5 does not indicate generation of a new arc, and the dictionary of FIG. 5 does not include any other generation rule subsequent to the rule (5). Accordingly, the generation of the sub-network of "he" which starts with AGENT ends with the rule (5). The process returns from the processing of the sub-network to the processing of the node "go".
For the processing of the node "go", the rules (2) et seq. of the generation rules of the intransitive verb of FIG. 5 are examined in sequence. None of the generation rules (2), (3) and (4) is applied to this processing because of the structure of the node "go". Since the arc "PAST" which represents the past tense extends out of the node "go", the rule (5) is applied. By the application of the rule (5), the past form "went" of "go" is output. Neither of the rules (6) and (7) is applied. The process proceeds to the rule (8), according to which the processing of the out arc "GOAL" is executed. Since this arc is an out arc, generation of a phrase corresponding to the sub-network starting with GOAL is started. This phrase generation process follows steps similar to those explained in connection with the out arc AGENT. When the preposition "to" is selected in accordance with the arc GOAL, it is examined whether or not "go" and "Kobe" co-occur with each other via "to". In this manner, "to Kobe" is generated from this sub-network.
Then, according to the rule (9), generation of a phrase corresponding to a sub-network starting with INST is performed, so that "by bus" is generated. Finally, a period (.) is generated in accordance with the in-arc ST, whereby all the processing associated with the node "go" is completed. In this manner, the sentence generation is completed and the English sentence "He went to Kobe by bus." is obtained.
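The walk-through above can be condensed into a small rule interpreter. The following Python sketch is a simplification under stated assumptions and is not the dictionary of FIG. 5: the tables `LEXICON`, `RULES` and `NETWORK`, the action names, and the noun rule group "N" are all hypothetical; only the rules actually exercised in the walk-through are included (the comments note the corresponding rule numbers); and the co-occurrence check for the preposition is omitted.

```python
# Hypothetical lexicon: word -> generation symbol (pointer to a rule group).
LEXICON = {
    "go":   {"symbol": "VI", "past": "went"},
    "he":   {"symbol": "PS"},
    "Kobe": {"symbol": "N"},
    "bus":  {"symbol": "N"},
}

# Each rule is (condition, action, arc name, message); "*" means unspecified.
# The list order is the order of application and therefore the word order.
RULES = {
    "VI": [
        ("*", "out_arc", "AGENT", "SUBJ"),    # cf. rule (1)
        ("*", "past_form", "PAST", "*"),      # cf. rule (5)
        ("*", "out_arc", "GOAL", "GOAL"),     # cf. rule (8)
        ("*", "out_arc", "INST", "INST"),     # cf. rule (9)
        ("*", "period", "ST", "*"),           # period for the in-arc ST
    ],
    "PS": [("SUBJ", "output_self", "*", "*")],  # cf. rule (5) of PS
    "N": [                                      # assumed noun rules
        ("GOAL", "prefix:to", "*", "*"),
        ("INST", "prefix:by", "*", "*"),
    ],
}

# Semantic network of FIG. 4; PAST is assumed to be an out arc with no target.
NETWORK = {
    "go":   {"in": ["ST"], "out": {"AGENT": "he", "GOAL": "Kobe", "INST": "bus", "PAST": None}},
    "he":   {"in": ["AGENT"], "out": {}},
    "Kobe": {"in": ["GOAL"], "out": {}},
    "bus":  {"in": ["INST"], "out": {}},
}

def generate(word, message="*"):
    """Apply the rule group of `word` in order and return the generated tokens."""
    node = NETWORK[word]
    tokens = []
    for condition, action, arc, out_message in RULES[LEXICON[word]["symbol"]]:
        if condition not in ("*", message):
            continue                           # condition field does not match
        if action == "out_arc" and node["out"].get(arc) is not None:
            tokens += generate(node["out"][arc], out_message)  # process sub-network
        elif action == "past_form" and arc in node["out"]:
            tokens.append(LEXICON[word]["past"])
        elif action == "period" and arc in node["in"]:
            tokens.append(".")
        elif action == "output_self":
            tokens.append(word)
        elif action.startswith("prefix:"):
            tokens += [action.split(":", 1)[1], word]
    return tokens

def realize(tokens):
    """Join tokens, attach the period, and capitalize the first letter."""
    text = " ".join(tokens).replace(" .", ".")
    return text[0].upper() + text[1:]

print(realize(generate("go")))  # He went to Kobe by bus.
```

Reordering the entries of the "VI" list in this sketch changes the order of the generated words, which illustrates how the sequence of rule application, rather than an explicit grammar, fixes the word order in such a system.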
Another conventional sentence structure generating process is described in "Generation of English Sentence from Conceptually Dependent Diagram", 5L-3 of the 28th national meeting of the Institute of Electronics, Information and Communication Engineers. In this conventional sentence structure generating process, a phrase structure is generated from a semantic network by using the following grammatical rules based on improved phrase structure rules.
1. S (NP(A) VP(V*) NP(O))
2. S (NP(A) VP(V*) INF2(O))
3. S (NP(A) VP(V(*) NP(R) NP(O)))
These grammatical rules expressed as α(β1 ... βn) will be explained in brief. α(β1 ... βn) is a phrase structure rule for converting α into the sequence β1 ... βn. In the above grammatical rules, "NP" represents a nominal phrase, "VP" a verbal phrase, "A" an agentive case, "O" an objective case, "R" a recipient case, "*" a predicate. In the above expression method, each phrase structure rule and the corresponding case information (semantic information) are described in the same part. In other words, each phrase and its meaning are described in pair in the same part.
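To illustrate how a rule of this form pairs a phrase structure with case information, the following Python sketch parses the third rule into a tree and fills its case-labeled leaves from a case frame. The parser, the tuple layout of the tree, and the case frame are hypothetical and serve only as an example; the sketch handles only the fully parenthesized notation used in the third rule.

```python
import re

# Case labels used in the rules above.
CASES = {"A": "agentive", "O": "objective", "R": "recipient", "*": "predicate"}

def tokenize(rule):
    """Split a rule string into category names, case labels and parentheses."""
    return re.findall(r"[A-Za-z0-9]+|[*()]", rule)

def parse(tokens, pos=0):
    """Parse one node; return ((category, case, children), next position)."""
    category = tokens[pos]
    pos += 1
    case, children = None, []
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        if tokens[pos] in CASES and tokens[pos + 1] == ")":
            case = tokens[pos]                 # leaf annotated with a case label
            pos += 1
        else:
            while tokens[pos] != ")":          # sequence of child phrases
                child, pos = parse(tokens, pos)
                children.append(child)
        pos += 1                               # consume the closing parenthesis
    return (category, case, children), pos

def linearize(tree, frame):
    """Read the words off the tree from left to right using the case frame."""
    category, case, children = tree
    if case is not None:
        return [frame[case]]
    words = []
    for child in children:
        words += linearize(child, frame)
    return words

rule3 = "S(NP(A) VP(V(*) NP(R) NP(O)))"
tree, _ = parse(tokenize(rule3))
frame = {"A": "I", "*": "give", "R": "him", "O": "a book"}  # assumed case frame
print(" ".join(linearize(tree, frame)))  # I give him a book
```

Here the word order follows from the phrase structure itself, while the case labels decide which concept fills each position.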
However, the above-described conventional natural language processing systems explained in connection with FIGS. 4, 5 and so on, that is, natural language processing systems of the type which are not provided with a sentence structure generation process, generate a sentence from a semantic network as follows: a dictionary is searched by using a node name as a key word while the semantic network is followed, the sentence generation rules associated with the desired word, which are stored in a sentence generation rule storage portion, are activated, and a translated sentence is generated. In this process, as the number of generation rules increases, it becomes more difficult for operators other than the one who created the sentence generation rules to understand their structure. In addition, the maintenance required to alter the rules becomes more difficult and expandability is limited.
Moreover, the following problem arises. In the above generation systems, the sequence of application of the generation rules determines the order of words. It follows that no grammar (phrase structure rules) appears explicitly. It is therefore impossible to ensure that a generated sentence conforms to the grammar.
The conventional sentence structure generation system provided with the above-described sentence structure generation process involves a number of problems. For example, since case information representing semantic information needs to be mixed with phrase structure rules representing sentence structure rules, it is necessary to write a plurality of identical phrase structure rules in the following manner.
1. S(NP(A) VP(V(*) NP(R) NP(O)))    Example: I give him a book.
2. S(NP(A) VP(V(*) NP(O) NP(C)))    Example: I call him a scholar.
C is a content determiner case.
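The duplication can be made concrete by stripping the case labels from the two rules and comparing what remains. The following Python sketch is illustrative only; the function name `skeleton` and the dictionary keys are hypothetical.

```python
import re

# The two rules above, keyed by the verb of the corresponding example sentence.
RULES = {
    "give": "S(NP(A) VP(V(*) NP(R) NP(O)))",
    "call": "S(NP(A) VP(V(*) NP(O) NP(C)))",
}

def skeleton(rule):
    """Remove the parenthesized case labels, leaving only the phrase categories."""
    return re.sub(r"\((?:[A-Z]|\*)\)", "", rule)

for verb, rule in RULES.items():
    print(verb, skeleton(rule))

# Both rules reduce to the same phrase structure rule.
print(skeleton(RULES["give"]) == skeleton(RULES["call"]))  # True
```

Both rules reduce to S(NP VP(V NP NP)), so the same phrase structure has to be written once for every distinct combination of cases.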