In recent years, attention has focused on textual entailment recognition as a way to make the many applications that handle natural language, such as meaning-based retrieval of natural-language sentences, more sophisticated. Textual entailment recognition is the task of determining, given a text T and a text H, whether “the text H can be inferred from the text T”.
For example, given “text T: Company N has profits of ¥50 billion” and “text H: Company N is in the black”, the text H can be inferred from the text T, so it is determined that the text T entails the text H. This is because “profits of ¥50 billion” is regarded as sufficient to mean “in the black”.
Non-Patent Document 1 discloses an example of a conventional textual entailment recognition system. This system first parses each of the text T and the text H and creates, for each text, a tree structure in which a verb is the root (top node) and the words included in the arguments (subject, object, etc.) of that verb are child or grandchild nodes.
Next, the entailment determination system disclosed in Non-Patent Document 1 performs word replacement and syntactic paraphrasing on the text T and attempts to create, as a subtree of the tree of the text T, a tree structure that matches the tree structure of the text H. If such a tree structure can be created, the entailment determination system determines that the text T entails the text H.
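The subtree test can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of Non-Patent Document 1: argument labels are dropped, syntactic paraphrasing is omitted, and word replacement is reduced to a hypothetical lookup table `SYNONYMS`. The names `Node`, `embeds_at`, and `entails` are likewise hypothetical. A node of H matches a node of T when the (normalized) words agree and each child of H is matched by some child of T; extra words in T are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified parse-tree node: a word and its dependents (argument labels omitted)."""
    word: str
    children: list[Node] = field(default_factory=list)


# Hypothetical word-replacement table standing in for the paraphrasing step.
SYNONYMS = {"purchased": "bought"}


def normalize(word: str) -> str:
    """Map a word to a canonical form using the replacement table."""
    return SYNONYMS.get(word, word)


def embeds_at(t: Node, h: Node) -> bool:
    """True if tree h matches the top of tree t: the root words agree and every
    child of h is matched (recursively) by some child of t."""
    if normalize(t.word) != normalize(h.word):
        return False
    return all(any(embeds_at(tc, hc) for tc in t.children) for hc in h.children)


def entails(t_root: Node, h_root: Node) -> bool:
    """True if h's tree can be found starting at some node of t's tree."""
    if embeds_at(t_root, h_root):
        return True
    return any(entails(c, h_root) for c in t_root.children)


# "John bought a car in Tokyo" vs. "John purchased a car" and "John sold a car".
t_tree = Node("bought", [Node("John"), Node("car", [Node("a")]), Node("in", [Node("Tokyo")])])
h_tree = Node("purchased", [Node("John"), Node("car")])
h_other = Node("sold", [Node("John"), Node("car")])

print(entails(t_tree, h_tree))   # True
print(entails(t_tree, h_other))  # False
```

Note that this sketch lets two children of H match the same child of T; a stricter one-to-one matching would need a bipartite assignment step.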
Further, when matching tree structures, the entailment determination system disclosed in Non-Patent Document 1 can determine approximate matches in addition to perfect matches. Specifically, after creating the above-described tree structure, the entailment determination system creates data called a VAS (Verb-Argument Structure) from that tree structure.
A VAS is a kind of predicate-argument structure and consists of the verb serving as the root of the tree structure and a word set created separately for each argument type. For example, in Non-Patent Document 1, the VAS “<kill, (object: Casey, Sheehan), (other: Iraq)>” is generated from the sentence “Casey Sheehan was killed in Iraq”. Non-Patent Document 1 also discloses that, when the root is the verb “be”, a single word set is created from all of the child and grandchild nodes without distinguishing between argument types.
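A VAS can be pictured as a small record: a verb plus a mapping from argument type to a set of words. The sketch below assumes the parser has already produced the lemmatized root verb and a list of `(argument_type, word)` pairs from the child and grandchild nodes; the names `VAS` and `build_vas`, and the key `"all"` used for the “be” case, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VAS:
    """Verb-Argument Structure: a root verb plus one word set per argument type."""
    verb: str
    args: dict[str, frozenset[str]]


def build_vas(verb: str, dependents: list[tuple[str, str]]) -> VAS:
    """Group (argument_type, word) pairs into per-type word sets.

    For the verb "be", all words go into one set under the hypothetical key "all",
    without distinguishing argument types.
    """
    grouped: dict[str, set[str]] = {}
    for arg_type, word in dependents:
        key = "all" if verb == "be" else arg_type
        grouped.setdefault(key, set()).add(word)
    return VAS(verb, {k: frozenset(v) for k, v in grouped.items()})


# "Casey Sheehan was killed in Iraq" -> <kill, (object: Casey, Sheehan), (other: Iraq)>
vas = build_vas("kill", [("object", "Casey"), ("object", "Sheehan"), ("other", "Iraq")])
print(vas.verb, sorted(vas.args["object"]), sorted(vas.args["other"]))
# kill ['Casey', 'Sheehan'] ['Iraq']
```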
The entailment determination system disclosed in Non-Patent Document 1 then calculates, for the two VASs created from the text T and the text H, the percentage of word coverage between the word sets of each corresponding argument. It determines that the contents of an argument match if this coverage is greater than or equal to a predetermined value, and further determines that the original tree structures of the two VASs match if the proportion of matching arguments is greater than or equal to a fixed rate. Thus, by generating VASs, the system can determine not only perfect matches but also approximate matches between the character strings of verb arguments.
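The two-level threshold test can be sketched as below. Several details are assumptions rather than statements of Non-Patent Document 1: “word coverage” is taken to be the fraction of H's words for an argument that also appear in T's word set for the same argument; the proportion of matching arguments is computed over H's argument types; the threshold defaults are hypothetical; and the verbs are not compared here. The function names are likewise hypothetical.

```python
from __future__ import annotations


def coverage(t_words: frozenset[str], h_words: frozenset[str]) -> float:
    """Assumed definition: fraction of H's words that also occur in T's word set."""
    if not h_words:
        return 1.0
    return len(h_words & t_words) / len(h_words)


def vas_args_match(
    t_args: dict[str, frozenset[str]],
    h_args: dict[str, frozenset[str]],
    coverage_threshold: float = 0.5,  # hypothetical "predetermined coverage"
    match_rate: float = 1.0,          # hypothetical "fixed rate"
) -> bool:
    """Approximate match: an argument matches if its coverage reaches the threshold;
    the structures match if the fraction of matching H arguments reaches match_rate."""
    if not h_args:
        return True
    matched = sum(
        1
        for arg_type, h_words in h_args.items()
        if coverage(t_args.get(arg_type, frozenset()), h_words) >= coverage_threshold
    )
    return matched / len(h_args) >= match_rate


t_args = {"object": frozenset({"Casey", "Sheehan"}), "other": frozenset({"Iraq"})}
h_args = {"object": frozenset({"Sheehan"}), "other": frozenset({"Iraq", "Baghdad"})}

print(vas_args_match(t_args, h_args))                          # True  (coverages 1.0 and 0.5)
print(vas_args_match(t_args, h_args, coverage_threshold=0.6))  # False ("other" falls short)
```

Lowering `match_rate` to 0.5 in the second call would accept the pair again, since one of the two arguments still matches; the two thresholds trade off strictness at the word level against strictness at the argument level.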