Technologies referred to as textual entailment recognition have become known in recent years. Textual entailment recognition is a technology for determining whether, given two statements (sentences), one sentence (the target sentence) includes the meaning represented by the other sentence (the hypothetical sentence). In other words, textual entailment recognition can be regarded as a technology for recognizing (determining) an entailment relationship between two or more sentences.
In relation to such textual entailment recognition, for example, the following technologies are known.
NPL 1 discloses various technologies (techniques) proposed in relation to textual entailment recognition. Most of the techniques disclosed in NPL 1 use the coverage rate of characters or words to determine an entailment relationship between texts. The coverage rate is the proportion of elements (for example, words or clauses) in the hypothetical sentence that also appear in the target sentence. The coverage rate therefore represents the possibility of entailment at the lexical (vocabulary) level. For this reason, the coverage rate is information (a feature amount) that is commonly used for determination in textual entailment recognition.
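For illustration only, the coverage rate can be computed as below. This is a minimal sketch, assuming that the elements are whitespace-separated words, that matching is exact string equality, and that each occurrence of a word in the hypothetical sentence is counted; the function name `coverage_rate` is hypothetical and is not taken from NPL 1.

```python
def coverage_rate(target_tokens, hypothesis_tokens):
    """Proportion of hypothesis tokens that also occur in the target.

    Assumption: tokens are compared by exact string equality, and a
    repeated hypothesis token is counted once per occurrence.
    """
    if not hypothesis_tokens:
        return 0.0  # nothing to cover; treat as zero coverage
    target_vocab = set(target_tokens)
    covered = sum(1 for tok in hypothesis_tokens if tok in target_vocab)
    return covered / len(hypothesis_tokens)


target = "the cat sat on the mat".split()
hypothesis = "a cat sat".split()
print(coverage_rate(target, hypothesis))  # "cat" and "sat" are covered, "a" is not
```

Here two of the three hypothesis words appear in the target sentence, so the coverage rate is 2/3.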
NPL 2 discloses an example of a technology for determining an entailment relationship between a target sentence and a hypothetical sentence. The technology disclosed in NPL 2 converts the target sentence and the hypothetical sentence into respective tree structures that represent their dependency structures. The technology then determines the entailment relationship on the basis of the proportion of subtrees in the tree structure representing the hypothetical sentence that are common to the tree structure representing the target sentence.
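A minimal sketch of such a common-subtree rate follows. It assumes that each dependency tree is given as nested `(word, children)` tuples, that a "subtree" means a node together with all of its descendants, and that sibling order is ignored. The helper names (`node`, `common_subtree_rate`) are hypothetical, and NPL 2 may define subtrees and their matching differently.

```python
def node(word, *children):
    """Build a dependency-tree node as a (word, children) tuple."""
    return (word, tuple(children))


def canonical(tree):
    """Order-insensitive form of a tree (assumption: sibling order is ignored)."""
    word, children = tree
    return (word, tuple(sorted(canonical(c) for c in children)))


def all_subtrees(tree):
    """Yield the canonical form of the complete subtree rooted at every node."""
    yield canonical(tree)
    for child in tree[1]:
        yield from all_subtrees(child)


def common_subtree_rate(target, hypothesis):
    """Proportion of hypothesis subtrees that also occur in the target tree."""
    target_subtrees = set(all_subtrees(target))
    hyp_subtrees = list(all_subtrees(hypothesis))
    common = sum(1 for s in hyp_subtrees if s in target_subtrees)
    return common / len(hyp_subtrees)


# "john ate a red apple" vs. "mary ate a red apple" (determiners omitted)
target = node("ate", node("john"), node("apple", node("red")))
hypothesis = node("ate", node("mary"), node("apple", node("red")))
print(common_subtree_rate(target, hypothesis))
```

In this example the hypothesis subtrees "apple(red)" and "red" occur in the target, while the whole tree and "mary" do not, so the rate is 2/4 = 0.5.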
NPL 3 discloses another example of a technology for determining an entailment relationship between a target sentence and a hypothetical sentence. The technology disclosed in NPL 3 likewise converts the target sentence and the hypothetical sentence into respective tree structures that represent their dependency structures. The technology then determines the entailment relationship on the basis of the number of editing operations executed to transform the tree structure representing the target sentence into the tree structure representing the hypothetical sentence. Here, the editing operations are operations on the tree structures, such as insertion, deletion, and replacement of the nodes constituting the tree structures, and movement of a subtree. In other words, the technology disclosed in NPL 3 determines the entailment relationship by using, as a feature amount, the number of editing operations, which represents the state (degree) of difference between the tree structure representing the hypothetical sentence and the tree structure representing the target sentence.
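The number of editing operations can be illustrated with the classic ordered tree edit distance, in which inserting, deleting, or relabeling (replacing) a node each costs 1. This is a sketch under that assumption only: it does not model the subtree movement operation mentioned above, it treats sibling order as significant, and the helper names are hypothetical rather than taken from NPL 3.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree node as a (label, children) tuple."""
    return (label, tuple(children))


def forest_size(forest):
    """Total number of nodes in a forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost ordered edit distance between forests f and g.

    Standard recursion on the rightmost root of each forest:
    delete it, insert it, or match it (relabeling if needed).
    """
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete_v = forest_distance(f[:-1] + v_children, g) + 1
    insert_w = forest_distance(f, g[:-1] + w_children) + 1
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete_v, insert_w, match)


def tree_edit_distance(target, hypothesis):
    return forest_distance((target,), (hypothesis,))


target = node("ate", node("john"), node("apple", node("red")))
print(tree_edit_distance(target, node("ate", node("john"), node("apple"))))
print(tree_edit_distance(target, node("ate", node("mary"), node("apple"))))
```

The first hypothesis needs one deletion ("red"), and the second needs that deletion plus one replacement ("john" to "mary"), giving distances of 1 and 2.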
The above may be summarized as follows. Most of the technologies disclosed in NPL 1 determine an entailment relationship by using the coverage rate as one of the feature amounts. The technologies disclosed in NPL 2 and NPL 3 focus on whether the tree structure representing a target sentence and the tree structure representing a hypothetical sentence include a node or subtree common to both, or a node or subtree that is not common to both, and determine an entailment relationship by using such a node or subtree as a feature amount.
Hereinafter, a node and a subtree in a tree structure may be collectively referred to as “substructure”.
The following PTLs, for example, disclose technologies that focus on relationships among plural texts.
PTL 1 discloses a technology for generating a new text on the basis of plural texts collected in advance. The technology disclosed in PTL 1 collects aggregated data including pairs, each consisting of a text and the intention represented by the text. The technology organizes the plural intentions into a hierarchy on the basis of the relationships among them. The technology then uses the plural texts associated with the hierarchized intentions and combines the matching and non-matching portions of the morphemes constituting those texts, thereby generating a new text.
PTL 2 discloses a technology for generating a rule used to classify structured documents such as XML (Extensible Markup Language) documents. The technology disclosed in PTL 2 defines feature values relating to fluctuating (variable) portions of the schema of a structured document (its elements and attributes). The technology generates a rule for classifying plural structured documents on the basis of the feature values obtained from those documents, and determines the similarity between the structured documents on the basis of the generated rule.
PTL 3 discloses a technology for determining the similarity of description data (such as documents or source code) described according to a particular rule. The technology disclosed in PTL 3 converts plural pieces of description data described according to the particular rule (such as a grammar) into a description format such as a parse tree. The technology prunes the parse tree at a specific level, thereby fragmenting it into subtrees, and determines the similarity between the pieces of description data by determining the similarity of the combinations of corresponding subtrees across those pieces.
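The pruning and fragment comparison can be sketched as follows. This assumes parse trees given as nested `(label, children)` tuples, treats "pruning at a specific level" as collecting the subtrees rooted at a given depth, and uses Jaccard similarity over the resulting fragments as a stand-in for the similarity of subtree combinations used in PTL 3; all names here are hypothetical.

```python
def node(label, *children):
    """Build a parse-tree node as a (label, children) tuple."""
    return (label, tuple(children))


def fragments_at_level(tree, level):
    """Cut the tree at the given depth and return the subtrees rooted there."""
    if level == 0:
        return [tree]
    result = []
    for child in tree[1]:
        result.extend(fragments_at_level(child, level - 1))
    return result


def fragment_similarity(tree_a, tree_b, level):
    """Jaccard similarity of the fragment sets (assumed stand-in metric)."""
    a = set(fragments_at_level(tree_a, level))
    b = set(fragments_at_level(tree_b, level))
    if not a and not b:
        return 1.0  # both trees are shallower than the cut level
    return len(a & b) / len(a | b)


# Toy parse trees of two short programs
prog_a = node("program", node("assign", node("x"), node("1")), node("print", node("x")))
prog_b = node("program", node("assign", node("x"), node("1")), node("return", node("x")))
print(fragment_similarity(prog_a, prog_b, level=1))
```

Cutting both trees at depth 1 yields the fragments {assign(x, 1), print(x)} and {assign(x, 1), return(x)}; one of the three distinct fragments is shared, so the similarity is 1/3.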
PTL 4 discloses a technology for extracting synonymous representations from a pair of mutually similar sentences. The technology disclosed in PTL 4 performs dependency analysis on each sentence of the pair. On the basis of the analysis results, the technology extracts common representations, which are included in both sentences, and different representations, which are included in only one of the sentences. The technology then extracts synonymous representations on the basis of the similarity of the relative positions at which the common representations are arranged in the respective sentences, and the similarity of the relative positions of the different representations with respect to the common representations.
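The following is a toy illustration of the underlying idea: aligning the differing parts of two similar sentences by their position relative to the shared parts. It is a simplification and not PTL 4's method. Adjacent words stand in for the results of dependency analysis, and a word found in only one sentence is paired with a word found only in the other when both are surrounded by the same neighbours. The function name `synonym_candidates` is hypothetical.

```python
def synonym_candidates(sent_a, sent_b):
    """Pair words unique to each sentence that share the same neighbours.

    Assumption: whitespace tokenization, with (previous, next) word pairs
    used in place of dependency-analysis results.
    """
    tokens_a, tokens_b = sent_a.split(), sent_b.split()
    common = set(tokens_a) & set(tokens_b)

    def contexts(tokens):
        # Map each non-common ("different") token to its (previous, next) neighbours.
        ctx = {}
        for i, tok in enumerate(tokens):
            if tok in common:
                continue
            prev = tokens[i - 1] if i > 0 else "<s>"
            nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            ctx.setdefault(tok, set()).add((prev, nxt))
        return ctx

    ctx_a, ctx_b = contexts(tokens_a), contexts(tokens_b)
    # A shared context means both words sit in the same position relative
    # to the common words, so they are proposed as synonym candidates.
    return [(a, b) for a, ca in ctx_a.items() for b, cb in ctx_b.items() if ca & cb]


print(synonym_candidates("the car stopped suddenly", "the automobile stopped suddenly"))
```

Here "car" and "automobile" both occur between "the" and "stopped", so the pair ("car", "automobile") is proposed.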