It is known to parse natural language sentences into trees and sub-trees, where the different parts (and sub-parts) of a sentence (herein called grammatical portions) respectively correspond to a set of hierarchically-related nodes. In this context, frequent sub-trees are sub-trees that tend to occur frequently in text (that is, spoken text, written text, etc.). A “maximally frequent sub-tree” is a “frequent sub-tree” that has no proper supertrees which are also frequent.
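The definition of a maximally frequent sub-tree can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not a method taken from any particular parser: trees are encoded as nested tuples of the form (label, child, ...), a sub-tree is taken to occur in a sentence tree when its labels match at some node and its children match an ordered subsequence of that node's children, and the names `matches_at`, `contains`, `support`, `maximal_frequent` and the `min_support` threshold are hypothetical. Maximality is judged only relative to the supplied list of candidate sub-trees.

```python
# Trees are nested tuples: (label, child_1, child_2, ...).
# This encoding is an assumption made for illustration only.

def matches_at(pattern, tree):
    """True if pattern matches with its root placed on the root of tree."""
    if pattern[0] != tree[0]:
        return False
    wanted = pattern[1:]
    i = 0
    # Greedily match the pattern's children against an ordered
    # subsequence of the tree's children.
    for child in tree[1:]:
        if i < len(wanted) and matches_at(wanted[i], child):
            i += 1
    return i == len(wanted)

def contains(tree, pattern):
    """True if pattern occurs at the root of tree or anywhere below it."""
    if matches_at(pattern, tree):
        return True
    return any(contains(child, pattern) for child in tree[1:])

def support(pattern, corpus):
    """Number of sentence trees in the corpus that contain the pattern."""
    return sum(1 for tree in corpus if contains(tree, pattern))

def maximal_frequent(candidates, corpus, min_support):
    """Frequent candidates that have no other frequent candidate as a proper supertree."""
    frequent = [p for p in candidates if support(p, corpus) >= min_support]
    return [
        p for p in frequent
        if not any(q != p and contains(q, p) for q in frequent)
    ]

# Hand-written toy parse trees (not the output of a real parser).
corpus = [
    ("S", ("NP",), ("VP", ("V",), ("NP",))),
    ("S", ("NP",), ("VP", ("V",), ("NP",), ("NP",))),
    ("S", ("NP",), ("VP", ("V",))),
]
candidates = [
    ("VP", ("V",)),
    ("VP", ("V",), ("NP",)),
    ("S", ("NP",), ("VP", ("V",), ("NP",))),
    ("VP", ("V",), ("NP",), ("NP",)),
]
maximal = maximal_frequent(candidates, corpus, min_support=2)
print(maximal)
```

With a support threshold of two sentences, the first three candidates are frequent, but only the sub-tree rooted at S is maximal, because each of the other two frequent candidates is contained in a larger frequent candidate.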
The VerbNet project maps PropBank verb types to their corresponding Levin classes. VerbNet is a lexical resource that incorporates both semantic and syntactic information about language. VerbNet is part of the SemLink project in development at the University of Colorado. VerbNet includes the following: (i) a set of usage patterns; (ii) a set of exemplary natural language sentences for each usage pattern; and (iii) assignments of “thematic roles” for at least some of the phrases (that is, a word or short string of consecutive words) in each exemplary natural language sentence. The thematic roles are the arguments of verbs. Just as a method in a programming language takes arguments, a verb is associated with arguments, each of which has a meaningful type. Common thematic roles are “agent” (subject; who is driving the action?), “theme” (object; what is the target of the operation?) and “recipient” (who is the target of path verbs like give, send, receive, deliver, etc.?). As an example, “I gave my friend an apple” has the thematic roles “agent” (“I”), “theme” (“an apple”) and “recipient” (“my friend”).
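The analogy between a verb and a method can be made concrete with a small Python sketch. The function `give` below is purely hypothetical and is not part of VerbNet or of any library; it only shows the verb as a "method" whose named parameters are the thematic roles from the example sentence.

```python
def give(agent: str, theme: str, recipient: str) -> dict:
    """Hypothetical 'signature' for the verb give: one parameter per thematic role."""
    return {
        "verb": "give",
        "agent": agent,          # who is driving the action
        "theme": theme,          # what is being given
        "recipient": recipient,  # who receives it
    }

# "I gave my friend an apple"
frame = give(agent="I", theme="an apple", recipient="my friend")
print(frame)
```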
Semantic parsing is a known technique which is herein defined as associating parts of a given natural-language text with semantic concepts and/or relationships of a predefined domain and/or schema. One known type of semantic parsing is associating verbs with their arguments. In turn, this type of semantic parsing can enhance various text-analysis tasks, such as extracting subject-verb-object (SVO) triplet relations from text. The VerbNet corpus is a known corpus that contains a thorough list of English language verbs, classified into verb classes, where a class is associated with different thematic roles. A class is also associated with different shallow patterns (for example, “NP V NP NP” as in “I gave her a gift,” where NP stands for Noun Phrase and V stands for Verb), which will herein be referred to as “VerbNet patterns.” In a VerbNet pattern, each item of the pattern is respectively assigned a unique thematic role (for example, NP1=Sender, NP2=Recipient, NP3=Target). However, due to grammatical variations in the language, it is often not easy to correctly detect the pattern inherent in a given sentence. For example, the pattern NP V NP NP is inherent in the sentence “I will visit her tomorrow and give her this lovely gift,” but the subject “I” is separated from the verb “give” by the intervening words “will visit her tomorrow and.”
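The difficulty of detecting a shallow pattern can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than a description of how VerbNet patterns are matched in practice: the function `match_shallow_pattern` is hypothetical, the chunk tags (including AUX, ADVP and CONJ) are hand-written rather than produced by a real chunker, and the matcher deliberately accepts only an exact, whole-sentence match.

```python
def match_shallow_pattern(pattern, role_names, chunks):
    """Return {role: phrase} if the chunk tags equal the pattern exactly, else None."""
    tags = [tag for tag, _ in chunks]
    if tags != pattern.split():
        return None
    # Assign the roles to the noun phrases in order of appearance.
    noun_phrases = [text for tag, text in chunks if tag == "NP"]
    return dict(zip(role_names, noun_phrases))

PATTERN = "NP V NP NP"
ROLES = ["Sender", "Recipient", "Target"]  # NP1, NP2, NP3

# "I gave her a gift"
simple = [("NP", "I"), ("V", "gave"), ("NP", "her"), ("NP", "a gift")]

# "I will visit her tomorrow and give her this lovely gift"
# (hand-written chunking, assumed for illustration)
coordinated = [
    ("NP", "I"), ("AUX", "will"), ("V", "visit"), ("NP", "her"),
    ("ADVP", "tomorrow"), ("CONJ", "and"),
    ("V", "give"), ("NP", "her"), ("NP", "this lovely gift"),
]

simple_roles = match_shallow_pattern(PATTERN, ROLES, simple)
coordinated_roles = match_shallow_pattern(PATTERN, ROLES, coordinated)
print(simple_roles)
print(coordinated_roles)
```

The exact matcher assigns all three roles for the simple sentence but returns `None` for the coordinated sentence, even though the same pattern is inherent in it.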
VerbNet patterns are herein referred to as “shallow patterns” because the pre-existing patterns that are in the VerbNet corpus only reflect a two-level hierarchy, as follows: (i) sentence level (also called root level); and (ii) grammatical-portion level (for example, leaf nodes of NP, V, NP and NP for the “I gave her a gift” example used above).
In this document, the term “language net” will be used to generically refer to any software-based lexical resource that includes: (i) a set of usage patterns; (ii) a set of exemplary natural language sentences for each usage pattern; and (iii) assignments of “thematic roles” for at least some of the phrases (that is, a word or short string of consecutive words) in each exemplary natural language sentence. The VerbNet corpus is one example of a “language net.”
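The three elements of a “language net” can be pictured as a simple data structure. The following Python sketch is one possible layout, assumed for illustration only: the class names `LanguageNet`, `UsagePattern` and `ExampleSentence`, the method `examples_for` and the toy entry are hypothetical and do not reflect the actual file format or contents of the VerbNet corpus.

```python
from dataclasses import dataclass, field

@dataclass
class ExampleSentence:
    text: str
    # (phrase, thematic role) pairs for at least some phrases in the sentence
    roles: list = field(default_factory=list)

@dataclass
class UsagePattern:
    pattern: str                                   # e.g. "NP V NP NP"
    examples: list = field(default_factory=list)   # ExampleSentence objects

@dataclass
class LanguageNet:
    name: str
    patterns: list = field(default_factory=list)   # UsagePattern objects

    def examples_for(self, pattern):
        """All exemplary sentences stored under the given usage pattern."""
        return [ex for p in self.patterns if p.pattern == pattern for ex in p.examples]

# Toy entry built from the example sentence and role names used in this document.
toy_net = LanguageNet(
    name="toy net",
    patterns=[
        UsagePattern(
            pattern="NP V NP NP",
            examples=[
                ExampleSentence(
                    text="I gave her a gift",
                    roles=[("I", "Sender"), ("her", "Recipient"), ("a gift", "Target")],
                )
            ],
        )
    ],
)
found = toy_net.examples_for("NP V NP NP")
print(found[0].text, found[0].roles)
```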