1. Field of the Invention
The invention relates to natural language processing, commonsense reasoning, and knowledge representation. In particular, the invention relates to the representation of commonsense knowledge and processing mechanisms for the generation of bridging and predictive inferences from natural language text.
2. Description of the Related Art
People are most comfortable communicating in a natural language such as English, yet natural language is notoriously ambiguous and thus difficult for computers to understand. FAUSTUS (Norvig, 1987) is a computer program implementing a unified approach to natural language inference. The program uses marker passing to perform six general types of inferences. The algorithm (1) translates the input text into a semantic network representation of nodes and links, (2) passes markers outward starting from the nodes of the input network, (3) when a marker collision occurs, suggests inferences based on the paths the markers took, and (4) evaluates the suggested inferences. (e.g., see, Norvig, Peter (1987). Unified theory of inference for text understanding (Report No. UCB/CSD 87/339). Berkeley, Calif.: University of California, Computer Science Division).
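The marker-passing and collision-detection steps described above can be illustrated with a minimal Python sketch. It assumes a toy network stored as an adjacency list of labeled links that markers may traverse in either direction, and a hypothetical depth limit. The names add_edge and find_collisions and the toy nodes are illustrative only. This is not Norvig's implementation, and the steps that suggest and evaluate inferences are omitted. The sketch returns only the node paths joined at each collision.

```python
from collections import deque

# Hypothetical toy semantic network: node -> list of (relation, neighbor) pairs.
Network = dict[str, list[tuple[str, str]]]


def add_edge(net: Network, src: str, rel: str, dst: str) -> None:
    """Add a labeled link; markers may travel it in either direction."""
    net.setdefault(src, []).append((rel, dst))
    net.setdefault(dst, []).append((rel + "^-1", src))


def find_collisions(net: Network, sources: list[str], max_depth: int = 3) -> list[tuple[str, ...]]:
    """Spread markers breadth-first from each source node, up to max_depth
    links, and return the node paths joined at each marker collision
    (a node reached by markers from two different sources)."""
    marks: dict[str, dict[str, list[str]]] = {}  # node -> origin -> path from origin
    queue: deque = deque()
    for s in sources:
        marks.setdefault(s, {})[s] = [s]
        queue.append((s, s, [s]))
    found: dict[tuple[str, ...], None] = {}  # insertion-ordered set of joined paths
    while queue:
        node, origin, path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue  # this marker has used up its steps
        for _rel, nxt in net.get(node, []):
            node_marks = marks.setdefault(nxt, {})
            if nxt in path or origin in node_marks:
                continue  # no cycles; the first (shortest) arrival per origin wins
            new_path = path + [nxt]
            for other_path in node_marks.values():
                # Collision: join origin -> nxt with nxt -> the other origin.
                joined = new_path + other_path[-2::-1]
                if len(set(joined)) == len(joined):  # keep simple paths only
                    # Store each path in one canonical direction to drop duplicates.
                    key = tuple(joined) if joined[0] <= joined[-1] else tuple(reversed(joined))
                    found[key] = None
            node_marks[origin] = new_path
            queue.append((nxt, origin, new_path))
    return list(found)


net: Network = {}
add_edge(net, "lasagna", "isa", "food")
add_edge(net, "restaurant", "serves", "food")
add_edge(net, "restaurant", "isa", "building")
print(find_collisions(net, ["lasagna", "restaurant"]))  # [('lasagna', 'food', 'restaurant')]
```

Because markers spread breadth-first, the shortest connecting paths are found first. A fuller system would then turn each path into a candidate inference (here, that lasagna is food served at a restaurant) and evaluate it.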
Extended WordNet (XWN) (Harabagiu & Moldovan, 1998) is a commonsense knowledge base being constructed by parsing the English glosses (definitions) provided with WordNet, an online lexical database, into directed acyclic graphs. For example, the gloss for refrigerator, “an appliance where food is stored,” is parsed into the following graph:
    refrigerator -GLOSS-> appliance <-LOCATION- store -OBJECT-> food
(e.g., see, Harabagiu, Sanda M., & Moldovan, Dan I. (1998). Knowledge processing on an extended WordNet. In Fellbaum, Christiane (Ed.), WordNet: An electronic lexical database (pp. 379-405). Cambridge, Mass.: MIT Press. http://www.seas.smu.edu/~sanda/papers/wnb1.ps.gz).
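The sample refrigerator graph can be written as a list of labeled directed edges. The following Python sketch assumes that store is the source of both the LOCATION and OBJECT edges. The names REFRIGERATOR_GRAPH, is_acyclic, and targets are hypothetical, and this is not XWN's actual storage format. It shows the representation, a check of the directed-acyclic property using Kahn's algorithm, and a one-step relation lookup.

```python
from collections import defaultdict, deque

# Labeled directed edges (source, relation, target) for the graph parsed
# from the gloss of "refrigerator": "an appliance where food is stored."
REFRIGERATOR_GRAPH = [
    ("refrigerator", "GLOSS", "appliance"),
    ("store", "LOCATION", "appliance"),
    ("store", "OBJECT", "food"),
]


def is_acyclic(edges):
    """Kahn's algorithm: True if the labeled edge list forms a DAG."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, _rel, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))
    ready = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while ready:
        n = ready.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(nodes)  # any unvisited node lies on a cycle


def targets(edges, src, rel):
    """Nodes reached from src by a single edge labeled rel."""
    return [dst for s, r, dst in edges if s == src and r == rel]


print(is_acyclic(REFRIGERATOR_GRAPH))                    # True
print(targets(REFRIGERATOR_GRAPH, "store", "LOCATION"))  # ['appliance']
```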
The Open Mind Common Sense project (Singh, 2002) is building a database of English sentences that describe commonsense knowledge. The sentences are entered by contributors via the Internet. Sample contributions include: “One type of book is a calendar book.”; “One of the things you do when you plan a vacation is get out the map.”; “The ice age was long ago.”; “A writer writes for a living.”; “Something that might happen as a consequence of having a heart attack is vice Presidency.”; “A machinist can machine parts.”; and “Walking is for relaxation.” (e.g., see, Singh, Push (2002). The public acquisition of commonsense knowledge. In Proceedings of the AAAI Spring Symposium on Acquiring (and Using) Linguistic (and World) Knowledge for Information Access. Palo Alto, Calif.: American Association for Artificial Intelligence.)
In FAUSTUS, knowledge is represented in a verbose semantic network whose nodes represent concepts. Knowledge is open-ended and coded in an expressive representation language that “encourages a proliferation of concepts” (Norvig, 1987, p. 73). Consequently, knowledge entry in FAUSTUS is time-consuming and must be performed by knowledge representation experts.
XWN is a knowledge base designed around WordNet glosses; in effect, it is a knowledge base of ways of expanding or rewriting concepts. As a result, XWN is significantly limited in what it can represent. It does not support the representation of plans (Harabagiu & Moldovan, 1998, p. 399), nor does it support the representation of causal rules, and both are essential for natural language understanding. For example, it is difficult to represent as an XWN graph the fact that pouring water on a fire causes the fire to go out. An attempt might be:
    pour -OBJECT-> water -DESTINATION-> fire
         -CAUSE-> fire -ATTRIBUTE-> extinguished
However, this fails to capture the fact that the two fire nodes denote the same fire. (Furthermore, CAUSE and DESTINATION are not relations derived from the WordNet glosses.) Nor can the fire nodes simply be merged, because the merged graph would then assert that pouring water on an extinguished fire causes the extinguished fire.
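The merging problem can be made concrete with a short Python sketch that encodes the attempted graph as labeled edges. The node IDs fire#1 and fire#2, the helper names, and the choice of pour as the source of the CAUSE edge are all hypothetical. The sketch merges the two fire nodes and then asks what the graph says about the fire on which the water is poured.

```python
# Attempted graph for "pouring water on a fire causes the fire to go out",
# as labeled edges (source, relation, target). The two fire nodes get
# distinct hypothetical IDs; attaching CAUSE to "pour" is an assumption.
ATTEMPT = [
    ("pour", "OBJECT", "water"),
    ("water", "DESTINATION", "fire#1"),
    ("pour", "CAUSE", "fire#2"),
    ("fire#2", "ATTRIBUTE", "extinguished"),
]


def merge_nodes(edges, keep, drop):
    """Return a copy of edges with every occurrence of node drop renamed to keep."""
    def rename(node):
        return keep if node == drop else node
    return [(rename(s), r, rename(t)) for s, r, t in edges]


def destination_attributes(edges):
    """ATTRIBUTE values asserted of any node that is a DESTINATION."""
    dests = {t for _s, r, t in edges if r == "DESTINATION"}
    return {t for s, r, t in edges if r == "ATTRIBUTE" and s in dests}


print(destination_attributes(ATTEMPT))  # set(): nothing is said about the fire being doused
merged = merge_nodes(ATTEMPT, keep="fire#1", drop="fire#2")
print(destination_attributes(merged))   # {'extinguished'}: the doused fire is already out
```

The graph has no notion of time or state, so one merged node cannot stand for the fire both before and after the water is poured.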
Because Open Mind Common Sense is a collection of English sentences describing commonsense knowledge, the database is potentially relevant to many natural language understanding tasks. It has three problems, however. First, the sentences are ambiguous as to part of speech and word sense. For example, the sentence “People can pay bills.” does not specify whether bills is a noun or a verb, and bills could refer to statutes, invoices, banknotes, beaks, sending an invoice, and so on. Second, the sentences are ambiguous as to coreference. For example, in the sentence “A garbage truck picks up garbage and hauls it to the dump.” it is unclear whether the pronoun it refers to the garbage truck or to the garbage. Third, the same type of rule can be expressed in many ways in English, which makes generating inferences from English sentences difficult.
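The cost of these ambiguities can be illustrated by enumerating the readings a system would have to consider. The following Python sketch lists only the readings named in the text; real sense inventories are larger. The function interpretations and the slot dictionaries are hypothetical. Independent ambiguous words multiply: the number of joint readings is the product of the per-word candidate counts.

```python
from itertools import product


def interpretations(slots):
    """Yield every joint reading of a sentence, given a mapping from each
    ambiguous word to its candidate readings."""
    words = list(slots)
    for choice in product(*(slots[w] for w in words)):
        yield dict(zip(words, choice))


# "People can pay bills.": part of speech and sense of "bills" are unspecified.
pay_bills = {
    "bills": [
        ("noun", "statute"), ("noun", "invoice"), ("noun", "banknote"),
        ("noun", "beak"), ("verb", "send an invoice"),
    ],
}

# "A garbage truck picks up garbage and hauls it to the dump.":
# the antecedent of "it" is unspecified.
garbage_truck = {"it": ["garbage truck", "garbage"]}

print(len(list(interpretations(pay_bills))))      # 5
print(len(list(interpretations(garbage_truck))))  # 2
```

A sentence containing both kinds of ambiguity would already have 5 × 2 = 10 readings to choose among.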