A semantic network is a structure for representing information. Semantic networks can encode information about a business, an area of study, or common sense knowledge. For example, a semantic network could encode information about a bank's products and services.
The basic building blocks of a semantic network are links and nodes. A node typically represents a concept or set of concepts, or an object or set of objects. For example, a node in a semantic network encoding information about a bank's products could represent the class of all checking accounts, while another node could represent the price (fee) of a particular class of products such as checking accounts. Abstract concepts are often referred to as properties or attributes.
A link relates two nodes. A link may represent any binary relation, that is, any relation that connects two objects or concepts or an object and a concept. For example, in the example above, one could specify a link representing a “has feature” relation, between a node representing a subclass of checking accounts and a node representing a customer's ability to perform transactions over the internet.
Proponents of semantic networks have suggested that semantic networks can be used not only to represent information but also to reason with that information. (See A. Barr and E. Feigenbaum (eds.): AI Handbook, volume I, pages 80-89, which is herein incorporated by reference in its entirety.) For example, consider FIG. 1. FIG. 1 represents a semantic network that encodes some information about the animal kingdom (pandas are mammals; cats are mammals; mammals are lactating creatures) as well as some information about Bill's visit to the Bronx Zoo. This network encodes the information that a person named Bill sees an individual named Penny and that Penny is a panda. If one conjoins this information with the information in the network that pandas are a subclass of mammals, one should then be able to conclude that Bill sees a mammal. Indeed, this piece of reasoning corresponds to following a path in the semantic network. In FIG. 2, the path corresponding to this reasoning is outlined in bold.
Researchers have noted, however, that one can easily misuse a semantic network to make invalid conclusions while reasoning. (See, for example, W. Woods, “What's in a Link” in D. Bobrow and A. Collins: Representation and Understanding, Morgan Kaufmann, San Francisco 1975, which discusses a variety of problems that arise when naively reasoning with a semantic network. This reference is herein incorporated by reference in its entirety.) Indeed, it is not even clear how a path in a semantic network is supposed to correspond to reasoning with the information in that semantic network. For example, consider FIG. 3, which depicts the semantic network of FIG. 1, with another path outlined in bold. Although we have the information that Bill sees Penny, and Penny is a panda, and pandas eat bamboo and bamboo is a type of grass, one cannot conclude that Bill sees Penny eating bamboo, or that Bill sees grass.
That is, if one simply takes an arbitrary path in a semantic network, and extrapolates that a link in a segment of the path corresponds to a larger portion of the path, one can easily jump to conclusions that are not sound.
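The contrast between the two paths can be sketched in a few lines of Python. This is an illustrative sketch only: the triple encoding, the link labels ("sees", "inst", "isa", "eats") and the function names are assumptions made for the example, and the restricted traversal is one simple way to avoid the faulty conclusion, not a method taken from the cited references.

```python
# Hypothetical encoding of the network fragment described above as
# (source, link label, target) triples. The labels are assumed names.
EDGES = [
    ("Bill", "sees", "Penny"),
    ("Penny", "inst", "panda"),
    ("panda", "isa", "mammal"),
    ("panda", "eats", "bamboo"),
    ("bamboo", "isa", "grass"),
]


def follow(start, allowed):
    """Nodes reachable from start using only links whose label is in allowed."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, label, dst in EDGES:
            if src == node and label in allowed and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def sees_naive(person):
    # Unsound: any path at all is read as "person sees <end of path>".
    return follow(person, {"sees", "inst", "isa", "eats"})


def sees_taxonomic(person):
    # One "sees" link, then only inst/isa links (the kind of path in FIG. 2).
    direct = {dst for src, label, dst in EDGES
              if src == person and label == "sees"}
    result = set(direct)
    for obj in direct:
        result |= follow(obj, {"inst", "isa"})
    return result
```

Under these assumptions, `sees_naive("Bill")` includes "grass", which is the unsound conclusion, while `sees_taxonomic("Bill")` stops at "Penny", "panda", and "mammal".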
There have been a variety of attempts to explore how one can reason soundly within a semantic network. In particular, researchers have studied how one can reason within a subtype of semantic networks known as inheritance networks.
Inheritance networks focus on two link types of interest, known as the isa and inst links. The isa link connects nodes A and B if the class (set) of objects that A represents is a subset of the set of objects that B represents. For example, if node A represents the set of all Coverdell accounts and node B represents the set of all college savings accounts, then A isa B (since all Coverdell accounts are used to save for college). The inst link connects nodes A and B if the object that A represents is a member of the set of objects that B represents. For example, if A represents the checking account 226070584-404-9962899 and B represents the set of all checking accounts in Apple Bank, then A inst B (commonly read as “A is an instance of B”).
The isa and inst links are generally used to describe taxonomic hierarchies. Classic examples of taxonomic hierarchies are the plant and animal kingdoms, small fragments of which appear in FIG. 1. There are many other examples in all sorts of aspects of everyday life and business applications. For example, a bank's products can be viewed as a taxonomic hierarchy; e.g., different types of free checking accounts form a subclass of checking accounts, which form a subclass of bank accounts. Many taxonomic hierarchies allow for classes to have multiple superclasses. For example, interest-bearing checking accounts have both interest-bearing accounts and checking accounts as superclasses.
Reasoning in a pure taxonomic hierarchy is simple. One can define paths in the following manner: If one considers an inheritance hierarchy as a directed acyclic graph (DAG), then in most inheritance hierarchies, A is a leaf of the DAG if there is an inst link between A and B. A is a parent of B if there is an inst or isa link from B to A.
Ancestor is defined recursively as follows: A is an ancestor of B if A is a parent of B; A is an ancestor of B if there is some node C such that A is an ancestor of C and C is a parent of B. A is a root node if A has no ancestors.
One can now elaborate the notion of a path in an inheritance hierarchy. There is a path between A and B (written A→B) if one of the following conditions holds: (i) There is an inst or isa link between A and B, or (ii) There is some node X such that A→X and there is an inst or isa link between X and B. If there is a path between A and B, we can say that A is a member or a subclass of B.
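The definitions of parent, ancestor, root, and path given above translate directly into code. The following is a minimal sketch, assuming the hierarchy is stored as (child, link type, parent) triples; the node names and the account "acct-1" are hypothetical and chosen only to echo the banking example.

```python
# Hypothetical inheritance hierarchy: (child, link type, parent).
LINKS = [
    ("acct-1", "inst", "free checking account"),
    ("free checking account", "isa", "checking account"),
    ("checking account", "isa", "bank account"),
    ("interest-bearing checking account", "isa", "checking account"),
    ("interest-bearing checking account", "isa", "interest-bearing account"),
]


def parents(node):
    """A is a parent of B if there is an inst or isa link from B to A."""
    return {p for c, _link_type, p in LINKS if c == node}


def ancestors(node):
    """Recursive definition: parents, plus the ancestors of each parent."""
    found = set()
    for p in parents(node):
        found.add(p)
        found |= ancestors(p)  # terminates because the graph is acyclic
    return found


def has_path(a, b):
    """True if there is a path A→B, i.e. A is a member or subclass of B."""
    return b in ancestors(a)


def roots():
    """Nodes with no ancestors."""
    nodes = {n for c, _link_type, p in LINKS for n in (c, p)}
    return {n for n in nodes if not ancestors(n)}
```

With this data, `has_path("acct-1", "bank account")` holds, and the multiple-superclass case yields two roots, "bank account" and "interest-bearing account".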
Researchers have studied a variant of inheritance networks known as inheritance networks with exceptions (IHE). In a classic IHE, there are three link types of importance: the inst link, the defeasible isa link, and the defeasible cancels link. The inst link is the same type of link as in classic inheritance networks. Intuitively, there is a defeasible isa link between nodes A and B if the class of objects that A represents is “close-to” a subset of the set of objects that B represents. That is, if x is a member of the class A, then it is typically the case that x is a member of the class B. In the same manner, there is a defeasible cancels link between A and B if members of the class A are typically not members of the class B. These links are important because they allow specifying and reasoning with exceptions.
In an inheritance hierarchy with exceptions, one wishes to reason about whether or not the members of a class X typically are or are not members of a class Y. To do that, one must determine whether or not there is a positive or negative path between X and Y. This question has been studied in depth by the following references, among others: John F. Horty, Richmond H. Thomason, David S. Touretzky: A Skeptical Theory of Inheritance in Nonmonotonic Semantic Networks. Artificial Intelligence 42(2-3): 311-348 (1990); and Lynn Andrea Stein: Resolving Ambiguity in Nonmonotonic Inheritance Hierarchies. Artificial Intelligence 55(2): 259-310 (1992). These references are herein incorporated by reference in their entirety.
Inheritance networks and inheritance networks with exceptions allow a very limited type of reasoning: determining whether X is a member or subclass of Y, or whether members of a class X typically are or are not members of a class Y. There have been various attempts in the prior art to broaden the types of reasoning that are allowed within a semantic network. By clever choices in representation, one can use inheritance hierarchies and inheritance hierarchies with exceptions to reason about whether or not an object has a certain property. One can do this by reifying a property as a class, that is, as the class of objects that have the property in question. For example, to represent the fact that cars have 4 wheels, one can create two nodes, a node representing the class of all cars and a node representing the class of all things that have 4 wheels, and then draw an isa link between the first and second nodes. In such a manner, one could, for example, construct a semantic network that allows one to reason that all Volvo station wagons and all Buick coupes have 4 wheels. More generally, one can reason about the “slots” that an object or class of objects can have and the “fillers” for these slots. For example, one can reason about the price, cylinders, options, and other properties of cars. This technique is used to represent information in such languages as KL-ONE, which is based on the concept of a classic inheritance network. There has been inquiry into a class of semantic networks known as description logics, which use this technique extensively. The bottom line, however, is that such semantic networks still allow only very limited reasoning. They are designed to answer two types of questions: “Is class A a subclass of class B?” (the subsumption question) and “Where in a semantic network does a particular class A belong?” (the classification question). These are not general semantic networks; they are inheritance networks, and they do not allow general reasoning.
Morgenstern has investigated inheritance networks with exceptions in which logical formulas are attached to nodes. These logical formulas can be thought of as representing rules. Intuitively, a formula p is attached to a node A if it is the case that the formula p is typically true at the state of affairs represented by node A. For example, for an inheritance hierarchy representing reimbursement for medical insurance purposes, one might have a node A representing the class of all surgical procedures and a formula p saying that 90% of the cost of surgical procedures is covered. This means that typically, 90% of the cost of a surgical procedure is covered. However, there may be exceptions: emergency surgery may be covered in full, while cosmetic surgery may not be covered at all. Morgenstern's work focuses on determining what sets of formulas apply (can be considered true) at a particular node of the network. The work applies to inheritance networks with exceptions, but not to general semantic networks. General reasoning is not considered.
Norvig has examined the problem of trying to understand a story using a semantic network. He has developed a system that processes a story and constructs an ad-hoc semantic network which represents information in that story. He then identifies path shapes which correspond to syntactic and semantic natural language operations. A path shape can become a candidate for a potential inference. Off-line techniques which do not refer back to the semantic network are then used to determine which of these potential inferences can be made safely. The method is neither sound (that is, paths corresponding to incorrect inferences are identified, as in the example of FIG. 3), nor general.
See: R. Brachman and H. Levesque: The Tractability of Subsumption in Frame-Based Description Languages, Proceedings of the National Conference on Artificial Intelligence, 1984, 34-37; R. Brachman et al.: The CLASSIC Knowledge Representation System or, KL-ONE: The Next Generation. FGCS 1992: 1036-1043; L. Morgenstern: Inheritance Comes of Age: Applying Nonmonotonic Techniques to Problems in Industry, Artificial Intelligence, 103(1-2), 237-271 (1998); L. Morgenstern, IBM Patent: U.S. Pat. No. 5,802,508, Sep. 1, 1998: Reasoning with rules in a multiple inheritance semantic network with exceptions; P. Norvig: Marker Passing as a Weak Method for Text Inferencing, Cognitive Science, 13 (4), 569-620 (1989); and R. Brachman and J. Schmolze: An Overview of the KL-ONE Knowledge Representation System, Cognitive Science 9 (2), 171-216 (1985). These references are herein incorporated by reference in their entirety.
Businesses often need to have some method to recommend products or services to their customers. For example, a bookstore might wish for a way to determine which books to recommend to its customers. In e-commerce applications, in which there is little or no personal interaction between the enterprise and the customer, such a system is particularly important. An automated system that can make such recommendations is known as a recommendation system.
Most recommendation systems work on a principle known as collaborative filtering. The idea of collaborative filtering is that one can assign an individual to a particular group based on his preferences—which can be elicited by direct questioning, or inferred by observing a customer's purchase or browsing behavior—and then determine which products or services might suit a customer by looking at the purchasing patterns of other members in the group.
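The principle can be illustrated with a very small sketch. The purchase data, customer names, and item names below are invented for the example, and grouping by Jaccard overlap with the k most similar customers is a swapped-in simplification; commercial systems use more elaborate grouping and scoring.

```python
from collections import Counter

# Hypothetical purchase histories; customers and items are made up.
PURCHASES = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "cat": {"book_e", "book_f"},
    "dan": {"book_a", "book_b"},
}


def jaccard(x, y):
    """Overlap between two purchase sets, from 0.0 to 1.0."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def recommend(customer, k=2):
    """Suggest items bought by the k customers most similar to this one."""
    mine = PURCHASES[customer]
    # The "group" is approximated by the k nearest neighbours.
    group = sorted(
        (c for c in PURCHASES if c != customer),
        key=lambda c: jaccard(mine, PURCHASES[c]),
        reverse=True,
    )[:k]
    # Count items the group bought that this customer has not.
    counts = Counter(item for c in group for item in PURCHASES[c] - mine)
    return [item for item, _count in counts.most_common()]
```

Note that the function can report only which similar customers bought an item, not why the item suits the customer.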
An example of a recommendation system using collaborative filtering is the one used by amazon.com. If a user searches for a particular book, the system will suggest other books purchased by customers who were also interested in the book for which the user searched. For example, if one searches for Michael Shaara's The Killer Angels (a Pulitzer-Prize-winning account of the Battle of Gettysburg), amazon.com will suggest books by, among others, Joshua Lawrence Chamberlain, Shelby Foote, and Bernard Malamud. Some of these recommendations are closely related to the original request. For example, Joshua Lawrence Chamberlain wrote memoirs about his experiences in the Civil War; similarly, Shelby Foote writes about the Civil War. However, some of these recommendations—e.g. Malamud—do not seem to match well. It is important to note that collaborative filtering offers no way of explaining its recommendations; all a system can say is that other customers, grouped according to some clustering algorithm, showed interest in or purchased some item. See J. Breese, D. Heckerman, C. Kadie: Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998, Morgan Kaufmann, San Francisco. This reference is herein incorporated by reference in its entirety.
Some recommendation systems are rule based, with rules coming from static customer profiles and transactional data (e.g. Broadvision at their website). A business manager can set up rules to recommend products based on conditions he chooses. For example, a business manager could set up rules saying that if a customer takes out a mortgage, one should recommend home insurance for that customer or that if a customer is affluent, one should recommend various tax shelters. Such rule-based systems, however, are often hard to maintain and update (as opposed to model-based systems). Rules may interact with one another in unexpected ways, sometimes even producing inconsistency. Updating one rule may require updating many other rules.
It has long been recognized that while classical logic is binary—statements are either true or false—real life reasoning is many-valued. That is, we reason that a statement is probably true, or unlikely, or true with a certain probability or certainty. Such considerations led to the development of probabilistic reasoning methods in computer systems, starting in the 1960s and 1970s. A well-known example is Shortliffe's MYCIN system, which used certainty factors to facilitate the diagnosis and treatment of bacterial infections.
A more formal treatment of probabilistic reasoning became popular in the 1980s. We begin by introducing several basic concepts. The prior probability of a statement, event, or hypothesis H, P(H), is the probability that H is true. The posterior or conditional probability of a statement or hypothesis H relative to some evidence E, P(H|E), is the probability that H is true given that E is already known to be true. If P(H|E) = P(H) we say that H and E are independent. If P(H|E1; E2) = P(H|E2) we say that H and E1 are conditionally independent given E2. The joint probability of H1 and H2, P(H1; H2), is the probability that both H1 and H2 are true at the same time. The chain rule relates these concepts. Specifically, the chain rule states that P(H1; H2; . . . ; Hn) = P(Hn|Hn−1; . . . ; H1) . . . P(H2|H1) P(H1). A corollary of this rule is Bayes's Rule: P(H|E) = P(E|H) P(H) / P(E).
The chain rule and Bayes's rule allow the determination of certain conditional probabilities from prior probabilities and other conditional probabilities. It is often the case that the particular calculation of a conditional probability is simplified due to the independence or conditional independence of some of the variables. The conditions under which calculations are simplified can often be represented in an intuitive way in a graphical structure.
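These relationships can be checked on a small worked example. The joint distribution below is hypothetical and was chosen only so that the arithmetic is exact; the function names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical joint distribution P(H; E) over two true/false variables,
# keyed by (value of H, value of E). The four entries sum to 1.
JOINT = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(1, 20),
    (False, True): Fraction(1, 4),
    (False, False): Fraction(11, 20),
}


def p_h(h=True):
    """Prior probability P(H), obtained by summing out E."""
    return sum(p for (hv, _ev), p in JOINT.items() if hv == h)


def p_e(e=True):
    """Prior probability P(E), obtained by summing out H."""
    return sum(p for (_hv, ev), p in JOINT.items() if ev == e)


def p_h_given_e(h=True, e=True):
    """Conditional probability P(H|E) = P(H; E) / P(E)."""
    return JOINT[(h, e)] / p_e(e)


def p_e_given_h(e=True, h=True):
    """Conditional probability P(E|H) = P(H; E) / P(H)."""
    return JOINT[(h, e)] / p_h(h)


def bayes(h=True, e=True):
    """P(H|E) computed via Bayes's rule: P(E|H) P(H) / P(E)."""
    return p_e_given_h(e, h) * p_h(h) / p_e(e)
```

Here P(H) = 1/5 and P(E) = 2/5, so the direct computation and Bayes's rule both give P(H|E) = 3/8. Because 3/8 differs from 1/5, H and E are not independent in this example.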
A Bayesian network is a graphical representation of events and prior and conditional probabilities. Such networks have become an increasingly popular way of implementing probabilistic reasoning. A Bayesian network comprises nodes and links; nodes represent variables, and the links between nodes represent an explicit conditional dependence between variables. Some prior and conditional probabilities are furnished. Variables can represent facts or events. For example, the Bayesian network in FIG. 4 contains nodes representing such states as the sprinkler being on, rain, the presence of El Nino, the pavement being wet, and the grass being wet. The Bayesian network in FIG. 4 contains links, including a link between El Nino and rain, a link between rain and wet grass, a link between sprinkler and wet grass, and a link between sprinkler and wet hose. The Bayesian network in FIG. 4 contains assigned probabilities, including prior probabilities on some of the nodes (e.g., a prior probability of 5% on Cloud Seeding, 20% on El Nino) and conditional probabilities on some of the links (e.g., a conditional probability of 40% of Rain given El Nino).
This network can be used to infer a range of conditional probabilities: e.g., the probability that it had rained given (1) that the grass was wet, or (2) that the grass was wet but the pavement was dry, or (3) that the grass was wet and El Nino was present. Research has investigated methods to perform such inference with relative efficiency.
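Queries of this kind can be sketched with brute-force enumeration over a reduced version of the network. Only the 20% prior on El Nino and the 40% probability of rain given El Nino come from the description above; every other number is an assumption, and the pavement and cloud-seeding nodes are omitted to keep the sketch short. Enumeration is exponential in the number of variables and is shown for clarity, not efficiency.

```python
from itertools import product

P_ELNINO = 0.20                      # prior stated in the text
P_RAIN = {True: 0.40, False: 0.10}   # P(Rain | El Nino): 0.40 from text, 0.10 assumed
P_SPRINKLER = 0.30                   # assumed prior
P_WET = {                            # P(Wet grass | Rain, Sprinkler), all assumed
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.05,
}
VARS = ("elnino", "rain", "sprinkler", "wet")


def bern(p, value):
    """Probability that a true/false variable with P(true) = p takes value."""
    return p if value else 1.0 - p


def joint(elnino, rain, sprinkler, wet):
    """Joint probability as the product of each node's local probability."""
    return (bern(P_ELNINO, elnino) * bern(P_RAIN[elnino], rain)
            * bern(P_SPRINKLER, sprinkler)
            * bern(P_WET[(rain, sprinkler)], wet))


def total(fixed):
    """Sum the joint over all assignments consistent with fixed."""
    s = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[k] == v for k, v in fixed.items()):
            s += joint(**world)
    return s


def prob(query, evidence=None):
    """P(query | evidence), both given as {variable name: value} dicts."""
    evidence = evidence or {}
    return total({**evidence, **query}) / total(evidence)
```

For example, `prob({"rain": True}, {"wet": True})` corresponds to query (1) and `prob({"rain": True}, {"wet": True, "elnino": True})` to query (3). With the assumed numbers, each additional piece of supporting evidence raises the probability of rain.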
Bayesian networks are a powerful method of reasoning with probabilities. However, Bayesian networks have limited representational power. In particular, there is no semantics on the links between nodes, other than conditional probabilities. There are a few limited exceptions. For example, if the conditional probability between two nodes is sufficiently high—that is, if P(A|B) passes a certain threshold—it may be reasonable to say that B causes A. However, Bayesian networks do not in general allow assigning user-defined semantics to links between nodes. See E. Charniak: Bayesian Networks Without Tears, AI Magazine, 12(4), 50-63, 1991. This reference is herein incorporated by reference in its entirety.
The following useful theoretical concepts will assist in the understanding of the invention. A regular expression is an algebraic formula whose value is a pattern comprising a set of strings. This set of strings is called the language of the regular expression. Such a language is called a regular language.
Regular expressions can be characterized in a variety of ways. Most simply, a regular expression can be characterized in terms of its formation rules. Assume an alphabet A of symbols. The regular expressions over A are defined recursively as follows: The empty set is a regular expression; The empty string is a regular expression; For each symbol a in A, {a} is a regular expression; If x and y are regular expressions, then x|y is a regular expression; If x and y are regular expressions, then xy is a regular expression; and If x is a regular expression, then x* is a regular expression. Examples of regular expressions are (for the English alphabet) the set of all words having 2 a's, or ending in x, or (for the English alphabet plus the digits 0-9) the set of all Pascal or Java identifiers. Regular expressions are equivalent in expressive power to finite automata. More precisely, a language can be generated by a regular expression if and only if it can be accepted by a finite automaton. See: J. Hopcroft and J. Ullman, 1979: Introduction to Automata Theory, Languages and Computation, Addison Wesley, Reading, Mass., pp. 28-35, 55-76, 350-353. This reference is herein incorporated by reference in its entirety.
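The example languages mentioned above can be written with Python's `re` module, using only union (character classes), concatenation, and the star operator. Two readings are assumed here: "having 2 a's" is taken to mean exactly two, and identifiers are restricted to the stated alphabet of letters and digits, although real Pascal and Java identifiers also permit characters such as the underscore. The constant and function names are illustrative.

```python
import re

# Lowercase words with exactly two a's (assumed reading of "having 2 a's").
TWO_AS = re.compile(r"[b-z]*a[b-z]*a[b-z]*")

# Lowercase words ending in x.
ENDS_IN_X = re.compile(r"[a-z]*x")

# Simplified identifiers: a letter followed by letters or digits.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def in_language(pattern, s):
    """True if the whole string s belongs to the language of pattern."""
    return pattern.fullmatch(s) is not None
```

For instance, "alpha" is in the language of `TWO_AS` but "banana" (three a's) is not, and "x1" is accepted by `IDENTIFIER` while "1x" is rejected.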
A well-formed formula (wff) or logical formula is defined in the following manner:
    1. ˜ (“not”) and v (“or”) are basic logical constants.
    2. A term is defined as
        a. a non-logical constant, or
        b. f(t1, . . . , tn), where f is an n-ary function and each ti is a term.
    3. An atomic formula is either a propositional constant or an expression of the form P(s1, . . . , sn), where P is an n-ary predicate and each si is a term.
    4. A well-formed formula is either an atomic formula or is built up from one or more atomic formulas by a finite number of applications of the following rules:
        (i) If p is a well-formed formula, then ˜p is a well-formed formula.
        (ii) If p and q are well-formed formulas, then p v q is a well-formed formula.
        (iii) If p is a well-formed formula and x is a variable, then (∀x)p and (∃x)p are well-formed formulas.
Well-formed formulas are often represented as if-then rules, but need not be. They can be used to define concepts, to give necessary and/or sufficient conditions, or to provide information. See: B. Mates, Elementary Logic, Second Edition, Oxford University Press, 1979, Chapter 3. This reference is herein incorporated by reference in its entirety.
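The recursive definition above can be mirrored by a small data structure and a checker. This is a sketch under stated assumptions: the class and function names are invented for the example, variables are treated as terms (which the definition leaves implicit), and an if-then rule is written using the standard abbreviation of "if p then q" as ˜p v q.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Const:            # a non-logical constant (a term)
    name: str


@dataclass(frozen=True)
class Var:              # assumption: variables also count as terms
    name: str


@dataclass(frozen=True)
class Func:             # f(t1, ..., tn), where each ti is a term
    name: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Atom:             # P(s1, ..., sn); no arguments = propositional constant
    pred: str
    args: Tuple[object, ...] = ()


@dataclass(frozen=True)
class Not:              # rule (i)
    body: object


@dataclass(frozen=True)
class Or:               # rule (ii)
    left: object
    right: object


@dataclass(frozen=True)
class Quant:            # rule (iii); kind is "forall" or "exists"
    kind: str
    var: str
    body: object


def is_term(t):
    if isinstance(t, (Const, Var)):
        return True
    if isinstance(t, Func):
        return all(is_term(a) for a in t.args)
    return False


def is_wff(f):
    """Check that f is built only by the formation rules."""
    if isinstance(f, Atom):
        return all(is_term(a) for a in f.args)
    if isinstance(f, Not):
        return is_wff(f.body)
    if isinstance(f, Or):
        return is_wff(f.left) and is_wff(f.right)
    if isinstance(f, Quant):
        return f.kind in ("forall", "exists") and is_wff(f.body)
    return False


def implies(p, q):
    """'If p then q', written with the basic constants as ˜p v q."""
    return Or(Not(p), q)


# Hypothetical rule: every customer with a mortgage is offered home insurance.
RULE = Quant("forall", "x",
             implies(Atom("HasMortgage", (Var("x"),)),
                     Atom("OfferHomeInsurance", (Var("x"),))))
```

Under these assumptions, `is_wff(RULE)` returns True, while an object containing anything other than the constructors above, such as `Or(Atom("P"), "q")`, is rejected.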
There are various problems and issues that the prior art does not address. For example, the prior art does not solve the problem of determining sound (correct) logical inferences within a semantic network.
The prior art cannot express or reason with semantic networks in which rules are attached to links and nodes of the network.
The prior art does not recognize that rules attached to nodes in semantic networks can be categorized as definition, prerequisite, or auxiliary rules, each type being interpreted differently when reasoning in the network.
The prior art does not recognize that the inferences that are performed within a semantic network must be valid with respect to a particular context.
The prior art cannot express or reason with semantic networks in which weights (probabilities) are attached to formulas on links and nodes in the network.
The prior art cannot furnish explanations for inferences within a semantic network in which rules are attached to links and nodes in the network.
What is needed is a method incorporating knowledge structures for reasoning about concepts, relations, and rules that addresses the above problems.