The present invention relates to the field of computer-assisted information and knowledge processing known as "artificial intelligence," and more particularly to a symbolic rule invocation mechanism for an inductive learning engine applicable to domains that use time-based representations such as security auditing.
"Artificial intelligence" is directed to the development of processes and systems using digital computers to mimic the performance of human intelligent behavior. An important goal of artificial intelligence is to give computers the ability to learn from experience, which is one of the most prominent traits of intelligent behavior. This ability enables computers to learn complex concepts that may be difficult for humans to articulate and also to adapt to changing environments. Furthermore, the ability of a computer to learn relieves humans from the tedious task of spoon-feeding computers with the knowledge needed to perform particular functions.
Most current knowledge-based systems use "deductive" inference techniques to acquire knowledge. In these techniques, general principles (or rules in the case of rule-based systems) are acquired from experts and stored in a knowledge base. Knowledge acquired in this manner is generally heuristic, that is, it represents facts and empirical associations, or patterns, that experts in the particular domain have developed through their own experience.
Deductive knowledge-based systems are necessarily limited by the quantity, variety, and accuracy of the knowledge they contain. Thus, a deductive knowledge-based system typically can be used only in connection with circumstances within the domain of the knowledge that system contains. If the system encounters circumstances which are unexpected and outside of that domain, the system will not be able to provide assistance. Furthermore, heuristic knowledge of this type is often of limited utility because it is difficult to verify and update.
Therefore, it is desirable for computers to acquire knowledge directly rather than, or in addition to, acquiring knowledge from experts. One way to acquire knowledge directly is to use "inductive" inference techniques to infer knowledge in a particular domain from examples in the domain. An inductive learning system generates generalized descriptions or patterns (or rules in the case of rule-based systems) from the information provided as domain examples. The examples may be data collected from actual operation in the domain, or from simulation models.
A specific problem in knowledge acquisition is the incremental acquisition of time-based symbolic patterns (or rules) from observations of time-based processes. Observations of time-based processes may take the form of "events" which depict the state of the process at a particular moment in terms of a set of attributes. An "episode" consists of a sequence of events occurring in time.
The goal of inductive learning as applied to time-based processes is to discover patterns (represented in the form of symbolic rules) which can be used to characterize processes or for prediction of subsequent events. Thus, each rule may describe a sequential pattern that predicts the next possible events with reasonable accuracy.
An example of a simplified rule characterizing a sequential pattern is as follows:

$$e_1 - e_2 - e_3 \rightarrow (e_4 = 95\%;\ e_5 = 5\%)$$
This rule indicates that if event $e_1$ is followed by event $e_2$, and event $e_2$ is followed by event $e_3$, then, based on previous observations, there is a 95% chance that event $e_4$ will follow and a 5% chance that event $e_5$ will follow.
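A rule of this kind can be pictured as a left-hand side (a sequence of consecutive events) paired with a probability distribution over the next event. The following is a minimal sketch under that assumption; the `Rule` class and its `matches` and `predict` methods are hypothetical illustrations, not the engine's actual representation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Rule:
    """Hypothetical sequential rule: if the events in `lhs` occur consecutively,
    the next event is distributed according to `rhs`."""
    lhs: Tuple[str, ...]
    rhs: Dict[str, float]  # predicted next event -> probability

    def matches(self, history: Sequence[str]) -> bool:
        """True if the most recent events in `history` equal the left-hand side."""
        n = len(self.lhs)
        return len(history) >= n and tuple(history[-n:]) == self.lhs

    def predict(self, history: Sequence[str]) -> Optional[str]:
        """Most probable next event, or None if the rule does not apply."""
        if not self.matches(history):
            return None
        return max(self.rhs, key=self.rhs.get)


# The example rule e1-e2-e3 -> (e4 = 95%; e5 = 5%).
rule = Rule(lhs=("e1", "e2", "e3"), rhs={"e4": 0.95, "e5": 0.05})
print(rule.predict(["e7", "e1", "e2", "e3"]))  # e4
print(rule.predict(["e1", "e3", "e2"]))        # None
```

Only the tail of the event history is compared, which reflects the rule's meaning: the left-hand side must be the events that have just occurred.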
The following simplified example demonstrates the process of generating a set of hypotheses (i.e., patterns represented in the form of symbolic rules) from a sequence of events. Assume that rules are to be discovered from the following episode, in which each letter represents an event:

[Episode figure not reproduced.]

From this episode, the following hypotheses may be generated:

$$h_1: A - B \rightarrow (C = 100\%)$$
$$h_2: C \rightarrow (S = 50\%;\ A = 50\%)$$
$$h_3: S \rightarrow (T = 100\%)$$
$$h_4: T \rightarrow (A = 50\%;\ S = 50\%)$$
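One simple way to produce hypotheses like these is to count, for every run of k consecutive events, which event immediately follows it, and then normalize the counts into probabilities. The sketch below assumes that approach; `generate_hypotheses` is a hypothetical helper, and because the original episode is not reproduced, the episode string used here is a made-up one that happens to be consistent with $h_1$ through $h_4$.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def generate_hypotheses(episode: Sequence[str], k: int) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """For every run of k consecutive events, estimate the probability of
    each event that immediately follows it (hypothetical helper)."""
    successors: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(len(episode) - k):
        context = tuple(episode[i:i + k])
        successors[context][episode[i + k]] += 1
    return {
        context: {event: n / sum(counts.values()) for event, n in counts.items()}
        for context, counts in successors.items()
    }


episode = "ABCSTSTABCABC"  # hypothetical episode consistent with h1-h4
one_event = generate_hypotheses(episode, k=1)
two_event = generate_hypotheses(episode, k=2)
print(two_event[("A", "B")])  # {'C': 1.0}                h1
print(one_event[("C",)])      # {'S': 0.5, 'A': 0.5}      h2
print(one_event[("S",)])      # {'T': 1.0}                h3
print(one_event[("T",)])      # {'S': 0.5, 'A': 0.5}      h4
```

The function also produces other candidate rules (for example, for the context B-C); these are further alternative hypotheses that a learning engine would have to evaluate.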
The hypotheses $h_1$, $h_2$, $h_3$, and $h_4$ may be viewed as alternative explanations that uncover what is happening in the process.
In incremental inductive learning, the rules are maintained and modified dynamically as new events occur. The capability to learn incrementally is important because it reuses the knowledge acquired from previous events. Without it, the learning process is very inefficient because a concept must be learned from scratch each time a new event or episode occurs.
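The benefit of incremental learning can be illustrated with successor counts that are updated one event at a time instead of being recomputed over the whole history. This is a minimal sketch assuming rules are stored as counts per context of length k; `IncrementalRuleBase`, `observe`, and `rule` are hypothetical names.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Tuple


class IncrementalRuleBase:
    """Hypothetical rule store: successor counts per context of length k,
    updated per event so earlier observations are never reprocessed."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.window: Deque[str] = deque(maxlen=k)  # the k most recent events
        self.counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def observe(self, event: str) -> None:
        if len(self.window) == self.k:
            # The new event is an outcome of the context formed by the last k events.
            self.counts[tuple(self.window)][event] += 1
        self.window.append(event)

    def rule(self, context: Tuple[str, ...]) -> Dict[str, float]:
        counts = self.counts.get(context, Counter())
        total = sum(counts.values())
        return {event: n / total for event, n in counts.items()} if total else {}


rb = IncrementalRuleBase(k=1)
for event in "ABCSTSTABCABC":  # hypothetical episode
    rb.observe(event)
print(rb.rule(("C",)))  # {'S': 0.5, 'A': 0.5}
rb.observe("S")         # the new event updates only the rule for context ('C',)
print(rb.rule(("C",)))  # {'S': 0.6666666666666666, 'A': 0.3333333333333333}
```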
Hypotheses should be generated or modified so that eventually only high quality rules are left in the rule base. A high quality rule or hypothesis has, among others, the following properties: (1) high level of confidence, and (2) high accuracy in prediction.
A high level of confidence is achieved by rules that cover as many data points as possible. Thus, the more times a rule is tested against new data points or events in the sequence and is proved to be valid, the higher the confidence level. Such rules represent patterns that are highly repetitive.
Accuracy in prediction is expressed in the form of "entropy," which measures the degree of randomness in the prediction when a rule is matched against known data points. The entropy of a rule is defined as follows:

$$\text{Entropy} = -\sum_{i} P_i \log P_i$$

where $P_i$ is the probability that event $i$ will occur under the conditions specified in the rule.
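The definition translates directly into a few lines of code. This sketch assumes a base-2 logarithm (so entropy is measured in bits), which the text does not specify; `entropy` is a hypothetical helper name.

```python
import math
from typing import Iterable


def entropy(probabilities: Iterable[float]) -> float:
    """Entropy -sum(P_i * log2 P_i) of a rule's outcome distribution
    (base 2 assumed). Outcomes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy([1.0]))       # 0.0  (a deterministic prediction, such as h1 or h3)
print(entropy([0.5, 0.5]))  # 1.0  (an even split, such as h2 or h4)
```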
FIG. 1 shows how a "good" pattern in terms of quality is found by minimizing the entropy value and maximizing the coverage. The "good" pattern covers or explains one type of data point represented by a "-" and therefore has a very low entropy. It also has maximum coverage because it explains all of the "-" data points. The "bad" pattern, on the other hand, has a higher degree of randomness in its prediction and therefore a high entropy. Since it covers nearly equal numbers of "-" and "+" data points, its usefulness in prediction is limited.
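The two quality measures in FIG. 1 can be computed for any pattern viewed as a region that covers some data points: coverage is the fraction of target points the pattern includes, and entropy is computed over the labels of the points it includes. The points and the two patterns below are hypothetical stand-ins for the figure, and `quality` is a hypothetical helper using a base-2 logarithm.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Point = Tuple[float, float, str]  # (x, y, label), label is "-" or "+"


def quality(points: List[Point], covers: Callable[[float, float], bool],
            target: str) -> Tuple[float, float]:
    """Return (coverage of the target label, entropy of labels inside the pattern)."""
    inside = Counter(label for x, y, label in points if covers(x, y))
    n_inside = sum(inside.values())
    n_target = sum(1 for _, _, label in points if label == target)
    coverage = inside[target] / n_target if n_target else 0.0
    ent = sum(-(c / n_inside) * math.log2(c / n_inside) for c in inside.values())
    return coverage, ent


def good(x: float, y: float) -> bool:  # hypothetical region holding only "-" points
    return x < 5


def bad(x: float, y: float) -> bool:   # hypothetical region mixing "-" and "+"
    return y < 5


points = [(1, 1, "-"), (2, 8, "-"), (3, 3, "-"), (6, 2, "+"), (7, 7, "+"), (8, 4, "+")]
print(quality(points, good, "-"))  # (1.0, 0.0)
print(quality(points, bad, "-"))   # (0.6666666666666666, 1.0)
```

The good pattern reaches full coverage with zero entropy, while the bad one covers fewer "-" points and splits evenly between labels.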
By definition, a rule with a skewed probability distribution in its right-hand side (i.e., its set of possible outcomes) has a low entropy. Thus, rule $r_1$ in the following example, whose probability distribution is skewed, has a much lower entropy than rule $r_2$:

$$r_1: e_4 - e_2 \rightarrow (e_3 = 95\%;\ e_4 = 2\%;\ e_5 = 3\%)$$
$$r_2: e_6 - e_7 \rightarrow (e_8 = 30\%;\ e_9 = 30\%;\ e_{10} = 40\%)$$
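The difference between $r_1$ and $r_2$ can be checked numerically. This sketch assumes a base-2 logarithm; the helper name `entropy` is hypothetical.

```python
import math


def entropy(distribution: dict) -> float:
    """Entropy (base 2 assumed) of a rule's right-hand-side distribution."""
    return sum(-p * math.log2(p) for p in distribution.values() if p > 0)


r1 = {"e3": 0.95, "e4": 0.02, "e5": 0.03}
r2 = {"e8": 0.30, "e9": 0.30, "e10": 0.40}
print(round(entropy(r1), 3))  # 0.335
print(round(entropy(r2), 3))  # 1.571
```

The skewed rule $r_1$ carries roughly a fifth of the entropy of $r_2$, whose outcomes are close to evenly distributed (three equally likely outcomes would give $\log_2 3 \approx 1.585$ bits).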
Theoretically, lowering entropy amounts to gaining information, because a low-entropy set of possible outcomes indicates that one outcome is much more likely than the others. Thus, the whole process of incremental inductive learning can be viewed as an attempt to extract as much information as possible from seemingly random data.
FIG. 2 is a simplified block diagram of a system for incremental inductive learning. Newly occurring events are input to an inductive learning engine 10. The inductive learning engine is embodied in a computer program operating on a digital computer. The rule base 12 contains symbolic rules of the form shown in the examples above and is stored in the memory of the digital computer. The inductive learning engine maintains and dynamically modifies the rules in the rule base in response to the new events.
The inductive learning engine may also use certain background knowledge 14 applicable to the particular domain to evaluate and modify the rules in the rule base. The background knowledge may contain, for example, constructive induction rules that generate new attributes not present in the initial data. Background knowledge may also include models of the domain, which may aid the inductive learning engine in generating plausible hypotheses.
An inductive learning engine of the type shown in FIG. 2 is described in K. Chen, "An Inductive Engine for the Acquisition of Temporal Knowledge," Doctoral Thesis, Department of Computer Science, University of Illinois (1988), which is hereby incorporated by reference.
One of the functions performed by the inductive learning engine is determining which rules in the rule base 12 are affected by the new events and also have the greatest possibility of being further improved by the new events. This is known as "rule invocation."
Previously, it has been proposed to invoke all of the rules in the rule base to determine which rules best characterize the sequence of events including the new events. For a large rule base, however, exhaustive rule invocation is impractical, especially for real-time applications, because rule invocation involves pattern-matching between rules and events, which is expensive in terms of computer resources.
Alternatively, a subset of the rules in the rule base may be invoked based on predetermined criteria, such as, for example, selecting only a certain number of the "best" rules in terms of quality. Even this technique has disadvantages, however, because the expensive rule-to-event pattern-matching must still be performed for all of the rules in the subset. In order to maintain the quality of the rule base, the number of rules in the subset must generally be sufficiently large that real-time processing is still difficult or impossible.
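The two invocation strategies described above can be contrasted in a short sketch that counts how many rule-to-event comparisons each performs. It assumes each rule is identified by its left-hand side and carries a precomputed quality score; `ScoredRule`, `invoke_exhaustive`, and `invoke_top_k` are hypothetical names, and "matching" here means comparing a rule's left-hand side with the most recent events.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ScoredRule:
    lhs: Tuple[str, ...]
    quality: float  # hypothetical score, e.g. derived from coverage and entropy


def matches(rule: ScoredRule, history: Sequence[str]) -> bool:
    n = len(rule.lhs)
    return len(history) >= n and tuple(history[-n:]) == rule.lhs


def invoke_exhaustive(rules: List[ScoredRule],
                      history: Sequence[str]) -> Tuple[List[ScoredRule], int]:
    """Pattern-match every rule: len(rules) comparisons per new event."""
    matched = [r for r in rules if matches(r, history)]
    return matched, len(rules)


def invoke_top_k(rules: List[ScoredRule], history: Sequence[str],
                 k: int) -> Tuple[List[ScoredRule], int]:
    """Pattern-match only the k highest-quality rules: fewer comparisons,
    but affected rules outside the top k are never examined."""
    candidates = sorted(rules, key=lambda r: r.quality, reverse=True)[:k]
    matched = [r for r in candidates if matches(r, history)]
    return matched, len(candidates)


rules = [ScoredRule(("A", "B"), 0.9), ScoredRule(("C",), 0.4),
         ScoredRule(("S",), 0.8), ScoredRule(("T",), 0.3)]
history = ["A", "B", "C"]
m1, n1 = invoke_exhaustive(rules, history)
m2, n2 = invoke_top_k(rules, history, k=2)
print([r.lhs for r in m1], n1)  # [('C',)] 4
print([r.lhs for r in m2], n2)  # [] 2
```

The top-k strategy halves the comparisons here but misses the only affected rule, while its cost still grows with k, which mirrors the trade-off described above.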