Predicting occurrences of events in sequential data is a problem in temporal data mining. Large amounts of sequential data are gathered in domains such as biology, manufacturing, network usage, and finance, so the reliable prediction of future events in such data may have applications in these domains. For example, predicting major breakdowns from a sequence of faults logged in a manufacturing plant may help to avoid those breakdowns and thereby improve plant throughput.
One approach to predicting occurrences of future events in categorical sequences uses rule-based methods. For example, rule-based methods have been used to predict rare events in event sequences, with application to detecting system failures in computer networks. In one such approach, “positive” and “negative” data sets are created using windows that precede a target event and windows that do not, respectively. Frequent patterns of unordered events are discovered separately in both data sets, and confidence-ranked collections of positive-class and negative-class patterns are used to generate classification rules.
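The window-based approach described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular published method. The function names (`make_windows`, `frequent_patterns`, `rank_by_confidence`), the fixed window width, the support threshold, and the brute-force enumeration of small event sets are all assumptions made for illustration. Confidence is taken here to be the fraction of windows containing a pattern that belong to the pattern's own class.

```python
from collections import Counter
from itertools import combinations


def make_windows(events, target, width):
    """Collect fixed-width windows that immediately precede `target` (positive)
    and windows that do not (negative). Each window is an unordered set of events."""
    positive, negative = [], []
    for i in range(width, len(events)):
        # Drop the target itself so patterns describe only precursor events.
        window = frozenset(e for e in events[i - width:i] if e != target)
        if events[i] == target:
            positive.append(window)
        else:
            negative.append(window)
    return positive, negative


def frequent_patterns(windows, min_support, max_size=2):
    """Brute-force count of event sets up to `max_size` (illustrative only;
    a real system would use a dedicated frequent-pattern miner)."""
    counts = Counter()
    for window in windows:
        items = sorted(window)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(windows)
    return {p: c for p, c in counts.items() if c >= threshold}


def rank_by_confidence(patterns, own, other):
    """Rank patterns by the share of matching windows that come from `own`."""
    ranked = []
    for p in patterns:
        n_own = sum(1 for w in own if p <= w)
        n_other = sum(1 for w in other if p <= w)
        ranked.append((p, n_own / (n_own + n_other)))
    # Highest confidence first; ties broken by the pattern's sorted contents.
    ranked.sort(key=lambda pc: (-pc[1], sorted(pc[0])))
    return ranked


# Toy sequence in which "X" is the target event (hypothetical data).
events = list("ABXCABXDCABX")
positive, negative = make_windows(events, target="X", width=2)
pos_patterns = frequent_patterns(positive, min_support=0.5)
ranked = rank_by_confidence(pos_patterns, positive, negative)
```

In this toy sequence, the pair of events A and B appears in every window that precedes X and in no other window, so it ranks first. Ranking the negative-class patterns works the same way with the two window lists swapped.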
Another rule-based method uses a genetic algorithm to predict rare events. It begins by defining “predictive patterns” as sequences of events with temporal constraints between connected events. The genetic algorithm is then used to identify a diverse set of predictive patterns, and a disjunction of these patterns constitutes the classification rules. Yet another rule-based method learns multiple sequence generation rules in order to infer properties of future events in the sequence, with the inferred properties themselves expressed as rules.