The present disclosure relates generally to a tool, method and product for forecasting an event of interest, and particularly to a method for forecasting an event that has a low frequency of occurrence, but a high impact or cost upon its occurrence.
Learning to predict infrequent but correlated sub-sequences of events is a difficult problem. Several real-world problems can be categorized in this manner, such as attacks in computer networks, fraudulent transactions in a financial institution, and machine downtime in manufacturing assembly lines. Common factors that make it difficult to learn to recognize these events include: few examples of the target class to be learned; limited data samples; uneven inter-arrival times between events; and time recordings and durations of measurable events that only approximate their true values.
Event classification algorithms typically follow a discriminant description strategy, wherein the discriminant boundaries that separate the regions of each class are estimated from data. In contrast to these methodologies, it would be advantageous to have a methodology that not only handles data that is temporal in nature, but is also based on a characteristic description strategy, wherein the target events are first identified/characterized (such as events that occur rarely and have a large impact upon occurrence) and then validated against the negative class (that is, event classes that are not rare or do not have large impacts or costs). These validations could then be extracted as rules for classifying the data. The process of integrating classification with rule extraction and association is well studied in the literature (see: K. Ali, S. Manganaris and R. Srikant, “Partial classification using associative rules,” ACM SIGMOD Management of Data, pp. 115-118, 1997; R. Bayardo, “Brute-force mining of high confidence classification rules,” Proc. of Third International Conference on Knowledge Discovery and Data Mining, pp. 123-126, 1997; D. Meretakis and B. Wuthrich, “Classification as mining and use of labeled item sets,” ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD-99), 1999; W. Pijls and R. Potharst, “Classification and target group selection based upon frequent patterns,” Proc. of Twelfth Belgium-Netherlands Artificial Intelligence Conference (BNAIC00), pp. 125-132, 2000; G. Dong, X. Zhang, L. Wong and J. Li, “CAEP: Classification by aggregating emerging patterns,” Proc. of International Conference on Discovery Science, 1999; and B. Liu, W. Hsu and Y. Ma, “Integrating classification and association rule mining,” Proc. of Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, for example).
However, these approaches mine for more than just the target/large events, thereby making the storage and search of events inefficient.
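By way of illustration only, the characteristic description strategy discussed above (first characterize what precedes the target events, then validate those candidates against the negative class) may be sketched as follows. This is a simplified example and is not the method of any of the cited works. The function name `extract_rules`, the fixed look-back `window`, the support and confidence thresholds, and the event names in the usage example are all assumptions introduced for the sketch.

```python
from collections import Counter


def extract_rules(events, target, window, min_support=0.5, min_confidence=0.5):
    """Return (event_type, support, confidence) rules predictive of `target`.

    events: list of (timestamp, event_type) tuples, sorted by timestamp.
    window: hypothetical fixed look-back length, in the same units as timestamp.
    """
    target_times = [t for t, e in events if e == target]
    if not target_times:
        return []

    # Step 1 (characterize): collect the non-target event types that occur
    # within `window` before each target event.
    covered = Counter()
    for tt in target_times:
        preceding = {e for t, e in events if e != target and tt - window <= t < tt}
        covered.update(preceding)

    # Step 2 (validate): over ALL occurrences of each candidate type, count
    # how often a target event actually follows within `window`.
    total, hits = Counter(), Counter()
    for t, e in events:
        if e == target or e not in covered:
            continue
        total[e] += 1
        if any(t < tt <= t + window for tt in target_times):
            hits[e] += 1

    # Step 3 (extract rules): keep candidates that both cover enough target
    # events (support) and rarely fire without one (confidence).
    rules = []
    for e in covered:
        support = covered[e] / len(target_times)
        confidence = hits[e] / total[e]
        if support >= min_support and confidence >= min_confidence:
            rules.append((e, support, confidence))
    return sorted(rules)


# Hypothetical event log: "FAIL" is the rare, high-impact target event.
log = [(1, "warn"), (2, "retry"), (3, "FAIL"), (10, "retry"),
       (20, "warn"), (21, "FAIL"), (30, "retry")]
print(extract_rules(log, "FAIL", window=5))  # [('warn', 1.0, 1.0)]
```

In this example, "retry" precedes one of the two target events but also occurs twice with no target event following, so it is rejected during validation against the negative class, while "warn" is retained as a rule.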
Other works in the literature, such as G. Weiss and H. Hirsh, “Learning to Predict Rare Events in Event Sequences,” Knowledge Discovery and Data Mining, pp. 359-363, 1998, for example, relate to mining for target events and predicting their occurrence along an event sequence. However, these methods identify only time windows that are predictive of target/large events, not target event sets or sub-sequences that are constructed entirely from the negative class but are predictive of the positive class, or large event.
Yet other works, such as R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. of 11th International Conference on Data Engineering, ICDE, pp. 3-14, 1995; and H. Mannila, H. Toivonen and A. I. Verkamo, “Discovering frequent episodes in sequences,” Proc. of International Conference on Knowledge Discovery and Data Mining (KDD-95), 1995, for example, consider the temporal distribution of the negative class within a time window, which may result in an overly limiting methodology.
Further work, such as R. Vilalta and S. Ma, “Predicting Rare Events In Temporal Domains,” International Conference on Data Mining, 2002, for example, estimates the size of the time window by trial and error, requires pre-labeled data wherein each data point is either a negative class example or a positive class example, and assumes a fixed set of negative class types.
Accordingly, there is a need in the art for a classification algorithm that results in a set of prediction rules useful for predicting the probable occurrence of a target event while overcoming the aforementioned drawbacks.