With the prevalent application of computer technology to transactions and business operations, a very large amount of operational event data is being generated and stored in databases. In many applications, the stored event data includes a time-stamp that identifies the time of occurrence of the event. While a very large amount of sequential data is stored in databases, generally only a limited amount is analyzed due to the high cost of mining the data. As such, many patterns of interesting events remain hidden.
It is generally desirable to identify patterns of data events that define relationships between two or more data events. One method for identifying patterns of data events within sequences is data mining. Data mining generally is the extraction of knowledge or patterns from data in databases or other information repositories. In contrast to simple database searches, data mining finds each sequence with at least one pattern satisfying a given constraint.
A large volume of event data has been collected in applications such as maintenance records and web click applications. For example, a sequence of maintenance events may include a list of operational or failure events ordered by time of occurrence. Each event data item may include an identification of the particular event, the type or categorization of the event, and the time of occurrence. While events may be ordered in time, their contents have no ordering and are not easily compared to identify a trend or pattern.
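By way of illustration only, an event data item of the kind described above might be represented as follows. This is a minimal sketch; the class name `MaintenanceEvent`, its field names, and the sample records are hypothetical and do not come from any particular maintenance system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceEvent:
    """One time-stamped event item (all field names are illustrative)."""
    event_id: str          # identification of the particular event
    category: str          # type or categorization of the event
    occurred_at: datetime  # time of occurrence


# Made-up records, deliberately listed out of order.
records = [
    MaintenanceEvent("E2", "Automatic Flight", datetime(2020, 1, 3)),
    MaintenanceEvent("E1", "Engine Oil", datetime(2020, 1, 1)),
    MaintenanceEvent("E3", "Engine Oil", datetime(2020, 1, 10)),
]

# A sequence is the list of events ordered by time of occurrence.
sequence = sorted(records, key=lambda e: e.occurred_at)
```

Ordering by `occurred_at` yields the temporal sequence, but the `category` values themselves carry no inherent order, which is the difficulty noted above.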
One example of maintenance events stored in a temporal sequence is operational events associated with an aircraft. Each sequence may be associated with a particular aircraft, and two or more sequences may be associated with a fleet of aircraft. Such aircraft maintenance event sequences are different from a simple time series. Maintenance events occur irregularly. Often, some time periods do not contain an event and other time periods contain two or more “overlapping” events.
For sequences containing temporal maintenance events, data mining methods and systems are generally designed to identify events that precede a hardware failure or maintenance event. The identified events usually are within a monitoring time range at which intervention may be possible to prevent a failure or to reduce a cost associated with the next event. Such patterns of events are generally subject to ordering and temporal constraints. Pattern occurrence is a fundamental concept in the sequential pattern data mining problems. As such, data mining methods may identify a pattern that forecasts a target event such as a failure or operational event.
A data mining task may include discovering sequential patterns among events, i.e., co-occurrences of multiple events and some ordinal or temporal relationship among them. The discovered patterns may then be interpreted as rules. An example pattern is an Engine Oil event followed by an Automatic Flight event in one to three days. This pattern can be interpreted as a sequential rule such as: If an Engine Oil event occurs, an Automatic Flight event will occur within one to three days.
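The example pattern above can be made concrete with a short sketch that counts every pair of events satisfying the ordering and temporal constraint. The function name `count_followed_by`, the tuple layout, and the sample dates are assumptions chosen for illustration; this is not presented as the counting method of any particular system.

```python
from datetime import datetime, timedelta


def count_followed_by(events, first, second, min_gap, max_gap):
    """Count (first, second) pairs whose time gap lies in [min_gap, max_gap]."""
    count = 0
    for kind_a, time_a in events:
        if kind_a != first:
            continue
        for kind_b, time_b in events:
            if kind_b == second and min_gap <= time_b - time_a <= max_gap:
                count += 1
    return count


# Made-up sequence of (event type, time of occurrence) for one aircraft.
events = [
    ("Engine Oil", datetime(2020, 1, 1)),
    ("Automatic Flight", datetime(2020, 1, 3)),
    ("Engine Oil", datetime(2020, 1, 10)),
    ("Automatic Flight", datetime(2020, 1, 20)),
]

n = count_followed_by(
    events, "Engine Oil", "Automatic Flight",
    timedelta(days=1), timedelta(days=3),
)
print(n)  # 1: only the first Engine Oil event is followed within 1 to 3 days
```

Only the first Engine Oil event satisfies the rule here; the second is followed by an Automatic Flight event ten days later, outside the one-to-three-day range.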
As such, sequential pattern mining methods generally utilize pattern occurrence identification techniques. Pattern occurrence identification methods have been developed for sequential pattern discovery, filtering and ranking. In these methods, the recurrence of a pattern within the same sequence is generally ignored. Additionally, frequent pattern mining generally considers multiple sequences and often ignores pattern recurrences within a single sequence. However, the number of pattern occurrences in a single sequence can provide valuable insight, especially for applications having long sequences each containing many events, such as maintenance record data. For example, airplane maintenance records are usually kept for the lifetime of each airplane, and many patterns naturally repeat in the maintenance history of the same airplane. The number of occurrences within a sequence might indicate problems of a particular airplane (while the number of occurrences across sequences might indicate problems of a group of airplanes).
One data mining method is a constraint-based mining method that utilizes user defined constraints defining the pattern to be mined and includes classification and association constraints. Another method includes distance-based association rules such as the density or number of events in an interval and/or the closeness of events in the interval.
Another method provides for the discovery of sequences of maximum length with support above a given threshold. A sequence is defined as an ordered list of elements, where an element is defined as a set of items appearing together in a transaction. This method identifies two data mining metrics, support and confidence. Support is defined as the extent to which the data is either positively or negatively relevant to the rule. Confidence is defined as the extent to which, within those that are relevant, the proposal is upheld.
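One possible reading of these two metrics, for a rule of the form "A is followed by B" evaluated over several sequences, is sketched below. The interpretation is an assumption made for illustration: a sequence is treated as relevant when it contains the antecedent, and the rule is treated as upheld when the consequent appears later in that sequence. The function name and the sample data are hypothetical.

```python
def support_and_confidence(sequences, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    # Relevant sequences: those in which the antecedent occurs at all.
    relevant = [s for s in sequences if antecedent in s]
    # Upheld: the consequent appears somewhere after the first antecedent.
    upheld = [s for s in relevant if consequent in s[s.index(antecedent) + 1:]]
    support = len(relevant) / len(sequences) if sequences else 0.0
    confidence = len(upheld) / len(relevant) if relevant else 0.0
    return support, confidence


# Four made-up sequences of event types, each ordered by time.
sequences = [["A", "B"], ["A", "C"], ["B", "A"], ["C"]]
support, confidence = support_and_confidence(sequences, "A", "B")
print(support, confidence)
```

In this sample, three of the four sequences contain A, and B follows A in only one of those three. Each sequence contributes at most once, regardless of how often the pattern recurs within it.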
Another method uses a sliding window on the input sequence to obtain a set of overlapping subsequences, and reports the number of subsequences in which the pattern occurs. Recurrences within a subsequence are ignored. The number of occurrences reported for a pattern is a function of the selected window size. When the window size is large enough, all legitimate occurrences are considered. However, the same event instances or event pattern occurrences may be counted multiple times in multiple sliding windows even though there are only two instances of a particular event. The number of pattern occurrences also increases as the window size increases. This method is therefore limited, as the choice of window size is critical. In addition, the sliding window approach is static and not very robust. For example, increasing the window size introduces a different number of new occurrences for different patterns, and thus changes the order of patterns in terms of the pattern occurrence or other derived measures.
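The dependence on window size can be seen in a small sketch. It assumes integer time-stamps, one window starting at each time step between the first and last event, and a two-event pattern "A followed by B"; the function name `windows_containing` and the sample data are hypothetical.

```python
def windows_containing(events, first, second, width):
    """Count sliding windows in which `first` occurs before `second`."""
    times = [t for _, t in events]
    count = 0
    for start in range(min(times), max(times) + 1):
        # Events falling in the half-open window [start, start + width).
        window = [(k, t) for k, t in events if start <= t < start + width]
        firsts = [t for k, t in window if k == first]
        seconds = [t for k, t in window if k == second]
        # The pattern occurs if some `first` precedes some `second`;
        # recurrences inside one window still count only once.
        if firsts and seconds and min(firsts) < max(seconds):
            count += 1
    return count


# Made-up sequence of (event type, integer time-stamp).
events = [("A", 1), ("B", 2), ("A", 5), ("B", 9)]
counts = {w: windows_containing(events, "A", "B", w) for w in (2, 5, 9)}
print(counts)  # {2: 1, 5: 2, 9: 5}
```

With a width of 9, the single pair (A at time 5, B at time 9) is counted in several consecutive windows, which illustrates how the same event instances inflate the count as the window grows.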
In another method, only the minimal pattern occurrences are counted. In such a method, an occurrence is identified as minimal if no other occurrence can be found in any proper sub-interval of its time span. Legitimate occurrences of the pattern that are not “minimal” are ignored. However, a more constrained pattern may have more minimal occurrences. As a result, this method can produce unexpected results due, in part, to the exclusion of some legitimate occurrences.
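The effect of counting only minimal occurrences can be sketched for a two-event pattern "A followed by B". The sketch assumes numeric time-stamps and represents each occurrence by its time span; the function name `minimal_occurrences` and the sample data are hypothetical.

```python
def minimal_occurrences(events, first, second):
    """Return the time spans of occurrences that contain no other occurrence."""
    # Every (first, second) pair in the right order is an occurrence.
    spans = [
        (ta, tb)
        for ka, ta in events if ka == first
        for kb, tb in events if kb == second and ta < tb
    ]

    def inside(inner, outer):
        # True when `inner` lies in a proper sub-interval of `outer`.
        return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]

    return sorted(s for s in set(spans) if not any(inside(o, s) for o in spans))


# Made-up sequence of (event type, time-stamp).
events = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]
result = minimal_occurrences(events, "A", "B")
print(result)  # [(2, 3)]
```

The sample sequence contains four legitimate occurrences, with spans (1, 3), (1, 4), (2, 3) and (2, 4), yet only (2, 3) is minimal, so the other three are ignored.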
Another data mining method includes the identification of an interesting pattern where events of an episode occur close in time. An episode is a conjunction of events bound to given variables that satisfies unary and binary predicates declared for those variables, e.g., a collection of events occurring frequently together or a partially ordered collection of events occurring together. The method distinguishes between serial and parallel episodes and between simple and non-simple episodes, where a simple episode contains only unary predicates and no binary predicates. In this method, a time window is a user-defined width of time defining how close the events must occur to each other within the episode. A window is a slice of an event sequence. An event sequence is a sequence of partially overlapping windows. The user may also specify in how many windows an episode has to occur to be considered a frequent episode. Episodes that occur frequently within a sequence are determined.
In yet another method, a number of disjoint occurrences is determined. This method addresses discrete events and their relationship to each other, but does not allow for time-overlapping events within the sequence. As such, this method is not applicable to patterns and sequences containing maintenance events or web transactions that inherently have time-overlapping events.
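A simple way to count disjoint occurrences of a two-event pattern "A followed by B" is a greedy left-to-right scan, sketched below. The scan is an assumption made for illustration and is not taken from the method described above; the function name `count_disjoint` and the sample data are hypothetical.

```python
def count_disjoint(events, first, second):
    """Count non-overlapping occurrences of `first` followed by `second`."""
    count = 0
    waiting_for_second = False
    for kind, _ in sorted(events, key=lambda e: e[1]):
        if not waiting_for_second and kind == first:
            # Start a new occurrence at the earliest available `first`.
            waiting_for_second = True
        elif waiting_for_second and kind == second:
            # Close the occurrence; later events may start a new one.
            count += 1
            waiting_for_second = False
    return count


# Made-up sequence of (event type, time-stamp) with distinct time-stamps.
events = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5), ("B", 6)]
n = count_disjoint(events, "A", "B")
print(n)  # 2
```

The scan relies on a strict time ordering of the events. When two events share a time-stamp or overlap in time, the order in which they are visited is arbitrary, which reflects the limitation noted above.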
Each of these methods is limited in its application and effectiveness in determining or identifying a pattern within a sequence of time-stamped events or categories. Therefore, the inventor of the present method and system believes it would be desirable for a method and system to effectively and efficiently provide for the identification of a pattern in a sequence of time-stamped events. The inventor also believes that it would be desirable for a method to provide for the identification of surprising patterns within one or more temporal sequences.