The goal of condition monitoring is to observe processes and either detect potential failures in advance or provide input for improved performance. To that end, it is important to be able to find patterns indicative of an upcoming failure or a need for intervention. For example, sensor measurements showing sudden significant changes could indicate a failure or a need for recalibration. Apart from sensor data, many systems also generate a wealth of log messages. If the information from log files and sensors is combined, it may be learned that some sensors were replaced before the change started, and thus there is no reason for concern. Alternatively, it may be found that drastic changes requiring a technician's visit are always preceded by the same type of error messages in the logs. Currently, it is the analyst's job to look for such patterns by semi-manually processing trend data, log files, inventory records, etc.
Temporal data mining aims to exploit temporal information in data sources to improve the performance of clustering or classification algorithms, or to find models that describe the data-generating process and patterns that describe local effects. Many data sources under study in business, healthcare and scientific applications are dynamic in nature, making them promising candidates for the application of temporal mining methods. As used herein, the terms “temporal interval data” and “time interval data” refer to data that contains time intervals or, in the case of semi-interval data, contains semi-intervals. Those terms in no way preclude the possibility that the data also contains time point data; i.e., mixed interval and point data is considered interval data for purposes of this discussion.
Deriving patterns over intervals is nontrivial. The number of possible binary relations ranges from three relations for time points (before, equals, after) to the 13 interval relations described by James F. Allen, Maintaining Knowledge about Temporal Intervals, Communications of the ACM 26(11): 832-843 (1983) (hereinafter “Allen”), the contents of which are hereby incorporated by reference herein in their entirety. For semi-intervals, 10 core relations were identified by C. Freksa, Temporal Reasoning Based on Semi-Intervals, Artificial Intelligence 54(1): 199-227 (1992) (hereinafter “Freksa”). By adding interval-to-interval mid-point relations, 49 relations were obtained by J. F. Roddick and C. H. Mooney, Linear Temporal Sequences and Their Interpretation Using Midpoint Relationships, IEEE Transactions on Knowledge and Data Engineering 17(1): 133-135 (2005). The contents of each are hereby incorporated by reference herein in their entirety. Pattern mining in interval data has relied almost exclusively on Allen's interval relations.
For the purpose of temporal reasoning, Allen formalized temporal logic on intervals by specifying 13 interval relations and showing their completeness: any two intervals are related by exactly one of the relations. The relations are before, meets, overlaps, starts, during and finishes; the corresponding inverses after, met by, overlapped by, started by, contains and finished by; and equals. The time diagram of FIG. 1a shows examples of Allen's interval relations between the intervals A and B. The first six illustrated relationships can be inverted.
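The thirteen relations can be made concrete with a short sketch. The function name `allen_relation` and the representation of an interval as a `(start, end)` tuple with `start < end` are assumptions made for this illustration; they are not taken from Allen or from the figures.

```python
from typing import Tuple

# Assumed representation: (start, end) with start strictly less than end.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the single Allen relation that holds between a and b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must have positive duration")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met by"

    # From here on the two intervals share more than a single point.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped by"


print(allen_relation((0, 2), (3, 5)))  # before
print(allen_relation((0, 3), (3, 5)))  # meets
print(allen_relation((1, 4), (0, 5)))  # during
```

Because the branches are mutually exclusive and cover every ordering of the four endpoints, exactly one relation is returned for any pair of valid intervals, which mirrors the completeness property stated above.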
The Allen relations are commonly used beyond temporal reasoning, e.g., for the formulation of temporal patterns, but that can be problematic, in particular for noisy data where the exact interval boundaries are not reliable or meaningful. The relations are not robust to noise because small shifts of time points lead to different relations for similar situations observed. For example, FIG. 1b shows several possible patterns according to Allen that are actually fragments of the same approximate relation “almost equals.” Researchers have attempted to remedy this problem by using thresholds, by using fuzzy extensions for temporal reasoning, by using different pattern languages that group some of the relations, or by matching against sub-intervals of observed intervals.
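The threshold idea mentioned above can be illustrated with a minimal sketch. The function `almost_equals`, the tolerance `eps`, and the sample intervals are all hypothetical and chosen only for illustration; this is one simple thresholding scheme, not the specific method of any cited work.

```python
def almost_equals(a, b, eps):
    """True if both endpoints of a lie within eps of the endpoints of b.

    Intervals are assumed to be (start, end) tuples.
    """
    return abs(a[0] - b[0]) <= eps and abs(a[1] - b[1]) <= eps


reference = (10.0, 20.0)

# Noisy observations of roughly the same interval. Under the strict Allen
# relations each one relates differently to the reference interval.
noisy = [
    (10.0, 20.0),  # equals
    (10.2, 20.0),  # finishes
    (9.8, 20.0),   # finished by
    (10.0, 19.7),  # starts
    (10.3, 20.4),  # overlapped by
    (9.9, 19.8),   # overlaps
]

matches = [almost_equals(obs, reference, eps=0.5) for obs in noisy]
print(all(matches))  # True
```

With a tolerance of 0.5 time units, all six observations collapse into the single approximate relation, whereas the strict relations split them into six different patterns. The choice of tolerance is itself a parameter that must be tuned to the data.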
The formation of complex patterns using the binary relations of Allen can be done in different ways. Certain early approaches that used nested combinations of binary relations were shown to be ambiguous. The format described by F. Höeppner, Discovery of Temporal Patterns: Learning Rules about the Qualitative Behaviour of Time Series, In Proc. of the 5th European Conf. on Principles of Data Mining and Knowledge Discovery (PKDD), pages 192-203 (Springer 2001) (hereinafter “Höeppner”) (hereby incorporated by reference herein in its entirety), which uses the $k(k-1)/2$ pairwise relations of all $k$ intervals in a pattern, is concise and has been adopted by several recently proposed efficient algorithms. Equivalent patterns have been represented in recent work as a sequence of $2k$ interval boundaries, and by extending nested binary relations with counter variables that annotate how many intervals of a subpattern interact with an interval joined by a binary relation in different ways. The ambiguity inherent to Allen's relations, however, remains. Interval endpoints are typically allowed to shift significantly within a pattern occurrence without changing the relations, causing many situations in the data that are quite different to be represented with the same pattern.
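To make the pairwise format concrete, the following sketch lists the Allen relation for every pair of intervals in a small pattern. The helper names, the table-driven classifier, the ordering of intervals by start time, and the sample pattern are assumptions for illustration only; they are not drawn from Höeppner.

```python
from itertools import combinations


def sign(x, y):
    """Three-way comparison: -1 if x < y, 0 if x == y, 1 if x > y."""
    return (x > y) - (x < y)


# Key: signs of (a_start vs b_start, a_end vs b_end,
#                a_start vs b_end,   a_end vs b_start).
ALLEN = {
    (-1, -1, -1, -1): "before",
    (-1, -1, -1, 0): "meets",
    (-1, -1, -1, 1): "overlaps",
    (0, -1, -1, 1): "starts",
    (1, -1, -1, 1): "during",
    (1, 0, -1, 1): "finishes",
    (0, 0, -1, 1): "equals",
    (-1, 0, -1, 1): "finished by",
    (-1, 1, -1, 1): "contains",
    (0, 1, -1, 1): "started by",
    (1, 1, -1, 1): "overlapped by",
    (1, 1, 0, 1): "met by",
    (1, 1, 1, 1): "after",
}


def allen(a, b):
    """Classify two (start, end) intervals; assumes start < end for both."""
    key = (sign(a[0], b[0]), sign(a[1], b[1]),
           sign(a[0], b[1]), sign(a[1], b[0]))
    return ALLEN[key]


def pairwise_relations(intervals):
    """Map each unordered pair of labels to its Allen relation.

    Intervals are ordered by (start, end, label) first, an assumed
    convention, so that each pair is reported in one direction only.
    """
    ordered = sorted(intervals.items(),
                     key=lambda kv: (kv[1][0], kv[1][1], kv[0]))
    return {(la, lb): allen(ia, ib)
            for (la, ia), (lb, ib) in combinations(ordered, 2)}


pattern = {"A": (0, 4), "B": (2, 6), "C": (4, 9), "D": (5, 6)}
relations = pairwise_relations(pattern)
for pair, rel in relations.items():
    print(pair, rel)
print(len(relations))  # 6, i.e. k(k-1)/2 for k = 4
```

For this four-interval pattern the description consists of six relations, matching $k(k-1)/2$ with $k = 4$. Moving the end of interval A from 4 to, say, 3.9 would change the A and C entry from "meets" to "before" while leaving the situation visually almost unchanged, which is the sensitivity discussed above.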
FIG. 1c shows a timeline wherein three examples of the “overlaps” relation of Allen visually and intuitively represent very different situations. Early algorithms for mining patterns based on Allen's relations were based on the Apriori principle of building longer patterns by combining frequent shorter ones. In one example, the transitivity of the relations was used to reduce the number of candidates generated. More recent algorithms use depth-first search strategies with efficient data structures such as enumeration trees, prefix trees and bitmaps.
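The ambiguity of the “overlaps” relation can be quantified with a small sketch. The interval values below are invented for illustration and are not read from FIG. 1c; the helper `shared_fraction` is a hypothetical measure introduced here, not part of Allen's formalism.

```python
def overlaps(a, b):
    """Strict Allen 'overlaps': a starts first, b starts inside a, b ends last."""
    return a[0] < b[0] < a[1] < b[1]


def shared_fraction(a, b):
    """Length of the common part divided by the total span of both intervals."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    span = max(a[1], b[1]) - min(a[0], b[0])
    return shared / span


# Three made-up pairs of (start, end) intervals.
cases = [
    ((0, 10), (9, 100)),   # barely touching
    ((0, 10), (5, 15)),    # moderate overlap
    ((0, 100), (1, 101)),  # nearly identical
]

for a, b in cases:
    print(overlaps(a, b), round(shared_fraction(a, b), 2))
```

All three pairs satisfy the same relation, yet the shared portion ranges from about 1% of the combined span to about 98%. A pattern language built only on the qualitative relation cannot tell these situations apart.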
Approaches that do not use Allen's relations for interval mining include containment patterns, the Unification-based Temporal Grammar (UTG) with sequences of blocks of almost equal intervals, and the Time Series Knowledge Representation (TSKR) with partial orders of blocks of concurrent subintervals described in F. Moerchen and A. Ultsch, Efficient Mining of Understandable Patterns from Multivariate Interval Time Series, Data Min. Knowl. Discov. (2007) (hereinafter “Moerchen and Ultsch”).
All the above use qualitative interval patterns; quantitative interval patterns have also been proposed. Algorithms for time interval mining have been inspired by methods for mining time point data. It has been proposed to mine closed partial orders without repeating symbols using an itemset mining algorithm on the set of partial order graph edges. It has further been proposed to mine closed partial orders (including repeating symbols) from itemset sequences by grouping and merging sequential patterns.
It is important that any method or system for monitoring the condition of a system by monitoring temporal interval data, and for identifying important trends and events based on that data, do so in a way that is unambiguous and robust to noise that may cause interval boundaries or time points to shift. The technique should handle data types typically found in log files, trend data and inventory records; i.e., events associated with time points, time intervals and time semi-intervals.