“Panta rhei”, said Heraklitos; everything is ‘in flux’. This famous aphorism by the ancient Greek philosopher is arguably even more valid today. People are often confronted with the need to make decisions about financial, personal or inter-personal matters based on observations of various contributing factors. Therefore, since everything is in constant flow, monitoring the volatility or variability of important measurements over time becomes a critical determinant in any decision-making process.
When dealing with time sequences, or time-series data, one important indicator of change is the presence of ‘burstiness’, which suggests that more events of importance than usual are occurring within a given time frame. Therefore, the identification of bursts can provide useful insights about an imminent change in the monitored quantity, allowing a system analyst or individual to make a timely and informed decision.
Monitoring and modeling of burst behavior is important in many areas. For example, in computer networks, it is generally recognized that network traffic can be bursty at various time scales (e.g., “Why is the Internet traffic bursty in short (sub-RTT) time scales?” by H. Jiang et al., in Proceedings of ACM SIGMETRICS, 2005; “On the self-similar nature of Ethernet traffic,” by W. E. Leland et al., in Proceedings of ACM SIGCOMM, 1993). The detection of bursts is therefore inherently important for identifying network bottlenecks or for intrusion detection, since an excessive number of incoming packets may be a valid indication that a network system is under attack. Additionally, for applications such as fraud detection, it is critical to efficiently recognize any anomalous activity (typically in the form of over-utilization of resources). For example, burst detection techniques can be used to spot suspicious activity in large stock trading volumes or to identify fraudulent phone activity. Finally, in epidemiology and bio-terrorism, scientists are interested in the early detection of a disease outbreak. This may be indicated by the discovery of a sudden increase in the number of illnesses or visits to the doctor within a certain geographic area (e.g., “WSARE: What's strange about recent events?” by W.-K. Wong et al., in Journal of Urban Health 80, 2005; “Automated outbreak detection: a quantitative retrospective analysis,” by L. Stern et al., in Epidemiology and Infection 122, 1999).
Many recent works address the problem of burst detection (e.g., “Efficient elastic burst detection in data streams,” by Y. Zhu et al., in Proceedings of ACM SIGKDD, 2003; “Bursty and hierarchical structure in streams,” by J. Kleinberg, in Proceedings of ACM SIGKDD, 2002). However, in many disciplines, more effective knowledge discovery can be achieved by identifying correlated bursts when monitoring multiple data sources. From a data-mining perspective, this task is more compelling and challenging, since it involves the identification of burst ‘clusters’, and it can also aid the discovery of causal chains of burst events, which may occur across multiple data streams.
Instances of burst correlation problems can be encountered in many financial and stock market applications, e.g., for triggering fraud alarms. For example, if there are correlated burst events between a phone call stream and stock trading streams, alerts might be raised to prompt further investigation of potential insider trading activity. Burst correlations can also be used to diagnose performance problems in a complex computer system with many resources, such as multiple CPUs, disks, communication links and routers. In such a system, the utilization readings from individual resources represent the data streams. If one can find utilization readings from some resources that exhibit correlated burst events, then one can diagnose potential system problems and tune the system accordingly. Finally, burst correlation is applicable to the discovery and measurement of gene co-expression (in that particular application, a burst is normally referred to as ‘up-regulation’), which holds substantial biological significance, since it can provide insight into functionally related groups of genes and proteins (e.g., “Exploring expression data: identification and analysis of co-expressed genes,” by L. J. Heyer et al., in Genome Research, 9:11, 1999).
In the publication “Identification of similarities, periodicities and bursts for online search queries,” by M. Vlachos et al., in Proceedings of ACM SIGMOD, 2004, bursts detected from multiple time series stored in a static database were represented as the time intervals in which they occurred. However, those time series cannot be regarded as data streams. That work therefore addresses a very different environment from the one contemplated herein, since it has no need for the incremental computation typically required in data stream applications.
In view of the foregoing, a need has been recognized in connection with providing an efficient and effective method for the correlation of burst events among two or more data streams.
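For illustration only, the notions used above (bursts represented as time intervals, incremental computation over a stream, and correlation of bursts between two streams) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method contemplated herein: the class `StreamingBurstDetector`, the helpers `burst_intervals` and `overlap`, the sliding-window rule (a value is bursty if it exceeds the window mean plus `k` standard deviations), and the sample data are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class StreamingBurstDetector:
    """Hypothetical incremental detector: one update per arriving value."""

    def __init__(self, window=3, k=2.0):
        self.history = deque(maxlen=window)  # recent non-burst values (baseline)
        self.k = k
        self.t = 0                # index of the next value to arrive
        self.open_start = None    # start time of the burst in progress, if any
        self.intervals = []       # closed bursts as inclusive (start, end) pairs

    def update(self, value):
        bursty = False
        # Assumed rule: only judge values once the baseline window is full.
        if len(self.history) == self.history.maxlen:
            threshold = mean(self.history) + self.k * pstdev(self.history)
            bursty = value > threshold
        if bursty and self.open_start is None:
            self.open_start = self.t
        elif not bursty and self.open_start is not None:
            self.intervals.append((self.open_start, self.t - 1))
            self.open_start = None
        if not bursty:
            self.history.append(value)  # keep burst values out of the baseline
        self.t += 1
        return bursty


def burst_intervals(stream, window=3, k=2.0):
    """Feed a finite stream through the detector and return its closed bursts.

    A burst still open when the stream ends is not reported in this sketch.
    """
    detector = StreamingBurstDetector(window, k)
    for value in stream:
        detector.update(value)
    return detector.intervals


def overlap(a, b):
    """Number of time steps shared by two lists of inclusive intervals."""
    return sum(
        max(0, min(e1, e2) - max(s1, s2) + 1)
        for s1, e1 in a
        for s2, e2 in b
    )


# Hypothetical readings from two streams sampled at the same time steps.
calls = [1, 1, 1, 1, 1, 9, 9, 1, 1]
trades = [2, 2, 2, 2, 2, 2, 8, 8, 2]

call_bursts = burst_intervals(calls)
trade_bursts = burst_intervals(trades)
shared_steps = overlap(call_bursts, trade_bursts)
```

In this sketch each arriving value is processed once and only a small window of history is kept, which is the sense of incremental computation referred to above. The overlap count is just one simple, assumed way of scoring how strongly the bursts of two streams coincide.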