A burst is generally defined as a window in time during which an event shows an unusually high frequency of occurrence. For applications where data arrives as a time series or a stream, a burst indicates a period of time when something of interest took place. For example, in a pay-per-click revenue model, a burst in clicks to an online advertisement may indicate click fraud. Over the Internet, network intrusions often exhibit a bursty traffic pattern; that is, the traffic from the attacker to the victim shows a sudden deluge, does the damage and then fades away. In another field of endeavor, a gamma-ray burst might indicate an interesting phenomenon in astrophysics.
Detecting bursts in a time series or streaming data (e.g., Ethernet and Internet traffic, multimedia, disk traffic, log messages or the like) has drawn significant attention in the research community. In particular, recent attention has been focused on developing efficient algorithms for burst detection in the first instance.
Labeling a window in time as bursty requires the definition of at least two thresholds: a first threshold defining a particular number of events (k) and a second threshold defining a time span (t) of a window along which the events are viewed. A window is then defined as “(k,t)-bursty” if at least k events occur in a time interval of at most t. For a time window of a given length t, deciding what number of events k counts as “unusually high” requires knowledge of how many events to typically expect in a window of length t. Similarly, for a given k, defining a certain t as “unusually short” requires knowledge of how long it typically takes for k events to occur.
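The (k,t)-bursty definition can be illustrated with a minimal sketch. It assumes events are given as numeric timestamps; the function name `find_bursty_windows` and its parameters are hypothetical and introduced only for illustration. The sketch checks every run of k consecutive events and reports the runs that span at most t.

```python
from typing import List, Sequence, Tuple


def find_bursty_windows(
    timestamps: Sequence[float], k: int, t: float
) -> List[Tuple[float, float]]:
    """Return (start, end) for each run of k consecutive events spanning at most t.

    Hypothetical helper: a window is "(k,t)-bursty" if at least k events
    occur within a time interval of length at most t.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    times = sorted(timestamps)  # the check below requires time order
    windows = []
    for i in range(len(times) - k + 1):
        start, end = times[i], times[i + k - 1]
        # k events fall in [start, end]; bursty if the span is short enough
        if end - start <= t:
            windows.append((start, end))
    return windows


events = [0, 1, 2, 10, 20, 21, 22, 23]
bursts = find_bursty_windows(events, k=3, t=2)
print(bursts)
```

The sketch presumes that k and t are already known. Choosing them still requires knowing what event rate is typical for the stream.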
An article entitled “Efficient elastic burst detection in data streams” by Y. Zhu et al. appearing in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 336-345 (2003) considers the problem of “elastic” burst detection, where the authors propose the use of different event thresholds for different window sizes. In this case, a different event threshold is set for each unique window size, and a window over the time series is identified as a burst when an aggregate function (sum) computed over the window exceeds the corresponding threshold. The concept of elastic windows, i.e., windows of multiple sizes, was introduced because, for most applications, the proper window size is not known a priori. The Zhu et al. approach utilizes a Shifted (binary) Wavelet Tree (SWT) data structure built on a time series of length n, with a complexity of O(n+j), where j in this case is the output size, that is, the total number of windows where (true and false) bursts were reported.
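The elastic formulation can be illustrated with a naive baseline. This is not the SWT structure of Zhu et al.; it is a brute-force sketch, assuming the time series is a list of per-interval event counts and that the thresholds are supplied as a mapping from window size to threshold. The function name `elastic_bursts_naive` is hypothetical. The sketch uses prefix sums to evaluate every window of every listed size, so its cost grows with n times the number of window sizes rather than O(n+j).

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def elastic_bursts_naive(
    series: Sequence[float], thresholds: Dict[int, float]
) -> List[Tuple[int, int]]:
    """Return (window_size, start_index) for each window whose sum exceeds
    the threshold assigned to that window size.

    Brute-force illustration only; not the Shifted Wavelet Tree method.
    """
    n = len(series)
    # prefix[i] holds the sum of series[0:i], so any window sum is one subtraction
    prefix = [0] + list(accumulate(series))
    hits = []
    for size, threshold in sorted(thresholds.items()):
        for start in range(n - size + 1):
            window_sum = prefix[start + size] - prefix[start]
            # "exceeds" is taken here as strictly greater than the threshold
            if window_sum > threshold:
                hits.append((size, start))
    return hits


counts = [1, 0, 5, 4, 0, 1]
hits = elastic_bursts_naive(counts, {1: 4, 2: 8})
print(hits)
```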
While useful, the Zhu et al. approach depends on some assumptions that may not always be true (such as the assumption that windows of the same size always utilize the same burst threshold k). Its complexity also remains high for a large time span, which is a typical scenario in burst detection applications.
Thus, a need remains for an approach to determining critical threshold values for both k and t that are likely to define the existence of a burst along a data stream. Moreover, the knowledge of the critical values k and t alone will give further insight into the structure of the data and may help in the understanding of the behavior of the underlying system.