Monitoring a data stream of observations from a process, and using those observations to manage the process, is useful in many application domains, for example data center management, manufacturing process control, engineering process control, and inventory management.
Typically, monitoring an infrastructure, understanding the observations made, and deciding how to adapt or reconfigure the infrastructure accordingly can be a lengthy and costly process. Navigating through log files and traces in order to carry out root cause analysis of a failure can be long and tedious. Even when administrators use tooling, they often have to alternate manually between the monitoring domain and the mining domain in order to transfer the data collected in logs and traces, run it through mining algorithms, decide how to interpret the mining results, and react to those results. It is also typically the case that management actions are based on past monitored observations rather than on the actual real-time state of the system.
The advent of the Digital Age has made large-scale data acquisition and online processing a crucial component of modern systems. A Data Stream Management System (DSMS) enables applications to issue long-running continuous queries that efficiently monitor and process streams of data in real time. DSMSs are used for data processing in a broad range of applications, e.g. algorithmic stock trading.
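By way of illustration only, a continuous query can be thought of as a computation that stays registered against a stream and emits an updated result each time a new observation arrives. The following minimal Python sketch assumes a stream of (timestamp, value) pairs and a sliding time window; the function name `continuous_average` and the parameter `window_seconds` are hypothetical and do not correspond to the interface of any particular DSMS.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Observation = Tuple[float, float]  # (timestamp in seconds, observed value)


def continuous_average(
    stream: Iterable[Observation], window_seconds: float
) -> Iterator[Observation]:
    """Yield (timestamp, windowed average) for every incoming observation.

    Hypothetical sketch of a continuous query: the result is refreshed
    incrementally as data arrives instead of being computed offline.
    Assumes window_seconds > 0 and non-decreasing timestamps.
    """
    window: Deque[Observation] = deque()
    total = 0.0
    for timestamp, value in stream:
        # Add the new observation to the window.
        window.append((timestamp, value))
        total += value
        # Evict observations that are window_seconds old or older.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield timestamp, total / len(window)


if __name__ == "__main__":
    readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (12.0, 40.0)]
    for ts, avg in continuous_average(readings, window_seconds=10.0):
        print(ts, avg)
```

In this sketch the running total is adjusted as observations enter and leave the window, so each result is produced without rescanning earlier data, which is one simple way a long-running query can keep pace with a stream.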
Some previous approaches have sought to perform the processes of monitoring, managing and offline mining, in this order. Data is monitored to detect interesting patterns, which are used to manage the process and perform business actions. The raw data is aggregated and stored offline. The historical data is then mined to determine new patterns (or modifications to existing patterns) that are fed back to the monitoring phase. For example, historical, offline data mining can reveal new chart patterns or refinements of existing patterns. This approach involves some manual steps and is therefore slow. In a world where corporations want faster insight into their data, such a manual approach is insufficient.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known systems which monitor a data stream of observations from a process.