Many new requirements for streaming and event processing have emerged and have been used to design stream/event processing systems. These requirements derive from a multitude of motivating scenarios, including sensor networks, large-scale system administration, internet-scale monitoring, and stock ticker data handling. Events from these streaming applications are frequently sent across unreliable networks, so they often arrive at the associated stream processing system out of order.
Due to radically different performance and correctness requirements across different problem domains, systems have been vertically developed to handle a specific set of tradeoffs. These requirements include continuous queries (e.g., computing a one minute moving average for heat across a sensor network), insert/event rates that are very high (e.g., orders of magnitude higher than a traditional database can process inserts), and query capabilities for handling increasingly expressive standing queries (e.g., stateful computation such as join).
While streaming systems exist for specific vertical markets, broad adoption of a single system across a wide spectrum of application domains remains unattained. This is due in part to the need for correct, domain-specific handling of out-of-order data and data retraction.
This requirement is exemplified by the following three scenarios. In the first, a corporate network of machines produces system maintenance events. As a result of transient network phenomena, such as network partitioning, individual events can be arbitrarily delayed. Since the consequence of an alert (e.g., finding machines that did not boot up after a patch was installed) can require human intervention, reporting of problem installs should be delayed until the relevant events have reached the stream processing system. A second scenario involves collecting statistics on web traffic. Since networks are unreliable and there is far too much data to retain for any significant period of time, systems simply process the data as it comes in, dropping data that arrives significantly late and reporting the best answer that can reasonably be computed.
A final scenario involves monitoring stock activity for the purpose of computing trades. When the stock feed provides incorrect data, a service level agreement is in place that gives the data provider a predetermined period of time (e.g., 72 hours) to report the correct ticker price for each reading. If a stock trade occurs using an incorrect price, the parties have the option to back out of the transaction during that period. Consequently, even though results are provided immediately, corrections may lead to some form of compensation. The system should respond instantly, but provide corrections when necessary.
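The three scenarios can be contrasted with a small sketch. The Python below is illustrative only and is not taken from any particular system: the function name `process`, the policy names `"block"`, `"drop"`, and `"speculate"`, and the `("insert" | "retract", window_start, total)` output tuples are all assumptions introduced here. It sums values per fixed-size window over events listed in arrival order. For simplicity, the blocking policy is modeled as waiting for the end of the input, and the drop policy treats any event belonging to a window older than the newest one seen as late.

```python
def process(arrivals, window=60, policy="speculate"):
    """Sum values per time window under one of three hypothetical policies.

    arrivals: list of (event_time, value) pairs in ARRIVAL order,
              with integer event times.
    Returns a list of (kind, window_start, total) tuples.
    """
    totals = {}      # window start -> running total
    newest = None    # start of the newest window seen so far
    out = []
    for event_time, value in arrivals:
        w = event_time - event_time % window
        late = newest is not None and w < newest
        if late and policy == "drop":
            continue  # scenario 2: discard late data, keep the best answer
        if policy == "speculate":
            # Scenario 3: answer immediately, withdraw the previous answer
            # for this window whenever it is superseded.
            if w in totals:
                out.append(("retract", w, totals[w]))
            totals[w] = totals.get(w, 0) + value
            out.append(("insert", w, totals[w]))
        else:
            totals[w] = totals.get(w, 0) + value
        if newest is None or w > newest:
            newest = w
    if policy != "speculate":
        # Scenario 1 ("block") and scenario 2 ("drop") report each window
        # once, after all input has been seen (a simplification).
        out.extend(("insert", w, t) for w, t in sorted(totals.items()))
    return out


# The event at time 10 arrives after the event at time 65, i.e. out of order.
arrivals = [(5, 1), (65, 2), (10, 4)]
blocked = process(arrivals, policy="block")
dropped = process(arrivals, policy="drop")
speculated = process(arrivals, policy="speculate")
```

With this input, the blocking policy reports a total of 5 for the first window, the drop policy reports 1 because the late event is discarded, and the speculative policy first reports 1, then retracts it and reports 5. A real system would bound the wait or the correction period (for example, the 72-hour window in the stock scenario) rather than waiting for the end of the input.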