There are many sources of data, including, for example, weather data, temperature data, network traffic data, and automobile traffic data. Analyzing this data in real time can provide valuable insight into various situations, including but not limited to the ability to predict and prevent failures, choose alternatives, and enhance user experiences. Due to the ever-increasing volume of data that is available for analysis, and the desire to deliver faster data processing for real-time applications, continuous data analysis is pushing the limits of traditional data warehousing technologies. Data Stream Management Systems (DSMS) provide a paradigm shift from the load-first, analyze-later mode of data warehousing by processing data more efficiently than disk-based data processing systems.
Current-generation DSMS lack the functionality offered by the structured query language (SQL) and Database Management Systems (DBMS). That is, an SQL query is definable only on bounded, finite data, whereas streaming data is unbounded and infinite. Because a stream query is defined on unbounded data and is, in general, limited to non-transactional event processing, the current-generation DSMS is typically constructed independently of the database engine. Separating the DSMS and query engine platforms results in higher overhead for accessing and moving data. Managing data-intensive stream processing outside of the query engine also fails to leverage the full SQL and DBMS functionality.
While some analytical systems purport to offer a “continued query” mode, these systems are based on automatic view updates and therefore do not truly support continuous querying. Other systems leverage database technology, but are characterized by providing a workflow-like service that iteratively launches a one-time SQL query against buffered sets of stream data in a non-dataflow fashion.
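By way of illustration only, the buffered, iterative approach described above may be sketched as follows. The sketch assumes an in-memory SQLite database as a stand-in for the database engine and a hypothetical (timestamp, value) record layout; the function names `buffered_chunks` and `query_per_chunk` are likewise hypothetical and do not correspond to any particular system. Each buffer is a bounded, finite data set, so an ordinary one-time SQL query can be applied to it, but the query is relaunched for every buffer and no state flows from one invocation to the next.

```python
import sqlite3
from itertools import count, islice
from typing import Iterable, Iterator, List, Tuple

Reading = Tuple[int, float]  # (timestamp, value): hypothetical record layout


def buffered_chunks(stream: Iterable[Reading], size: int) -> Iterator[List[Reading]]:
    """Cut a possibly unbounded stream into bounded buffers of at most `size` rows."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return  # a finite stream has been exhausted
        yield chunk


def query_per_chunk(stream: Iterable[Reading], size: int) -> Iterator[float]:
    """Relaunch the same one-time SQL query against each buffered chunk."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the DBMS
    conn.execute("CREATE TABLE buf (ts INTEGER, val REAL)")
    try:
        for chunk in buffered_chunks(stream, size):
            # The buffer is reloaded for every iteration; nothing carries over.
            conn.execute("DELETE FROM buf")
            conn.executemany("INSERT INTO buf VALUES (?, ?)", chunk)
            (avg,) = conn.execute("SELECT AVG(val) FROM buf").fetchone()
            yield avg
    finally:
        conn.close()


if __name__ == "__main__":
    # An unbounded source: readings (0, 0.0), (1, 1.0), (2, 2.0), ...
    unbounded = ((i, float(i)) for i in count())
    # Only the first three per-buffer results are taken; the stream never ends.
    for result in islice(query_per_chunk(unbounded, 4), 3):
        print(result)
```

The load-then-query loop in `query_per_chunk` reflects the overhead of moving stream data into the query engine for each invocation, in contrast to a query that runs continuously over the stream.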