Operational business intelligence (“BI”) applications often derive critical information from continuously collected data through stream processing. Stream processing is characterized by processing data first and then optionally storing the results in a data sink such as a database. Dynamically collected data from a data stream and static data from a database may be used in combination. However, separate data stream management systems (“DSMS”) and database management systems (“DBMS”) typically are deployed to access information from these separate yet often related sources.
Although processing power has increased greatly in recent years, the increase in data bandwidth has been much less dramatic. For large enterprises, the amount of data that is transferred from a data stream to a data warehouse is becoming extremely large, creating a considerable bottleneck in the BI process. Moreover, when the data set required for analytics is large, a DSMS may be overly burdened with data management issues (e.g., data structure, layout, indexing, buffer management, storage) that are better handled by a DBMS. Many of these issues also may be handled at the application level, but this introduces security concerns, with potentially sensitive data being cached in files of various BI applications.
Rather than gathering data directly from a data stream, some DSMS are connected to a database that is used to temporarily store captured stream data. This type of DSMS provides users with the mature data management capabilities of a DBMS. However, it also requires that data be written to disk first, which introduces significant overhead from disk reads and writes. Some systems support continuous queries for monitoring a change in persistent data using cursors and other similar features. However, this approach still requires that streamed data be stored first and processed later.
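The store-first, process-later pattern described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the DBMS; the table name `readings`, the function `store_then_query`, and the threshold query are hypothetical and chosen only for illustration.

```python
import sqlite3


def store_then_query(stream, threshold):
    """Persist every stream element first, then analyze the stored rows."""
    # Assumption: an in-memory database stands in for a disk-backed DBMS,
    # so the disk read/write overhead discussed above is not reproduced here.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
    )

    # Step 1: write each captured element to the database before any analysis.
    for sensor, value in stream:
        conn.execute(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)", (sensor, value)
        )
    conn.commit()

    # Step 2: only after the data is stored does the analytic query run.
    rows = conn.execute(
        "SELECT sensor, AVG(value) FROM readings "
        "GROUP BY sensor HAVING AVG(value) > ? ORDER BY sensor",
        (threshold,),
    ).fetchall()
    conn.close()
    return rows
```

In this sketch no result is available until every element has been written, which mirrors the latency cost of storing streamed data before processing it.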
Other DSMS are provided with custom-built data management facilities to deal with data in a data stream more directly. These DSMS may operate more efficiently than DSMS that are connected to a data sink. However, the custom data management capabilities employed by these DSMS typically are built from scratch and are not necessarily compatible with other data systems. Moreover, they fail to take advantage of the mature data management capabilities of a DBMS.
One approach that attempts to address these shortcomings is to build a DSMS on top of a DBMS so that the DBMS includes stream processing capabilities. A database query is executed a number of times on “chunks” of stream elements. A problem with this approach is that the frequent set-up and tear-down of database queries introduces significant computational overhead, so the approach cannot meet the efficiency requirements of particularly data-intensive BI applications. Moreover, this approach often requires the use of a centralized scheduler to control the frequency at which a query is executed.
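The chunk-wise execution described above can be sketched as follows. This is an illustrative assumption, not the design of any particular system: SQLite stands in for the DBMS, and the names `chunks`, `run_query_per_chunk`, and the staging table `chunk` are hypothetical. The loop itself plays the role of the centralized scheduler.

```python
import sqlite3
from itertools import islice


def chunks(stream, size):
    """Yield successive lists of at most `size` elements from the stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_query_per_chunk(stream, chunk_size):
    """Run the same aggregate query once per chunk of stream elements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunk (value REAL)")
    results = []
    for chunk in chunks(stream, chunk_size):
        # Set-up: stage the current chunk in the table.
        conn.executemany(
            "INSERT INTO chunk (value) VALUES (?)", [(v,) for v in chunk]
        )
        # Execute the query from scratch for this chunk.
        cur = conn.execute("SELECT SUM(value) FROM chunk")
        results.append(cur.fetchone()[0])
        cur.close()
        # Tear-down: discard the chunk so the next iteration starts clean.
        conn.execute("DELETE FROM chunk")
    conn.close()
    return results
```

Every iteration repeats the staging, execution, and cleanup steps, so the per-chunk cost grows with the number of chunks rather than being paid once for the whole stream.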