It is well known in the art to process queries over continuous streams of data using one or more computer(s) that may be called a data stream management system (DSMS). Such a system may also be called an event processing system (EPS) or a continuous query (CQ) system, although in the following description of the current patent application, the term “data stream management system” or its abbreviation “DSMS” is used. A DSMS typically receives from a user a textual representation of a query (called “continuous query”) that is to be applied to a stream of data. Data in the stream changes over time, in contrast to relatively static data that is typically found stored in a database. Examples of data streams are: real time stock quotes, real time traffic monitoring on highways, and real time packet monitoring on a computer network such as the Internet.
FIG. 1A illustrates a prior art DSMS built at Stanford University, in which data streams from network monitoring can be processed, to detect intrusions and generate online performance metrics, in response to queries (called “continuous queries”) on the data streams. Note that in such data stream management systems (DSMS), each stream can be infinitely long and the data can keep arriving indefinitely, and hence the amount of data is too large to be persisted by a database management system (DBMS) into a database.
As shown in FIG. 1B, a prior art DSMS may include a continuous query compiler that receives a continuous query and builds a physical plan which consists of a tree of natively supported operators. Any number of such physical plans (one plan per query) may be combined together, before the DSMS starts normal operation, into a global plan that is to be executed. When the DSMS starts execution, the global plan is used by a query execution engine (also called “runtime engine”) to identify data from one or more incoming stream(s) that matches a query, and based on such identified data the engine generates output data, in a streaming fashion.
As noted above, one such system was built at Stanford University, in a project called the Stanford Stream Data Management (STREAM) Project which is described in an article entitled “STREAM: The Stanford Data Stream Management System” by Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Mayur Datar, Keith Ito, Rajeev Motwani, Utkarsh Srivastava, and Jennifer Widom published on the Internet in 2004. The just-described article is incorporated by reference herein in its entirety as background.
For more information on other such systems, see the following articles each of which is incorporated by reference herein in its entirety as background:
[a] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, M. Shah, “TelegraphCQ: Continuous Dataflow Processing for an Uncertain World”, Proceedings of CIDR 2003;
[b] J. Chen, D. DeWitt, F. Tian, Y. Wang, “NiagaraCQ: A Scalable Continuous Query System for Internet Databases”, PROCEEDINGS OF 2000 ACM SIGMOD, pages 379-390; and
[c] D. B. Terry, D. Goldberg, D. Nichols, B. Oki, “Continuous queries over append-only databases”, PROCEEDINGS OF 1992 ACM SIGMOD, pages 321-330.
Continuous queries (also called “persistent” queries) are typically registered in a data stream management system (DSMS) prior to its operation on data streams. The continuous queries are typically expressed in a declarative language that can be parsed by the DSMS. One such language called “continuous query language” or CQL has been developed at Stanford University primarily based on the database query language SQL, by adding support for real-time features, e.g. adding data stream S as a new data type based on a series of (possibly infinite) time-stamped tuples. Each tuple s belongs to a common schema for the entire data stream S and the time t increases monotonically. Note that such a data stream can contain 0, 1 or more pairs (each pair being a tuple s and its time stamp t) having the same (i.e. common) time stamp.
Stanford's CQL supports windows on streams (derived from SQL-99) based on another new data type called “relation”, defined as follows. A relation R is an unordered group of tuples at any time instant t which is denoted as R(t). The CQL relation differs from a relation of a standard relational database accessed using SQL, because traditional SQL's relation is simply a set (or bag) of tuples with no notion of time, whereas the CQL relation (or simply “relation”) is a time-varying group of tuples (e.g. the current number of vehicles in a given stretch of a particular highway). All stream-to-relation operators in Stanford's CQL are based on the concept of a sliding window over a stream: a window that at any point of time contains a historical snapshot of a finite portion of the stream. Syntactically, sliding window operators are specified in CQL using a window specification language, based on SQL-99.
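The stream-to-relation behavior just described can be illustrated with a minimal sketch in Python. This sketch is not taken from Stanford's DSMS; the function names `row_window` and `range_window`, the representation of a stream as a list of (time stamp, tuple) pairs, and the choice to emit a snapshot after every arrival are all assumptions made for illustration only.

```python
from collections import deque

def row_window(stream, n):
    """Tuple-based sliding window: after each arrival, yield (t, relation),
    where relation holds at most the last n tuples of the stream."""
    window = deque(maxlen=n)  # oldest tuple is evicted automatically
    for t, s in stream:
        window.append(s)
        yield t, list(window)

def range_window(stream, width):
    """Time-based sliding window: after each arrival at time t, yield
    (t, relation), where relation holds tuples stamped in (t - width, t]."""
    window = deque()
    for t, s in stream:
        window.append((t, s))
        # Expire tuples that have fallen out of the time range.
        while window and window[0][0] <= t - width:
            window.popleft()
        yield t, [x for _, x in window]

# Hypothetical input: two tuples share time stamp 2.
stream = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
rows = list(row_window(stream, 2))
ranges = list(range_window(stream, 3))
```

In this sketch each yielded list plays the role of R(t), a finite and time-varying group of tuples derived from a stream that may itself be unbounded.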
For more information on Stanford University's CQL, see a paper by A. Arasu, S. Babu, and J. Widom entitled “The CQL Continuous Query Language: Semantic Foundation and Query Execution”, published as Technical Report 2003-67 by Stanford University, 2003 (also published in VLDB Journal, Volume 15, Issue 2, June 2006, at Pages 121-142). See also, another paper by A. Arasu, S. Babu, J. Widom, entitled “An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations” in 9th Intl Workshop on Database programming languages, pages 1-11, September 2003. The two papers described in this paragraph are incorporated by reference herein in their entirety as background.
An example to illustrate continuous queries is shown in FIGS. 1C-1E which are reproduced from the VLDB Journal paper described in the previous paragraph. Specifically, FIG. 1E illustrates a merged STREAM query plan for two continuous queries, Q1 and Q2, over input streams S1 and S2. Query Q1 of FIG. 1E is shown in detail in FIG. 1C expressed in CQL as a windowed-aggregate query: it maintains the maximum value of S1.A for each distinct value of S1.B over a 50,000-tuple sliding window on stream S1. Query Q2 shown in FIG. 1D is expressed in CQL and used to stream the result of a sliding-window join over streams S1 and S2. The window on S1 is a tuple-based window containing the last 40,000 tuples, while the window on S2 is a 10-minute time-based window.
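The semantics of a windowed-aggregate query such as Q1 can be approximated by the following Python sketch. It is an illustration only and does not reproduce the operators of the STREAM query plan; the function name `max_a_per_b`, the representation of each S1 tuple as an (A, B) pair, and the small window size used in the example are assumptions. The aggregate is recomputed from scratch on every arrival for clarity, whereas a practical engine would maintain it incrementally.

```python
from collections import deque

def max_a_per_b(tuples, window_size):
    """After each arriving (a, b) tuple, yield a dict mapping each distinct
    b in the sliding window to the maximum a seen for it in that window."""
    window = deque(maxlen=window_size)  # tuple-based sliding window
    for a, b in tuples:
        window.append((a, b))
        result = {}
        for wa, wb in window:
            result[wb] = max(wa, result.get(wb, wa))
        yield result

# Hypothetical stream S1 with a window of 3 tuples instead of 50,000.
s1 = [(9, "x"), (7, "y"), (5, "x"), (1, "y")]
snapshots = list(max_a_per_b(s1, window_size=3))
```

In the last snapshot the maximum for "x" falls from 9 to 5 because the tuple (9, "x") has slid out of the window, which is the behavior that distinguishes a windowed aggregate from an aggregate over the whole stream.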
Several DSMS of the prior art, such as Stanford University's DSMS, treat relations that change infrequently in the same manner as streams of data that change very frequently. Such treatment is insufficient to handle situations that may sometimes arise when the source of such a relation fails to send data to the DSMS for a long time (a “silent relation”). For example, if an operator in the DSMS receives as its two inputs (1) a silent relation and (2) a stream, then data from the stream must be kept buffered at the operator, until receipt of the next incremental change from the source of the silent relation, before the operator can perform its operation (e.g. a Join or Union). Note that the DSMS has no information on how the time changes in the silent relation's source, relative to a stream's source.
The just described problem may be overcome by requiring the silent relation's source to send a time stamp (with or without data), even when there has not been any change in data, and such a transmission is commonly referred to as a “heartbeat.” Receipt of a heartbeat from a source by the DSMS indicates that all data from this source will have a later time stamp than the heartbeat's time stamp. The heartbeat may or may not be periodic, but successive heartbeats must arrive in a monotonically non-decreasing time sequence at the DSMS. The DSMS handles heartbeats in the normal manner of handling any other data, except that there is no data associated with the heartbeat. Specifically, the relation value does not change; its value at the timestamp of the heartbeat is the same as its value on the last timestamp received from the operator (the one before the last heartbeat). However, use of heartbeats requires the source to send time stamps, which is problematic for sources that do not traditionally supply time stamps, such as a database.
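The buffering problem and the effect of a heartbeat can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of any prior art DSMS: the class name `BinaryOperator`, its method names, and the decision to merely pass released stream tuples to an output list (rather than perform an actual Join or Union) are hypothetical.

```python
class BinaryOperator:
    """Two-input operator that buffers stream tuples until the relation
    input is known to have advanced to at least the tuple's time stamp."""

    def __init__(self):
        self.relation_time = float("-inf")  # latest time stamp seen on relation input
        self.pending = []                   # buffered (t, value) stream tuples
        self.output = []                    # tuples released for processing

    def on_stream(self, t, value):
        self.pending.append((t, value))
        self._release()

    def on_relation_heartbeat(self, t):
        # Heartbeats must arrive in non-decreasing time order.
        if t < self.relation_time:
            raise ValueError("heartbeat time stamps must be non-decreasing")
        self.relation_time = t
        self._release()

    def _release(self):
        # A tuple stamped at or before relation_time can no longer be
        # affected by future relation changes, so it is safe to process.
        ready = [p for p in self.pending if p[0] <= self.relation_time]
        self.pending = [p for p in self.pending if p[0] > self.relation_time]
        self.output.extend(ready)

op = BinaryOperator()
op.on_stream(1, "a")
op.on_stream(2, "b")
blocked = list(op.output)      # nothing released while the relation is silent
op.on_relation_heartbeat(1)
after_first = list(op.output)  # tuple at time 1 is released
op.on_relation_heartbeat(5)
after_second = list(op.output)
```

Without the calls to `on_relation_heartbeat`, both stream tuples in this sketch would remain in `pending` indefinitely, which corresponds to the unbounded buffering described above for a silent relation.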