A data stream management system (DSMS) is a computing platform that processes continuous streams of events at relatively high data rates. These events, for instance, can be events received from sensors, events retained in a file, etc. Moreover, each event processed by a DSMS includes a payload and a time window corresponding thereto, wherein the time window indicates the window of time during which the event contributes to an output. A DSMS can receive multiple data streams, wherein each of the data streams includes various events, and the DSMS can perform various processes on events in such streams to produce an output stream of events.
An event is a data object that is temporal in nature. Specifically, an event can include data indicative of a start time and an end time, wherein the window between the start time and the end time is when the event actively contributes to output. The start time and end time can be represented in an event by two timestamps, Vs and Ve. A data stream is a sequence of these events. In real-world applications, due to transmission delays or other factors, events can be generated and/or received out of order. A punctuation or current time increment (CTI) is a special event that is employed to limit the amount of disorder that can occur in a data stream. With more particularity, a CTI can guarantee that no subsequently received event in the data stream will have a start time prior to the time indicated by the CTI.
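By way of illustration only, the CTI guarantee described above can be sketched as follows. The class names Event and CTI, the field names, and the function find_cti_violations are hypothetical and are introduced solely for this example; integer timestamps are assumed.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass(frozen=True)
class Event:
    vs: int        # start time of the event's lifetime
    ve: int        # end time of the event's lifetime
    payload: Any


@dataclass(frozen=True)
class CTI:
    t: int         # application time carried by the punctuation


def find_cti_violations(stream: List[Union[Event, CTI]]) -> List[Event]:
    """Return events whose start time precedes an earlier CTI's time."""
    latest_cti: Optional[int] = None
    violations: List[Event] = []
    for item in stream:
        if isinstance(item, CTI):
            # Keep the largest CTI time seen so far.
            latest_cti = item.t if latest_cti is None else max(latest_cti, item.t)
        elif latest_cti is not None and item.vs < latest_cti:
            # The event starts before a time the stream already promised to be past.
            violations.append(item)
    return violations
```

In this sketch, an event that arrives out of order before any CTI is acceptable, whereas an event whose start time is earlier than a previously received CTI breaks the guarantee.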
As indicated above, Vs and Ve can represent a start time and an end time for an event, such that the event is active for the time interval between Vs and Ve. A CTI is associated with an application time t, wherein the CTI is essentially a guarantee that no future event will have a Vs that is less than t. Moreover, events can be of different types. For instance, an insert event can be a new event from the outside world, while a lifetime modification event (also referred to as a revision, a retraction, or an expansion) is an event that signals a modification of the lifetime of an earlier event. A retraction sets the new end time Vnewe to a time earlier than Ve, while an expansion sets Vnewe to a time later than Ve. Thus, a physical stream can include an original event with a Vs and Ve along with a modification that changes Ve to Vnewe, which signals that the end time of the event has been changed. The special case of Vnewe=Vs signals a full retraction or deletion of the corresponding event. In general, a physical stream consists of inserts of new events as well as lifetime modifications of prior events, plus CTIs.
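By way of illustration only, the effect of inserts and lifetime modifications on a physical stream can be sketched as follows. The names Insert, Modify, and apply_physical_stream are hypothetical, and the sketch assumes that each event carries a unique key by which a later modification can refer to it; the manner in which a modification identifies its original event is not specified above. CTIs are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple, Union


@dataclass(frozen=True)
class Insert:
    key: Hashable  # assumed identifier for the event (hypothetical)
    vs: int        # start time Vs
    ve: int        # end time Ve


@dataclass(frozen=True)
class Modify:
    key: Hashable  # identifies the earlier event being revised
    vnewe: int     # new end time Vnewe


def apply_physical_stream(
    stream: List[Union[Insert, Modify]]
) -> Dict[Hashable, Tuple[int, int]]:
    """Fold inserts and lifetime modifications into final (Vs, Ve) lifetimes."""
    lifetimes: Dict[Hashable, Tuple[int, int]] = {}
    for item in stream:
        if isinstance(item, Insert):
            lifetimes[item.key] = (item.vs, item.ve)
        else:
            vs, _old_ve = lifetimes[item.key]
            if item.vnewe == vs:
                # Vnewe == Vs is a full retraction: the event is deleted.
                del lifetimes[item.key]
            else:
                # A retraction (earlier end) or expansion (later end).
                lifetimes[item.key] = (vs, item.vnewe)
    return lifetimes
```

For instance, an insert with a lifetime of 1 to 10 followed by a modification with Vnewe equal to 5 leaves the event with a lifetime of 1 to 5, while a modification with Vnewe equal to 1 removes the event altogether.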
To process events provided to a DSMS, applications generally issue long running queries (referred to herein as continuous queries) that are registered with the DSMS. The DSMS accepts input data streams, and the query is executed over such input data streams. Processing of the input data streams results in production of an output data stream that denotes the result of the execution of the continuous query over the input data streams.
Generally, continuous queries are expressed declaratively and are typically compiled by the DSMS into a continuous query plan for execution. A continuous query plan can be represented as a directed acyclic graph, wherein nodes of the graph are operators, and wherein the operators are connected by event streams (which can also be referred to as event queues). During execution of the continuous query, incoming events are pushed through this directed acyclic graph of operators, and the final output stream is produced by way of such continuous query plan. As events continue to arrive in the input streams, the continuous query plan continues to produce events in the output stream.
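By way of illustration only, a continuous query plan of the kind described above can be sketched as a small graph of operators through which events are pushed. The classes Operator, Filter, Map, and Sink are hypothetical and are not drawn from any particular DSMS; direct method calls stand in for the event queues that connect operators.

```python
from typing import Any, Callable, List


class Operator:
    """A node of the query plan; edges to downstream nodes model event streams."""

    def __init__(self) -> None:
        self.downstream: List["Operator"] = []

    def connect(self, other: "Operator") -> "Operator":
        self.downstream.append(other)
        return other  # returned so that connections can be chained

    def emit(self, event: Any) -> None:
        for op in self.downstream:
            op.push(event)

    def push(self, event: Any) -> None:
        raise NotImplementedError


class Filter(Operator):
    def __init__(self, predicate: Callable[[Any], bool]) -> None:
        super().__init__()
        self.predicate = predicate

    def push(self, event: Any) -> None:
        if self.predicate(event):
            self.emit(event)


class Map(Operator):
    def __init__(self, fn: Callable[[Any], Any]) -> None:
        super().__init__()
        self.fn = fn

    def push(self, event: Any) -> None:
        self.emit(self.fn(event))


class Sink(Operator):
    """Terminal operator that collects the output stream."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Any] = []

    def push(self, event: Any) -> None:
        self.results.append(event)
```

A plan can then be assembled by connecting a Filter to a Map to a Sink and pushing each incoming event into the Filter; the Sink accumulates output events for as long as input continues to arrive.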
It is to be understood that oftentimes, a single continuous query can be represented by multiple different continuous query plans, wherein each of the continuous query plans can be logically equivalent. For instance, a continuous query can be declared that is configured to retrieve a temporal join of three input streams: A, B, and C. A first continuous query plan that represents such query may be ((AB)C), while a second continuous query plan that represents the query may be ((AC)B), while a third continuous query plan that represents the query may be ((BC)A). In some instances, it may be desirable to migrate from a first continuous query plan to a second continuous query plan. This migration might be desirable, for instance, if the second continuous query plan will execute faster than the first continuous query plan, if high availability is desired with respect to the continuous query, etc.
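By way of illustration only, the logical equivalence of the three join orders can be sketched as follows. The sketch assumes a simplified temporal join in which two events match when their join keys are equal and their lifetimes overlap, with the output lifetime being the intersection of the input lifetimes; the function temporal_join and the tuple layout are hypothetical, and duplicate events are ignored because sets are used.

```python
from typing import FrozenSet, Set, Tuple

# (join key, Vs, Ve, tags naming the input events that were combined)
Event = Tuple[str, int, int, FrozenSet[str]]


def temporal_join(left: Set[Event], right: Set[Event]) -> Set[Event]:
    """Join events with equal keys whose lifetimes overlap."""
    out: Set[Event] = set()
    for k1, s1, e1, t1 in left:
        for k2, s2, e2, t2 in right:
            vs, ve = max(s1, s2), min(e1, e2)  # intersection of lifetimes
            if k1 == k2 and vs < ve:
                out.add((k1, vs, ve, t1 | t2))
    return out


A = {("x", 0, 10, frozenset({"a1"}))}
B = {("x", 5, 15, frozenset({"b1"})), ("y", 0, 10, frozenset({"b2"}))}
C = {("x", 8, 20, frozenset({"c1"})), ("x", 12, 20, frozenset({"c2"}))}

plan_1 = temporal_join(temporal_join(A, B), C)  # ((AB)C)
plan_2 = temporal_join(temporal_join(A, C), B)  # ((AC)B)
plan_3 = temporal_join(temporal_join(B, C), A)  # ((BC)A)
```

Under these assumptions the three plans yield the same set of output events, although the intermediate results (and therefore the state held by each plan) differ, which is part of what makes migration between plans non-trivial.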
Conventional techniques for migrating between continuous query plans, however, are relatively inefficient. For example, a first technique for migrating between query plans is to altogether stop the currently executing query plan (the old query plan) and transfer all state information and event queues from the old query plan to a new query plan. This is inefficient for a variety of reasons. First, the old query plan and the new query plan must be structurally equivalent query plans. Additionally, the old query plan must be stopped while all of the state information and event queues are copied from the old query plan to the new query plan. This copying can take a significant amount of time, and for queries that require high availability, this technique is an unacceptable solution.
Another technique for migrating from the old query plan to the new query plan involves continuing to execute the old query plan while providing the same input streams to the new query plan, and executing both the old and new query plans until the result streams output by the query plans are identical. As indicated above, however, events processed by these query plans have time windows corresponding thereto, and these time windows can be relatively long. Thus, the new query plan will not output correct data until a relatively long time window has expired, which may be unacceptable in many situations. Furthermore, because of differences in the order and type of events between the two query plans, their respective outputs might never become completely identical.
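By way of illustration only, the delay inherent in this second technique can be estimated with the following sketch. It assumes, as a simplification not stated above, that events arrive in order at their start times, so that a new query plan started at a given time never sees events that began earlier, and that the new plan's output becomes correct once every such missed event has expired. The function name warm_up_end is hypothetical.

```python
from typing import List, Tuple


def warm_up_end(events: List[Tuple[int, int]], start: int) -> int:
    """Earliest time at which a plan started at `start` reflects all active events.

    `events` holds (Vs, Ve) lifetimes; an event is missed by the new plan
    when it began before `start` and is still active at `start`.
    """
    missed_end_times = [ve for vs, ve in events if vs < start < ve]
    # With nothing missed, the new plan is correct from the moment it starts.
    return max(missed_end_times, default=start)
```

For example, with lifetimes (0, 5), (2, 100), and (10, 20) and a new plan started at time 4, the sketch returns 100: a single long-lived event keeps the new plan's output incomplete long after the plan has started.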