In traditional databases and data management systems, data is stored in an essentially static form within one or more computer memories. That is, the data may generally be altered when desired, but at any given moment the stored data represents a discrete, static, finite, persistent data set against which, e.g., queries may be issued.
In many settings, however, data may not be effectively or usefully managed in this way. In particular, it may occur that data arrives essentially continuously, as a stream of data points corresponding, e.g., to real-world events. Data stream management systems (DSMS) have been developed to make use of such data.
For example, data representing events within a manufacturing facility may fluctuate over the course of a day and/or over the lifetime of equipment within the facility. Such data may provide insight into an operational status of the facility, which may in turn be used to optimize operations of the facility. Additional/alternative examples of such data streams include temperature or other environmental data collected by sensors, computer network analytics, patient health data, or data describing business process(es).
During runtime, pre-stored queries may be applied against the data as the data arrives. For example, a portion of the data, generally referred to as a window of data, may be temporarily stored in main memory, and the queries may be applied against the stored data portion before the stored data portion is deleted from storage. The stored data at a given point in time thus represents a state of the query at that time, where it may be appreciated that such state information is volatile, and changes as new data arrives. However, if one of the queries must be modified, then conventional systems must generally restart the query in question. Consequently, the stored data portion, i.e., the state of the query, is erased or otherwise made unavailable, so that new data must be collected before the modified query may produce results. This may result in a harmful delay and/or related difficulties for the user of the DSMS.
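By way of illustration only, the windowing behavior described above may be sketched as follows. The sketch assumes a count-based window and a simple averaging query; the class name `WindowedAverage` and its method `on_arrival` are hypothetical and do not correspond to any particular DSMS.

```python
from collections import deque


class WindowedAverage:
    """Hypothetical continuous query: average of the last `size` data points."""

    def __init__(self, size):
        self.size = size
        # The window is the volatile state of the query, held in main memory.
        self.window = deque(maxlen=size)

    def on_arrival(self, value):
        # Appending to a full deque silently discards the oldest data point.
        self.window.append(value)
        if len(self.window) < self.size:
            return None  # window not yet full, so no result is produced
        return sum(self.window) / self.size


# The query produces results once its window has filled.
query = WindowedAverage(size=3)
results = [query.on_arrival(v) for v in [10, 20, 30, 40]]

# Restarting the query discards the window, so output stalls until
# enough new data points have been collected again.
restarted = WindowedAverage(size=3)
after_restart = [restarted.on_arrival(v) for v in [50, 60]]
```

In this sketch, the restarted query returns no results for the first two data points that arrive after the restart, which corresponds to the delay described above.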
Therefore, some implementations of a DSMS attempt to migrate queries while maintaining relevant state information. In such migrations, the DSMS in question may continue processing an existing query, while migrating state data to a new query and beginning to process the new query as soon as possible. That is, both the original query and the new query may collect and process newly-arriving data, while existing state data is simultaneously transferred from the original query to the new query. A goal of such migrations is to complete the migration quickly, while maintaining output of query results during the transition. However, in such systems, a query to be migrated may have multiple operators, each with its own, arbitrary window size and associated state information. Moreover, the resulting new query may have window sizes that differ from those of the original query, and therefore may require different amounts of state data than were required by corresponding query operators of the original query. For these and other reasons, it may be difficult to estimate a migration time and related variables that will be experienced by a DSMS that migrates state data from an existing, multi-state query to a new query.
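The effect of differing window sizes on the amount of state data to be migrated may be illustrated with the following minimal sketch. It again assumes count-based windows; the function `migrate_state` is hypothetical and is not presented as the migration technique of any existing DSMS.

```python
from collections import deque


def migrate_state(old_window, new_size):
    """Copy existing state into a window of a different size (hypothetical helper).

    Returns the new window and the number of data points still missing
    before the new window is full.
    """
    new_window = deque(maxlen=new_size)
    # A bounded deque keeps only the most recent `new_size` items on extend.
    new_window.extend(old_window)
    missing = new_size - len(new_window)
    return new_window, missing


old = deque([1, 2, 3, 4], maxlen=4)

# A smaller new window needs only part of the old state.
smaller, missing_small = migrate_state(old, 2)

# A larger new window needs all of the old state plus newly arriving data.
larger, missing_large = migrate_state(old, 6)
```

Here the smaller window is full immediately after the transfer, while the larger window must still wait for two further data points. When a query has multiple operators, each with its own window, such differences accumulate, which is one reason the overall migration time may be difficult to estimate.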