A time series is a sequence of data indicating values over time. One example is the sequence of daily high temperatures in a city. Another example is the sequence of prices paid for a commodity over time.
One tool for examining time series is a window, which is a subsequence. A time window includes the values associated with a time period. A value window includes a specified number of values. The subsequence specified by a moving window changes: for a moving time window it changes as time passes, and for a moving value window it changes as values are received. For example, a five-minute moving window includes the values for the past five minutes.
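The two kinds of moving window can be sketched as follows. This is a minimal illustration only; the class names `MovingValueWindow` and `MovingTimeWindow`, the use of seconds as the time unit, and the choice to treat the window as the half-open interval (now - span, now] are assumptions made for the example.

```python
from collections import deque


class MovingValueWindow:
    """Holds the most recent `size` values (hypothetical helper)."""

    def __init__(self, size):
        # A bounded deque discards the oldest value when a new one arrives.
        self.values = deque(maxlen=size)

    def add(self, value):
        self.values.append(value)


class MovingTimeWindow:
    """Holds values received within the last `span` seconds (hypothetical helper)."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        self.expire(timestamp)

    def expire(self, now):
        # Drop entries outside the assumed interval (now - span, now].
        while self.items and self.items[0][0] <= now - self.span:
            self.items.popleft()

    def values(self):
        return [value for _, value in self.items]
```

Note that the value window changes only when `add` is called, whereas the time window can also change through `expire` when time passes without any new value.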
Statistics over windows are useful to monitor time series. For example, the five-minute moving average of a time series is the average of values in the five-minute moving window. Since the values in moving windows change over time, window statistics also change over time. The straightforward method to re-compute a window statistic is to access all values in the window and compute the statistic directly. Online computation is another method. In online computation, a statistic value is computed by modifying the previous value of the statistic to account for values that expired from the window and values added to the window since the previous computation. For example, consider re-computing the sum of a hundred-value moving value window when a new value is received. The straightforward method is to take the sum of the new value and the ninety-nine most recent previous values. The online method is to take the previous sum, subtract the oldest value used in the previous sum, and add the new value. The straightforward method requires about a hundred mathematical operations; the online method requires two.
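The contrast between the two methods for a moving sum can be sketched as below. The names `OnlineWindowSum` and `direct_window_sum` are hypothetical and chosen only for this illustration; the sketch assumes a moving value window and numeric values.

```python
from collections import deque


class OnlineWindowSum:
    """Online sum over a moving value window (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.values = deque()
        self.total = 0

    def add(self, value):
        # Account for the value added to the window.
        self.values.append(value)
        self.total += value
        # Account for the value that expired from the window, if any.
        if len(self.values) > self.size:
            self.total -= self.values.popleft()
        return self.total


def direct_window_sum(values, size):
    """Straightforward method: access every value in the window."""
    return sum(values[-size:])
```

Once the window is full, each call to `add` performs one addition and one subtraction regardless of the window size, while `direct_window_sum` touches every value in the window. With floating-point inputs the two results may differ slightly because rounding error accumulates differently.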
When a moving window is inserted on a time series, there can be a time delay before the window becomes valid. For example, consider adding a five-value moving window to a time series. If the window statistics computation only uses values received after the window is formed, then the statistics are not valid until five new values are received. On the other hand, if historical values are available, then they can be used to compute the statistics. As a result, the window becomes valid earlier. For example, if an existing two-value moving window already holds its two values and a five-value moving window is inserted, then it is possible to have valid statistics after three new values.
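The effect of seeding an inserted window with historical values can be sketched as follows. The functions `insert_value_window` and `is_valid` are hypothetical names for this example, and the history is assumed to be a plain list of the values held by an existing window, oldest first.

```python
from collections import deque


def insert_value_window(size, history):
    """Create a moving value window seeded from available historical values."""
    window = deque(maxlen=size)
    # Use at most the `size` most recent historical values.
    window.extend(history[-size:])
    return window


def is_valid(window):
    """A value window is treated as valid once it holds `maxlen` values."""
    return len(window) == window.maxlen


# An existing two-value window holds 7 and 8; a five-value window is inserted.
seeded = insert_value_window(5, [7, 8])
for new_value in [9, 10, 11]:
    seeded.append(new_value)
# After three new values the seeded window is full, so is_valid(seeded) holds.
```

Without the seed, the same window would need five new values before `is_valid` returned true.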
One tool to monitor a set of time series is a persistent query. The persistent query contains an event condition and a payload specification. A system that executes a persistent query sends the specified payload as output if the event condition holds. The event condition may involve statistics over windows. For example, a persistent query could include the event condition:
five-day moving average temperature in Anaheim is more than 20 degrees higher than the ten-day moving average temperature in St. Louis
and the payload specification:
latest price for a flight from St. Louis to Anaheim.
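A persistent query of this form can be sketched as a pair of functions over the current statistics. This is only an illustration: the `PersistentQuery` class, the dictionary of statistics, and the keys `anaheim_avg_5d`, `st_louis_avg_10d`, and `stl_to_ana_fare` are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class PersistentQuery:
    """An event condition paired with a payload specification (sketch)."""

    condition: Callable[[Dict[str, float]], bool]
    payload: Callable[[Dict[str, float]], Any]

    def evaluate(self, state):
        # Send the payload only if the event condition holds.
        return self.payload(state) if self.condition(state) else None


# The example condition and payload from the text, over hypothetical keys.
query = PersistentQuery(
    condition=lambda s: s["anaheim_avg_5d"] > s["st_louis_avg_10d"] + 20,
    payload=lambda s: s["stl_to_ana_fare"],
)
```

In this sketch, `evaluate` would be called each time a window statistic or the latest price changes, and a return value of `None` means no output is sent.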
There are many uses for a system to monitor time series data. In financial market trading, it is useful to monitor prices of multiple commodities or of the same commodity on different exchanges in order to trade when conditions indicate likely profit. In financial market-making, it is useful to monitor prices and volume in order to adjust bid-ask spreads in response to changes in volatility. For an electrical power provider, it is useful to monitor power usage and availability over time at different locales in order to produce and route power efficiently. Some desirable features of a time series monitoring system include the following.
Support high data throughput with low response time.
Support multiple input time series.
Execute persistent queries.
Support dynamic management of persistent queries, i.e., support insert and delete of persistent queries without halting input and persistent query execution.
Support dynamic management of windows.
Perform online computation of statistics.
Use historical values in present windows to help populate inserted windows.
Previous technologies have some of these features, but none has all. One previous technology with some of these features is a database. Another is online statistics software. Yet another is a system that combines online statistics software with a database.
A database can be configured to support multiple time series, execute persistent queries, and support dynamic management of persistent queries and windows, as follows. For each time series, use a database table to store each value in a record that also has a timestamp field which indicates when the value is received. For each persistent query, form a database trigger that executes a database query. The query encodes the condition and payload specification. Use database-supplied functions to compute statistics in the condition. For example, to compute the five-minute moving average of a time series, apply the database-supplied average function to the values with timestamps indicating receipt within the past five minutes. A shortcoming of using the database in this way is that the database-supplied functions do not perform online computation of statistics. Hence, response time suffers.
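The database-supplied average in the example above can be sketched with SQLite through Python's standard `sqlite3` module. The table name `temperature`, the column names, the sample rows, and the use of seconds for timestamps are assumptions for illustration; the trigger that would run the query is omitted.

```python
import sqlite3

# One table per time series: each record holds a receipt timestamp and a value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (received_at REAL, value REAL)")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?)",
    [(0.0, 10.0), (120.0, 20.0), (400.0, 30.0)],
)


def five_minute_average(conn, now):
    """Apply the database-supplied AVG to values received in the past 300 s."""
    row = conn.execute(
        "SELECT AVG(value) FROM temperature "
        "WHERE received_at > ? AND received_at <= ?",
        (now - 300.0, now),
    ).fetchone()
    return row[0]  # None when no values fall inside the window
```

Each call re-reads every record in the window, which is the direct re-computation described above rather than an online update.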
There is software that performs online computation of statistics. Some of this software supports high data throughput with low response time and executes persistent queries. However, this software is special-purpose; it does not support dynamic management of persistent queries, does not support dynamic management of windows, and does not use historical values in present windows to help populate inserted windows.
It is possible to build a system by combining online statistics software with a database. The statistics software receives time series values, computes statistics, and sends the statistics to the database. The database executes persistent queries. The system does not support dynamic management of persistent queries, does not support dynamic management of windows, and does not use historical values in present windows to help populate inserted windows. Even though the database alone supports these features, the system as a whole does not. The statistics software lacks these features, and both the database and the statistics software would need these features for the system as a whole to have them.