1. Field
Certain embodiments relate to extraction of time-series behavior.
2. Description of the Related Art
Time-series prediction and time-series-trend recognition are problems of significant practical importance having a wide variety of applications spanning many fields, including, for example, signal processing, social science analysis, geology, astrophysics, weather forecasting, stock market analysis and workload projections.
Conventional approaches to time-series analysis include the use of autoregressive integrated moving average (ARIMA) models, Gabor polynomials and neural networks. Each conventional approach has serious disadvantages that render it an undesirable method of time-series analysis for certain applications. For example, ARIMA is a complex, sophisticated technique that is time-consuming and computationally intensive, and that requires a relatively large amount of training data to perform well. Moreover, ARIMA relies on autoregression, which is of little use when the relationship between consecutive data points is weak.
By way of example, in the context of a relational database management system (RDBMS), it is often desirable to monitor a number of different indicators of the usage of the RDBMS. One such indicator might be central processing unit (CPU) utilization. For instance, a time series representing the hourly CPU utilization over a week can provide valuable information on how a database application was used within the week, such as which day of the week featured the heaviest use of the application.
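By way of illustration only, the weekly CPU-utilization example above might be sketched as follows. The function name `heaviest_day`, the parameter `hours_per_day`, and the assumption that the series begins at the first hour of a day and holds a whole number of days are all hypothetical choices made for this sketch and are not taken from any particular RDBMS monitoring tool.

```python
def heaviest_day(hourly_cpu, hours_per_day=24):
    """Return (day_index, mean_utilization) for the day with the highest mean.

    Assumes (hypothetically) that hourly_cpu starts at the first hour of a day
    and contains a whole number of days of equally spaced samples.
    """
    if not hourly_cpu or len(hourly_cpu) % hours_per_day != 0:
        raise ValueError("series must contain a whole number of days")

    # Average the samples belonging to each day.
    daily_means = [
        sum(hourly_cpu[start:start + hours_per_day]) / hours_per_day
        for start in range(0, len(hourly_cpu), hours_per_day)
    ]

    # Pick the day whose average utilization is largest.
    best_day = max(range(len(daily_means)), key=daily_means.__getitem__)
    return best_day, daily_means[best_day]
```

Called on a list of 168 hourly samples, the function returns the zero-based index of the busiest day together with that day's mean utilization.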
Conventional tools for time-series analysis, however, do not perform well for many RDBMS-based time series such as CPU utilization. This poor performance stems in part from the fact that typically the data in an RDBMS-based time series, such as CPU utilization, exhibits only a weak relationship, if any, between the values of consecutive data points. As an example, the fact that CPU utilization is high during a given hour typically does not imply anything about CPU utilization for the next hour or the hour after that. Instead, time-related values such as the time of day are often more important for RDBMS-based time series because variability in human behavior plays a significant role in driving the patterns and trends in the monitored data. For instance, for an RDBMS used in a stock brokerage system, there may typically be daily peaks in CPU utilization around, say, 1:30 P.M. and 3:30 P.M., corresponding, respectively, to a high number of users logging into the system after lunch and to a high number of transactions performed before the stock market closes for the day at 4:00 P.M. Similarly, there may typically be a period of low CPU utilization from about 12:15 P.M. to 1:15 P.M., corresponding to system users taking their lunch breaks. These peaks and valleys are likely to occur regardless of how much the CPU is used through the rest of the day (e.g., regardless of whether trading throughout the rest of the day is frenzied or slow). Such behavior typically has much less to do with CPU utilization in the preceding hours than it does with human behavior caused by the time of day. As a result, conventional approaches to time-series analysis like ARIMA, which do not take into account time-related values like time of day, will not extract such daily behavior well. This is particularly true when little data is available for training or when the behavior extraction must be performed in a relatively short time.
Even when consecutive data points exhibit a stronger relationship, conventional tools like ARIMA and neural networks are still relatively slow, computationally intensive methods of analysis.
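The distinction drawn above between dependence on preceding values and dependence on time of day can be made concrete with a minimal sketch. It is offered only as an illustration and is not the method of any embodiment: `lag1_autocorrelation` computes the ordinary sample lag-1 autocorrelation of a series, and `time_of_day_profile` averages the samples that fall in the same slot of each day. Both function names and the `samples_per_day` parameter are hypothetical, and the series is assumed to be equally spaced and to start at the first slot of a day.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: how strongly each value tracks the previous one."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    if variance_sum == 0:
        return 0.0  # a constant series carries no information either way
    covariance_sum = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    return covariance_sum / variance_sum


def time_of_day_profile(values, samples_per_day=24):
    """Average of all samples sharing the same slot within the day.

    Assumes (hypothetically) that values[0] is the first slot of a day.
    Slots that received no samples are reported as None.
    """
    buckets = [[] for _ in range(samples_per_day)]
    for index, value in enumerate(values):
        buckets[index % samples_per_day].append(value)
    return [sum(b) / len(b) if b else None for b in buckets]
```

A lag-1 autocorrelation near zero suggests that autoregressive terms have little to work with, whereas a time-of-day profile with pronounced peaks and valleys indicates that the slot within the day carries much of the signal.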