1. Field of the Invention
The present invention generally relates to system management, and more particularly to preprocessing data for forecasting in capacity management and software rejuvenation applications, and to decomposing a signal used for prediction into different components by means of the wavelet transform and time-domain methods.
2. Description of the Related Art
Within a computer system, or a computer network, “capacity management” describes the process of maintaining resources above a minimum desired level, such as by adapting configurations or by purchasing new equipment (e.g., see N. G. Duffield et al., “A Flexible Model for Resource Management in Virtual Private Networks”, Proc. ACM SIGCOMM '99, 1999, pp. 95–108; I. M. Graf, “Transformation Between Different Levels of Workload Characterization for Capacity Planning: Fundamentals and Case Study”, Proc. ACM Conf. Measurement and Modeling of Computer Systems, 1987, pp. 195–204; and A. B. Pritsker et al., “Total Capacity Management Using Simulation”, Proc. 1991 Winter Simulation Conference, 1991, pp. 348–355).
For example, the system administrator of a Local Area Network (LAN) decides when new disks are needed because the existing ones are low on free space, when the network backbone should be upgraded to support increased traffic, and when new servers are needed to respond to new user requirements.
Traditionally, system administrators monitor quantities such as network traffic, available disk space, server response times, and the number of user processes running on different machines, and make appropriate decisions when resources are low or close to being exhausted. It is important to make the right decision at the right time. Unnecessary equipment upgrades are expensive and might not improve productivity. At the same time, upgrades should be decided before resources are completely utilized, since productivity is reduced until the upgrade is actually performed.
Hence, prediction is becoming increasingly important in capacity management. Available software packages, such as the IBM Netfinity Capacity Manager module of the IBM Netfinity Service Manager package, now incorporate prediction capabilities: using past data, the system projects resource utilization into the future, computes confidence intervals for the prediction, and provides a probabilistic estimate of when the resource(s) will be close to exhaustion.
Similar prediction problems can be found in the area of Software Rejuvenation (e.g., see S. Garg et al., “Minimizing Completion Time of a Program by Checkpointing and Rejuvenation”, Proc. ACM SIGMETRICS Conference on Measurement & Modeling of Computer Systems, 1996, pp. 252–261).
Software rejuvenation is a discipline concerned with scheduling the termination and re-initialization of applications or operating systems in order to avoid catastrophic crashes. It is the software counterpart of preventive maintenance of mechanical equipment. The assumption is that “bugs” in software might cause programs to allocate resources and never release them. Eventually, a crash occurs when the needed resource is exhausted. A typical example of this kind of “bug” is a memory leak, where an application allocates memory but mistakenly never releases it. Other resources that can be exhausted include semaphores, mutexes, and handles.
The motivation for rejuvenating software rather than waiting for a crash is twofold: first, to prevent data loss, and second, to guarantee quality of service. For example, a crash of a database usually requires a rollback operation to a consistent state (reliably stored on persistent media during periodic checkpoint operations), the reconstruction of all the memory-based data structures, and the re-execution of all the transactions that have been committed after the most recent checkpoint and have been stored in an appropriate log. The time required to recover from a catastrophic crash can be on the order of hours to tens of hours, during which the database is not available to process new transactions. Since re-initializing the database to a checkpointed state is the least expensive of the above operations, it is beneficial to schedule the rejuvenation of the software right after a checkpoint operation, possibly during a period of low utilization.
In its simplest form, rejuvenation is based on static scheduling, where the application or the operating system is periodically restarted. Adaptive approaches to software rejuvenation are also possible. For example, quantities related to resources that can be exhausted can be monitored, and their values and variations over time can be analyzed by a prediction algorithm. If a resource is in danger of being exhausted in the near future, the system administrator is notified of the problem and decides whether a rejuvenation should be scheduled.
The combination of a static schedule and prediction techniques is also possible. Here, a static schedule is put in place. Then, using the prediction techniques, the software package can estimate the probability that a crash happens if a scheduled rejuvenation is not executed. If this probability is acceptably low, then the rejuvenation step is skipped.
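The skip decision just described can be sketched as follows. The sketch rests on assumptions not stated in the text: the forecast of resource usage at the time of the next scheduled rejuvenation is Gaussian with a known mean and a positive standard deviation, a crash occurs when usage exceeds a fixed capacity, and usage grows monotonically so that exceeding the capacity at that time is a reasonable proxy for crashing before it. The function names and the `max_risk` threshold are hypothetical.

```python
from statistics import NormalDist


def crash_probability(predicted_mean, predicted_std, capacity):
    """Probability that usage exceeds `capacity`, assuming the forecast of the
    usage at the next scheduled rejuvenation is Gaussian (predicted_std > 0)."""
    return 1.0 - NormalDist(predicted_mean, predicted_std).cdf(capacity)


def should_skip_rejuvenation(predicted_mean, predicted_std, capacity, max_risk=0.01):
    """Skip the scheduled restart only if the estimated crash risk is below max_risk."""
    return crash_probability(predicted_mean, predicted_std, capacity) < max_risk


# Hypothetical forecast: 80 MB of memory in use (std 5 MB) against a 100 MB limit.
print(should_skip_rejuvenation(80.0, 5.0, 100.0))
```

Here the limit lies four standard deviations above the forecast, so the estimated risk is far below 1% and the restart would be skipped; with a forecast of 95 MB the risk rises to about 16% and the restart would be kept.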
Prediction of data is a well-studied field. By analyzing the values of the quantity acquired over a period of time (a time series), prediction methods make an inference about the future values. Typical statistical analysis and forecasting of time series (e.g., see G. E. P. Box et al., “Time Series Analysis, Forecasting and Control”, Holden-Day Inc., San Francisco, Calif., 1970) assume that the data can be modeled as a realization of a stationary stochastic process. Usually, the data is modeled as the superposition of a deterministic component and a stochastic component.
The deterministic component is usually modeled as a linear or negative exponential function of time, while the stochastic component is usually modeled as white noise, as an autoregressive (AR) process, a moving average (MA) process, an autoregressive-moving average (ARMA) process, or a derived process (e.g., see M. S. Squillante et al., “Analysis of Job Arrival Patterns and Parallel Scheduling Performance”, Performance Evaluation 36–37(1–4): 137–163 (1999)). Prediction is then performed by estimating the parameters of the noise model, fitting a curve to the deterministic component, and using the curve to extrapolate the behavior of the data into the future.
Often, confidence intervals for the prediction can be estimated from the data. A p % confidence interval for the value x(t) taken by the process x at time t is the set of values between two extremes xL(t) and xH(t), with the following meaning: if the process x satisfies the assumptions used to compute the confidence interval, then the probability that x(t) belongs to the interval [xL(t), xH(t)] is p/100. If p is close to 100, the probability that x(t) lies outside the interval is very small.
Often, periodic components can be seen in the data. For example, in a corporate computer system, utilization tends to be higher in the morning than around lunch time or at night, while Internet Service Providers (ISPs) experience an increase of residential traffic in the evenings. Periodic components can be substantial and, if not taken into account, can significantly degrade prediction accuracy. Methods for estimating periodic and seasonal behavior are known in the art (e.g., see M. S. Squillante et al., “Internet Traffic: Periodicity, Tail Behavior and Performance Implications”, IBM Research Report No. RC21500, 1999).
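For the common case in which the predicted value x(t) is assumed to be Gaussian with a known mean and standard deviation (an assumption not made in the text), the extremes of a symmetric p % interval are xL(t) = mean - z * std and xH(t) = mean + z * std, where z is the (0.5 + p/200) quantile of the standard normal distribution. A minimal sketch, with a hypothetical function name:

```python
from statistics import NormalDist


def confidence_interval(mean, std, p):
    """Return (x_L, x_H) such that a Gaussian value with the given mean and
    standard deviation falls inside [x_L, x_H] with probability p/100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(0.5 + p / 200.0)  # two-sided standard-normal quantile
    return mean - z * std, mean + z * std


# Hypothetical forecast of 100 units with a standard deviation of 10.
print(confidence_interval(100.0, 10.0, 95))
```

For p = 95, z is about 1.96, so this forecast yields an interval of approximately [80.4, 119.6]. Pushing p toward 100 widens the interval quickly, which is why poorly fitting models tend to produce very wide intervals.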
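One simple way to estimate a periodic component, offered here only as an illustration and not as the method of the cited report, is to average the samples that share the same phase of a known period (for example, the same hour of the day with hourly data and a period of 24) and subtract the resulting profile. The sketch assumes the period is known, the series spans at least one full period, and any trend has already been removed; the function names are hypothetical.

```python
from statistics import fmean


def seasonal_profile(x, period):
    """Mean of the samples at each phase of the cycle (needs len(x) >= period)."""
    return [fmean(x[k::period]) for k in range(period)]


def deseasonalize(x, period):
    """Remove the periodic deviation from each sample, keeping the overall level."""
    profile = seasonal_profile(x, period)
    level = fmean(profile)
    return [xi - (profile[i % period] - level) for i, xi in enumerate(x)]


# Hypothetical utilization samples with a period of 4.
print(deseasonalize([10.0, 30.0, 20.0, 0.0, 12.0, 28.0, 22.0, 2.0], 4))
```

The deseasonalized series can then be passed to a trend-plus-noise predictor, and the profile added back to the forecast.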
Similar methods are used to describe time series in terms of uncorrelated stochastic processes. For example, the Wold decomposition (e.g., see T. W. Anderson, The Statistical Analysis of Time Series, John Wiley & Sons, Inc., 1971, Chapter 7.6.3) has been used to represent time series as the sum of an MA process (of possibly infinite order) driven by white noise and an uncorrelated deterministic process (e.g., see G. Nicolao et al., “On the Wold Decomposition of Discrete-time Cyclostationary Processes”, IEEE Trans. Signal Processing, 47(7), July 1999, pp. 2041–2043). Wold-like decompositions have been extensively used in image representation, where images are modeled as 2-dimensional random fields and represented as a sum of purely non-deterministic fields, several harmonic (i.e., 2-dimensional sinusoidal) fields, and a countable number of evanescent fields (e.g., see J. M. Francos, “Bounds on the Accuracy of Estimating the Parameters of Discrete Homogeneous Random Fields with Mixed Spectral Distributions”, IEEE Trans. Information Theory, 43(3), May 1997, pp. 908–922). A typical application of 2-dimensional Wold-like decomposition is image retrieval in digital libraries (e.g., see R. Stoica et al., “The Two-dimensional Wold Decomposition For Segmentation And Indexing In Image Libraries”, Proc. of the 1998 IEEE Int'l Conf. Acoustics, Speech and Signal Processing, 1998, pp. 2977–2980).
Prediction in existing capacity management and rejuvenation systems is sensitive to the underlying assumptions described above. Data describing the overall behavior of large systems sometimes satisfies such assumptions. For example, the overall request traffic arriving at a very large web site (e.g., the Olympic web site) can be successfully analyzed using known techniques.
However, in most scenarios the data does not satisfy these assumptions, and the resulting prediction can be either erroneous or accompanied by very wide confidence intervals. Hence, the usefulness of the prediction is significantly reduced.
Additionally, the data often contains useful information that is not captured by the known types of decompositions. For example, changes over time in the parameters of the component processes, such as an increase in the variance of the non-deterministic component of a monitored quantity, might be a powerful indicator of resource exhaustion. Such changes would not be captured by the known decompositions, because the stationarity assumption implies that the non-deterministic component has a fixed variance. Similarly, significant information is contained in certain characteristics of the monitored quantities, such as the presence, time behavior, and amplitude of spikes or jumps, which are hard to detect from the known decompositions.
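The quantities mentioned in this paragraph can be tracked directly in the time domain. As a minimal sketch, assuming the series has already been detrended, a sliding-window variance exposes a growing spread of the non-deterministic component, and large first differences flag candidate jumps or spike edges. This is only an illustration of the kind of information at stake, not the decomposition of the present invention; the function names, window length, and threshold are hypothetical.

```python
from statistics import pvariance


def rolling_variance(x, window):
    """Population variance of every length-`window` sliding window of x."""
    return [pvariance(x[i:i + window]) for i in range(len(x) - window + 1)]


def large_jumps(x, threshold):
    """Indices i where |x[i] - x[i-1]| exceeds threshold (candidate jumps or spike edges)."""
    return [i for i in range(1, len(x)) if abs(x[i] - x[i - 1]) > threshold]


# Hypothetical detrended residuals whose spread grows over time.
resid = [0.0, 1.0, 0.0, 1.0, 0.0, 3.0, -3.0, 3.0]
print(rolling_variance(resid, 2))
print(large_jumps(resid, 2.0))
```

A stationary model fitted to the whole series would report a single average variance and miss the increase visible in the later windows.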