1. Field of the Invention
The present invention generally relates to a method and system for software rejuvenation, and more particularly to a method and system for transparent symptom-based selective software rejuvenation.
2. Description of the Related Art
Software failures are now known to be a dominant source of system outages. One common form of software failure is due to "software aging," in which a resource, such as memory, is increasingly consumed, eventually causing the system to fail. Counteracting such aging by restarting the system (or subsystem) is known as "software rejuvenation."
The background and related art pertaining to software rejuvenation are described in detail in the above-mentioned copending U.S. patent application Ser. Nos. 09/442,003 and 09/442,001.
The second of these applications (i.e., copending application Ser. No. 09/442,001) deals with prediction of resource exhaustion due to software aging effects, and teaches that resource exhaustion can be predicted using trend analysis on recently measured symptoms of the system being monitored. Specific trend analysis techniques used include linear regression, polynomial regression, and a modification of "Sen's slope estimate." These methods attempt to predict when a resource, or set of resources, approaches a state in which resource exhaustion is imminent and a software rejuvenation should be scheduled. However, copending application Ser. No. 09/442,001 does not teach how to select which trending method to use.
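For illustration, two of the trend estimators named above can be sketched as follows. This is a minimal Python sketch, assuming a single symptom series, equally weighted samples, and a fixed resource capacity. It shows ordinary least-squares linear regression and the standard (unmodified) Sen's slope estimate, because the modification used in copending application Ser. No. 09/442,001 is not described here. The function names and the choice of the median residual as the Sen intercept are assumptions made for the example. Each fitted line a + b*t is projected forward to the time at which it reaches capacity.

```python
import math
from statistics import median
from typing import List, Tuple


def ols_fit(ts: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = a + b*t; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    b = sxy / sxx
    return mean_y - b * mean_t, b


def sen_fit(ts: List[float], ys: List[float]) -> Tuple[float, float]:
    """Standard Sen's slope: the median of all pairwise slopes.

    The intercept is the median of y - b*t (an assumption for this sketch).
    """
    n = len(ts)
    slopes = [
        (ys[j] - ys[i]) / (ts[j] - ts[i])
        for i in range(n)
        for j in range(i + 1, n)
        if ts[j] != ts[i]
    ]
    b = median(slopes)
    a = median([y - b * t for t, y in zip(ts, ys)])
    return a, b


def exhaustion_time(a: float, b: float, capacity: float) -> float:
    """Time at which the fitted line a + b*t reaches capacity (inf if not growing)."""
    if b <= 0:
        return math.inf
    return (capacity - a) / b


# Hypothetical hourly memory samples following 2t + 1, with one transient spike at t = 4.
ts = [float(t) for t in range(10)]
ys = [2 * t + 1 for t in ts]
ys[4] = 100.0
a_sen, b_sen = sen_fit(ts, ys)
a_ols, b_ols = ols_fit(ts, ys)
print("Sen:", a_sen, b_sen, exhaustion_time(a_sen, b_sen, capacity=21.0))
print("OLS:", a_ols, b_ols, exhaustion_time(a_ols, b_ols, capacity=21.0))
```

Because the single spike affects only a minority of the pairwise slopes, Sen's estimate stays at the underlying slope of 2, while the least-squares slope is pulled noticeably away from it. This robustness to transient outliers is one reason a median-based estimator is attractive for noisy symptom data.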
Furthermore, the suggested trending methods may not always be effective. For example, while polynomial regression may adequately fit the data observed in the recent past, it is not always a good predictor, since polynomial values extrapolated into the future are not necessarily monotone. Further, such estimates are often unstable.
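The monotonicity problem can be seen on a small synthetic series. The following Python sketch assumes a plain least-squares polynomial fit solved through the normal equations, which is adequate for low degrees (a production implementation would prefer a QR-based solver). It fits both a line and a quadratic to usage that is still growing but decelerating. The data and function names are hypothetical.

```python
from typing import List


def polyfit(ts: List[float], ys: List[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients c with c[i] multiplying t**i."""
    k = degree + 1
    # Normal equations A c = r, with A[i][j] = sum t^(i+j) and r[i] = sum y * t^i.
    A = [[float(sum(t ** (i + j) for t in ts)) for j in range(k)] for i in range(k)]
    r = [float(sum(y * t ** i for t, y in zip(ts, ys))) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for j in range(col, k):
                A[row][j] -= f * A[col][j]
            r[row] -= f * r[col]
    # Back substitution.
    c = [0.0] * k
    for i in reversed(range(k)):
        c[i] = (r[i] - sum(A[i][j] * c[j] for j in range(i + 1, k))) / A[i][i]
    return c


def evaluate(c: List[float], t: float) -> float:
    """Value of the polynomial with coefficients c at time t."""
    return sum(ci * t ** i for i, ci in enumerate(c))


# Usage still growing, but more slowly each interval: 0, 9, 16, 21, 24.
ts = [0, 1, 2, 3, 4]
ys = [0, 9, 16, 21, 24]
quad = polyfit(ts, ys, 2)  # fits exactly: 10*t - t**2
line = polyfit(ts, ys, 1)  # 2 + 6*t
for t in (5, 6, 8, 10):
    print(t, round(evaluate(quad, t), 6), round(evaluate(line, t), 6))
```

The quadratic matches the recent data exactly, yet its extrapolation peaks at 25 (at t = 5) and then declines. With a capacity of 30 it would therefore predict that exhaustion never occurs. The line is a poorer fit but still projects exhaustion at t = (30 - 2)/6, roughly 4.7.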
Thus, prior to the present invention, there has been no method of scheduling rejuvenation times by predicting resource exhaustion times from the best predictor, selected from a multitude of possible types of predictors (models). Further, while it is noted that similar notions are used in classical statistics to select the best model from amongst a set of possible models (see, e.g., Chapter 6 of Applied Regression Analysis, Second Edition, Norman Draper and Harry Smith, John Wiley and Sons, Inc., 1981), prior to the present invention, such approaches have not been used to predict software resource exhaustion times and to avoid disruptive software failures by scheduling rejuvenation times.
Moreover, which types of predictors to consider, how to set their parameters, and how to choose between different predictors depend heavily on the software rejuvenation context. Indeed, selecting appropriate classes of models and appropriate penalty functions is not straightforward. Hence, prior to the present invention, this problem had been neither recognized nor addressed.
In view of the foregoing and other problems, drawbacks, and disadvantages of the conventional methods and structures, an object of the present invention is to provide a method and structure having a prediction module for a software rejuvenation agent operating in a computing environment.
In a first aspect of the present invention, a method (and computer system where at least one software component thereof is restarted based on projection of resource exhaustion), for selecting the most suitable projection method from among a class of projection methods, includes providing M fitting modules which take measured symptom data associated with the system as input and produce M scores, wherein M is an integer, selecting the fitting module producing the best score, and from the selected module, producing a prediction of the resource exhaustion time.
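The selection step can be sketched in Python as follows. This is a minimal illustration under assumptions that the text does not state. Each hypothetical fitting module is a straight-line least-squares fit in a transformed time axis: the identity (linear growth) or log(1 + t) (decelerating growth). The score is the residual mean square SSE/(n - p), a classical criterion for comparing regression equations, with lower being better. The module with the best score is then used to project the exhaustion time. `FittingModule`, `select_and_predict`, and the other names are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FittingModule:
    """Hypothetical fitting module: a line y = a + b*x in a transformed axis x = fwd(t)."""
    name: str
    fwd: Callable[[float], float]  # time -> model axis
    inv: Callable[[float], float]  # model axis -> time
    n_params: int = 2


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_and_score(m: FittingModule, ts, ys) -> Tuple[float, float, float]:
    """Fit module m; return (score, a, b) with score = SSE / (n - p), lower is better."""
    xs = [m.fwd(t) for t in ts]
    a, b = ols(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (len(ts) - m.n_params), a, b


def predict_exhaustion(m: FittingModule, a: float, b: float, capacity: float) -> float:
    """Time at which the fitted curve reaches capacity (inf if it never does)."""
    if b <= 0:
        return math.inf
    try:
        return m.inv((capacity - a) / b)
    except OverflowError:  # e.g. expm1 of a very large argument
        return math.inf


def select_and_predict(modules: List[FittingModule], ts, ys, capacity: float):
    """Score all M modules, keep the best-scoring one, and project exhaustion with it."""
    scored = [(fit_and_score(m, ts, ys), m) for m in modules]
    (score, a, b), best = min(scored, key=lambda s: s[0][0])
    return best.name, predict_exhaustion(best, a, b, capacity)


MODULES = [
    FittingModule("linear", fwd=lambda t: t, inv=lambda x: x),
    FittingModule("log", fwd=math.log1p, inv=math.expm1),
]

ts = [float(t) for t in range(10)]
linear_usage = [2 * t + 1 for t in ts]
log_usage = [3 + 5 * math.log1p(t) for t in ts]
print(select_and_predict(MODULES, ts, linear_usage, capacity=41.0))
print(select_and_predict(MODULES, ts, log_usage, capacity=3 + 5 * math.log(100)))
```

Here M = 2. Adding a module only requires another `FittingModule` entry, provided its score is computed on the same scale as the others so that the scores remain comparable.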
Thus, the inventive prediction module increases system availability by avoiding disruptive system crashes by scheduling software rejuvenations at times prior to estimated resource exhaustion, and avoiding unnecessary rejuvenations caused by poor prediction of resource exhaustion times.
In the invention, multiple fitting modules are run against recently collected symptom time-series data sets from the system being monitored (these are called "symptom parameters" in copending application Ser. No. 09/442,001). Examples of measured symptoms that can be monitored in the exemplary application include memory usage, number of processes, etc. Such symptoms depend on, for example, the operating system and the applications being run. As would be known to one of ordinary skill in the art taking the present application as a whole, other symptoms may be measured, depending on the operating system.
Multiple measured symptoms can also be combined into a single, aggregate measured symptom. Associated with each fitting module is a score (or penalty) that measures how effectively the fitting module fits (describes) the collected data. The fitting module having the best score is selected as being the most reliable module for describing the behavior of the measured symptom.
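One simple way to combine several measured symptoms into a single aggregate symptom is to normalize each by its capacity and, at every sample time, take the largest normalized value. The aggregate then tracks whichever resource is closest to exhaustion. The text does not specify an aggregation rule. The maximum of utilization fractions used in this Python sketch is an assumption (a weighted sum would be another option), and the symptom names and capacities are hypothetical.

```python
from typing import Dict, List


def aggregate_symptom(series: Dict[str, List[float]],
                      capacities: Dict[str, float]) -> List[float]:
    """Combine per-symptom time series into one utilization series (1.0 = at capacity).

    Each sample is divided by that symptom's capacity, and the aggregate at each
    time is the largest of these fractions (an assumed aggregation rule).
    """
    lengths = {len(v) for v in series.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one symptom, all series of equal length")
    n = lengths.pop()
    return [max(series[name][i] / capacities[name] for name in series)
            for i in range(n)]


usage = {"memory_mb": [512.0, 768.0, 900.0], "processes": [100.0, 900.0, 950.0]}
limits = {"memory_mb": 1024.0, "processes": 1000.0}
print(aggregate_symptom(usage, limits))
```

The resulting series can be passed to the same fitting modules as any individual symptom, using a capacity of 1.0.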
Associated with each fitting module is a prediction module that predicts when the system will exhaust resources associated with the measured symptoms. The prediction module corresponding to the fitting module with the best score is selected as the most reliable predictor of resource exhaustion for that symptom.
These predictions (e.g., one for each measured symptom) are input to the software rejuvenation agent, which then schedules rejuvenations based on the predictions, as well as other considerations (e.g., there may be rules stating that two simultaneous rejuvenations are not permitted, or rules preferring that rejuvenations be scheduled during "non-prime shifts," etc.).
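The scheduling step can likewise be sketched. This Python illustration rests on assumptions that are not taken from the text:

* Predicted exhaustion times are expressed in hours from midnight of the current day.
* Each rejuvenation is placed in the latest free whole-hour slot that still leaves a safety margin before its predicted exhaustion, so that it is not performed unnecessarily early.
* No two rejuvenations share a slot.
* Slots outside a hypothetical 08:00-18:00 prime shift are preferred when one is available.

Components whose exhaustion is never projected (an infinite prediction) are left unscheduled.

```python
import math
from typing import Dict, Tuple


def schedule_rejuvenations(predictions: Dict[str, float],
                           margin: float = 1.0,
                           prime_shift: Tuple[int, int] = (8, 18)) -> Dict[str, int]:
    """Map each component to an hour slot (hours from midnight of day 0).

    Rules (assumed for illustration):
      * a slot must be at least `margin` hours before predicted exhaustion;
      * no two rejuvenations may share a slot;
      * off-prime-shift slots are preferred; otherwise the latest free slot is used.
    """
    start, end = prime_shift
    used = set()
    plan = {}
    # Handle the most urgent component first so it gets first choice of slots.
    for name, t_exhaust in sorted(predictions.items(), key=lambda kv: kv[1]):
        if math.isinf(t_exhaust):
            continue  # never projected to exhaust; nothing to schedule
        deadline = math.floor(t_exhaust - margin)
        free = [h for h in range(deadline, -1, -1) if h not in used]  # latest first
        if not free:
            raise ValueError(f"no slot available before exhaustion of {name!r}")
        off_prime = [h for h in free if not (start <= h % 24 < end)]
        slot = off_prime[0] if off_prime else free[0]
        used.add(slot)
        plan[name] = slot
    return plan


print(schedule_rejuvenations({"web_server": 30.5, "database": 30.2, "cache": 15.0}))
```

In this example the cache, predicted to exhaust at 15:00, is moved back to 07:00 to avoid the prime shift. Preferring off-prime slots thus trades an earlier rejuvenation for less disruption, and a real agent would have to weigh these two goals.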
Thus, the present invention improves upon previous approaches to scheduling rejuvenation times by predicting resource exhaustion times from the best predictor, selected from a multitude of possible types of predictors (models). Further, the invention optimizes the choice of which types of predictors to consider, how to set their parameters, and how to choose between different predictors, all of which depend heavily on the software rejuvenation context.