The present invention relates to computer systems, and more particularly, to methods and systems for the autonomic management of computer systems and/or components thereof.
Software and hardware systems such as computer servers and networks, and the applications running thereon, have grown increasingly complex with the use of distributed client/server applications, heterogeneous platforms and/or multiple protocols all on a single physical backbone. This increase in complexity may make management of such systems more difficult. Such management may include detecting problems within the system, isolating these problems, and resolving them using messages and/or events generated by the various components that participate in solving each problem.
Typically, various components of a system that is to be managed, including applications, middleware, hardware devices and the like, generate data that represents the status of the component. Such data may include, for example, CPU load data, concurrent user information, free disk space data, network availability information, etc. This component status data is typically consumed by a management function that monitors the system and/or performs problem analysis and resolution. The management function may, for example, be a user reading a log file or a management application that analyzes and/or displays the data.
Automatic computing management systems, which are also known as autonomic managers, are systems that automatically monitor other systems for situations that may require corrective action. Autonomic managers may also, in some instances, perform corrective action automatically. Autonomic managers typically include a collection of rules that determines under what situations corrective action should be initiated and the type of corrective action to apply in a particular situation. When an autonomic manager detects such a situation, the manager may, for example, alert an operator and/or self-initiate corrective procedures. Autonomic managers may be used to monitor a wide variety of software and hardware systems such as computers, application programs, servers, and industrial systems and equipment, and may greatly expand the number of different systems and applications that an operator may be able to effectively manage. Herein, the software or hardware system that is being monitored and managed is referred to as the “managed system” or the “managed resource.”
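By way of illustration only, a collection of rules of the kind described above may be sketched as follows. The sketch is a minimal one under stated assumptions: the names Rule, RULES and evaluate, the metric keys, the thresholds and the action labels are all hypothetical and are not drawn from any particular autonomic manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Component status data, e.g. {"cpu_load": 0.95, "free_disk_gb": 50.0}.
Metrics = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: a situation to detect and the action to apply."""
    name: str
    condition: Callable[[Metrics], bool]  # when corrective action is needed
    action: str                           # which corrective action to apply


# Illustrative thresholds only; a real rule set would be site-specific.
RULES: List[Rule] = [
    Rule("high_cpu",
         lambda m: m.get("cpu_load", 0.0) > 0.90,
         "alert_operator"),
    Rule("low_disk",
         lambda m: m.get("free_disk_gb", float("inf")) < 5.0,
         "purge_temp_files"),
]


def evaluate(metrics: Metrics, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions of every rule whose condition holds, in rule order."""
    return [rule.action for rule in rules if rule.condition(metrics)]
```

In this sketch, a missing metric never triggers a rule, so a managed resource that reports no data yields an empty list of actions rather than a spurious alert.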
Conventional autonomic managers typically use a “MAPE” loop strategy (Monitor, Analyze, Plan and Execute) to determine the circumstances under which various corrective actions will be initiated. Under the MAPE loop strategy, metrics associated with the managed resources are monitored, analysis is then performed regarding how the monitored metrics may be used to enact policies, planning is done to avoid policy conflicts, and the policies may then be executed to carry out the autonomic management. Typically, this loop requires complex computer code and/or infrastructure. In conventional autonomic managers, the chosen policies represent the key to the control of the resource that is to be managed. Currently, policy editors are typically used as condition constructors, which may allow system administrators to, for example, generate large “if-then” statement blocks that define the way that the autonomic manager operates. Generation of the appropriate policies can be quite difficult, however, as the day-to-day maintenance and operation of the managed system often does not translate readily into a policy format.
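For purposes of illustration, one iteration of a MAPE loop may be sketched as shown below. This is a simplified sketch under stated assumptions, not a description of any conventional product: the function names, the policy tuples, the thresholds and the priority values are hypothetical, and the planning step is reduced to ordering actions by priority and dropping duplicate actions.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
# Hypothetical policy format: (name, "if" condition, "then" action, priority).
Policy = Tuple[str, Callable[[Metrics], bool], str, int]

POLICIES: List[Policy] = [
    ("cpu_overload", lambda m: m["cpu_load"] > 0.90, "throttle_requests", 2),
    ("user_surge", lambda m: m["concurrent_users"] > 1000, "throttle_requests", 1),
    ("disk_low", lambda m: m["free_disk_gb"] < 5.0, "purge_temp_files", 3),
]


def monitor(read_sensor: Callable[[], Metrics]) -> Metrics:
    """Monitor: collect the current metrics of the managed resource."""
    return read_sensor()


def analyze(metrics: Metrics, policies: List[Policy]) -> List[Policy]:
    """Analyze: select the policies whose conditions hold for the metrics."""
    return [p for p in policies if p[1](metrics)]


def plan(triggered: List[Policy]) -> List[str]:
    """Plan: order actions by descending priority and drop duplicates,
    a simplified stand-in for policy conflict avoidance."""
    actions: List[str] = []
    for _, _, action, _ in sorted(triggered, key=lambda p: p[3], reverse=True):
        if action not in actions:
            actions.append(action)
    return actions


def execute(actions: List[str], effectors: Dict[str, Callable[[], None]]) -> List[str]:
    """Execute: invoke the effector for each planned action, in order."""
    for action in actions:
        effectors[action]()
    return actions


def mape_iteration(read_sensor: Callable[[], Metrics],
                   effectors: Dict[str, Callable[[], None]],
                   policies: List[Policy] = POLICIES) -> List[str]:
    """Run one Monitor, Analyze, Plan, Execute pass and return the actions taken."""
    metrics = monitor(read_sensor)
    return execute(plan(analyze(metrics, policies)), effectors)
```

Even in this reduced form, each policy is an “if-then” pair that an administrator would have to author by hand, which suggests why translating routine operating practice into policies may become burdensome as the number of policies grows.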