1. Technical Field
This invention relates to an autonomic computer system configured to monitor system resource and system configuration information. More specifically, common base events are generated and, based upon system configuration, are employed to monitor system resources and to resolve system configuration conflicts.
2. Description of the Prior Art
Within the past two decades, raw computing power and the proliferation of computer devices have grown at exponential rates. This phenomenal growth, along with the advent of the Internet, has led to a new age of accessibility to other people, other systems, and to information.
The simultaneous explosion of information and integration of technology into everyday life has brought on new demands for how people manage and maintain computer systems. The demand for information technology professionals already outpaces supply when it comes to finding someone to manage complex, and even simple, computer systems. As access to information becomes omnipresent through personal computers, hand-held devices, and wireless devices, the stability of current infrastructure, systems, and data is at increasing risk. This increasing complexity, in conjunction with a shortage of skilled information technology professionals, points towards an inevitable need to automate many of the functions associated with computing today.
Autonomic computing is one proposal to solve this technological challenge. Autonomic computing is a concept of building a computer system that regulates itself much in the same way that a person's autonomic nervous system regulates and protects the person's body. In autonomic computing, the system is self-healing, self-configured, self-protected, and self-managed. An autonomic computing environment functions with a high level of artificial intelligence while remaining invisible to the users. The autonomic computing environment operates organically in response to the input it collects.
Among the tools employed in an autonomic computing environment to support self-management is a common base event. In today's complex world of e-business, multitudes of interconnected systems must work together to perform many of the simple housekeeping activities which are necessary to keep a computing system healthy. A small event in a computing system can change things far beyond the seeming initial circumstance. An event, which encapsulates message data sent as the result of an occurrence, or situation, represents the very foundation on which these complex systems communicate. Basic aspects of enterprise management, such as performance monitoring, security and reliability, as well as fundamental portions of e-business communications, such as order tracking, are grounded in the viability and fidelity of these events, in that quality data leads to accurate, deterministic and proper management of the enterprise. Effort to ensure the accuracy, improve the detail and standardize the format of these fundamental enterprise building blocks is imperative to designing robust, manageable and deterministic systems. Events exchanged between and among applications in complex information technology systems represent the very nervous system that allows these various facets of the system to interoperate, communicate and coordinate their activities. We therefore define here the Common Base Event (CBE) as a new standard for events amongst management and business enterprise applications. The purpose of the CBE is to facilitate effective communication among disparate enterprise components that support logging, management, problem determination, autonomic computing, and e-business functions in an enterprise.
The CBE definition ensures completeness of data by providing properties to publish the identification of the component that is reporting the situation, the identification of the component that is affected by the situation, and the situation itself. All properties defined in the CBE model apply to one of these three broad categories. In addition, the location of the reporter and source components is also considered. The affected component might not reside in the same physical machine as the component that reports it. This broader scope of information encapsulates enough data so that events can be exchanged and interpreted in a deterministic and appropriate manner across multiple management systems that consume the events without losing fidelity due to serial hops among the multiple management systems.
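By way of illustration only, the three categories of CBE properties described above (the reporting component, the affected component, and the situation) may be sketched as a simple data structure. The class names, field names, and sample values below are hypothetical and are chosen for readability; they are not taken from the CBE definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentId:
    """Identifies a component and where it resides (hypothetical fields)."""
    name: str
    location: str  # e.g. a host name; the exact form is an assumption


@dataclass(frozen=True)
class Situation:
    """Describes what occurred (hypothetical fields)."""
    category: str
    message: str


@dataclass(frozen=True)
class CommonBaseEvent:
    """Groups the three broad categories of properties named in the text."""
    reporter: ComponentId   # component that reports the situation
    source: ComponentId     # component affected by the situation
    situation: Situation    # the situation itself

    def is_remote_report(self) -> bool:
        # The affected component need not reside on the same machine
        # as the component that reports it.
        return self.reporter.location != self.source.location


# Sample values are invented for this sketch.
event = CommonBaseEvent(
    reporter=ComponentId("monitor-agent", "host-a"),
    source=ComponentId("database", "host-b"),
    situation=Situation("availability", "connection refused"),
)
```

Carrying the location of both components in the event itself is what allows a consuming management system to interpret the event without consulting the systems through which it previously passed.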
It is known in the art that the CBE functions in conjunction with other tools. A log and trace analyzer (LTA) is one of those tools. The LTA enables viewing, analysis, and correlation of log files generated by different products in the system. The LTA acts as an autonomic manager when configured to receive CBEs. It performs the monitoring and analysis parts of the control loop. A managed resource passes CBEs to the LTA, allowing the autonomic manager to monitor, analyze, and correlate this data.
FIG. 1 is a prior art block diagram (100) of the hardware components associated with autonomic computing and CBEs. The hardware components shown herein include a server (102), a database (104), and storage media (106). Each of the hardware components is in communication with a generic log adaptor (GLA) (108). The GLA (108) converts text-based logs to the CBE format for use with autonomic computing tools. More specifically, the GLA (108) prepares logs for use by the log and trace analyzer (LTA) (110). The LTA (110) allows import of log files from multiple products and determines the relationship between the events captured by these products. The LTA (110) is shown in communication with a symptom database (112). The symptom database (112) is a file of symptoms, string match patterns, associated solutions, and directives. The database (112) is used in the analysis of event and error messages that may occur in a log. In one embodiment, the database (112) records incident and problem indications that could arise in the operation of the software components in the system. For every symptom, the symptom database (112) also contains the likely cause of the problem and a recommended solution for the problem. In one embodiment, a symptom is an error or event message. It may have a solution associated with it in the symptom database. A solution is information about why an error or an event may have occurred and how to recover from it. Log records can be analyzed using a symptom database to interpret known events and error conditions, and to get detailed information on error resolution.
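The flow of FIG. 1, in which a text-based log record is converted to a structured event and then matched against the string match patterns of a symptom database, may be sketched as follows. This is a minimal illustration under stated assumptions: the log line layout, the field names, and the two symptom entries are hypothetical, and the sketch does not reproduce the actual GLA, LTA, or symptom database file format.

```python
import re
from typing import Optional

# Hypothetical log layout: "<timestamp> <component> <severity> <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<component>\S+)\s+(?P<severity>\S+)\s+(?P<message>.*)$"
)

# Hypothetical symptom entries: match pattern, likely cause, recommended solution.
SYMPTOMS = [
    {
        "pattern": re.compile(r"connection refused", re.IGNORECASE),
        "cause": "target service is not listening",
        "solution": "verify the service is started and the port is open",
    },
    {
        "pattern": re.compile(r"disk (is )?full", re.IGNORECASE),
        "cause": "storage volume has no free space",
        "solution": "free space or extend the volume",
    },
]


def to_event(line: str) -> Optional[dict]:
    """Adaptor step: convert one text log line to a structured event."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def find_symptom(event: dict) -> Optional[dict]:
    """Analysis step: return the first symptom whose pattern matches the message."""
    for symptom in SYMPTOMS:
        if symptom["pattern"].search(event["message"]):
            return symptom
    return None


# The sample log line is invented for this sketch.
event = to_event("2004-01-15T10:32:00 db01 ERROR Connection refused by peer")
symptom = find_symptom(event)
```

A record that matches no pattern yields no symptom, which corresponds to an event or error condition not yet known to the symptom database.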
FIG. 2 is a prior art flow diagram (200) illustrating a flow of events for determining problem solutions in an autonomic computing system. Stores of cases of failures that have no relation to defects are shown at (202). In one embodiment, the store of cases (202) includes recommendations or solutions based upon prior experience to provide stable operation of the system. The store of cases (202) is placed in a database (204). The database (204) is in communication with a checklist (206) which functions as a guideline. In response to occurrence of a system failure (208), an information technology support team will review the failure in view of the checklist (206) and the database (204) to provide an effective consultation for system stabilization (210). Over the course of time and based upon future system failures, the database (204) and associated checklist (206) will be updated and grow. The growth of these items will facilitate solution consultation (210) to resolve failures in the system.
Prior art autonomic computer systems are limited to resolving errors associated with software components in the system. The prior art does not address application of an autonomic computer system to the hardware configuration of the components therein. However, failures in a system that affect operability thereof are not limited to software components. Accordingly, there is a need for a solution that analyzes the risk of a failure in the system with respect to configuration of a hardware component and resolves the potential failure before it occurs.
In addition, recent developments in the art have produced tools that investigate software configuration parameters as a standard form of practice, unrelated to any failure. However, the prior art tools focus on tuning parameters from a product perspective and do not provide advice pertaining to the risk to the entire system. In other words, the most recent developments operate on a product-by-product basis for software products operating in a computer system, and do not address how the risk associated with one or more products will affect the entire system. Accordingly, there is a need to analyze system stability in its entirety.