A. Field of Invention
The present invention is in the field of software-implemented systems and articles of manufacture to monitor and determine or predict the state of networks, computers, software systems, logical networks or other components of an information system.
B. Related Background Art
Prior art systems and network management systems monitor the state of system components by monitoring various metrics and comparing them with predefined threshold values. Data samples are typically gathered by monitoring agents, by probes inserted in software systems and by other management tools. The samples may be gathered on a regular basis, by intermittent polling over the network, or on an event basis, triggered by signals sent by an agent, probe or manager. In practice, an agent may monitor a thousand metrics for a single computer system. A state manager may monitor several logical systems operating within several networked computer systems and network components. A state manager typically determines the state of a monitored system by comparing the metrics with predefined threshold values. The determination may be based on logical combinations of several such comparisons. The state may be determined in response to events sent by an agent. The determination may be based on correlating events and conditions across several system components or over time. The aggregate of the rules that define how the information is collected and the state determination is made is generally referred to as a "policy". Such state managers are referred to as "policy-based state managers."
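By way of illustration only, the threshold-based determination described above may be sketched as follows. The sketch is not taken from any particular prior art product. The rule structure, the metric names (`cpu_pct`, `latency_ms`), the threshold values and the state labels are all hypothetical and are chosen only to show how a policy combines individual threshold comparisons into a state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]
Condition = Callable[[Metrics], bool]

# Hypothetical state labels, ordered from least to most severe.
SEVERITY = {"normal": 0, "warning": 1, "critical": 2}


@dataclass
class Rule:
    """One policy rule: a condition over metrics and the state it signals."""
    name: str
    condition: Condition
    state: str


def over(metric: str, threshold: float) -> Condition:
    """Condition that holds when a metric exceeds a predefined threshold.

    A metric missing from the sample is treated as 0.0 (an assumption
    made for this sketch only).
    """
    return lambda sample: sample.get(metric, 0.0) > threshold


def all_of(*conditions: Condition) -> Condition:
    """Logical AND of several threshold comparisons."""
    return lambda sample: all(c(sample) for c in conditions)


def determine_state(policy: List[Rule], sample: Metrics) -> str:
    """Return the most severe state signalled by any matching rule."""
    state = "normal"
    for rule in policy:
        if rule.condition(sample) and SEVERITY[rule.state] > SEVERITY[state]:
            state = rule.state
    return state


# Hypothetical policy for a web server; the numbers are illustrative only.
WEB_POLICY = [
    Rule("high_cpu", over("cpu_pct", 85.0), "warning"),
    Rule(
        "cpu_and_latency",
        all_of(over("cpu_pct", 85.0), over("latency_ms", 500.0)),
        "critical",
    ),
]

if __name__ == "__main__":
    print(determine_state(WEB_POLICY, {"cpu_pct": 90.0, "latency_ms": 800.0}))
```

Even in this two-rule sketch, each threshold value must be chosen by hand for the type of system being monitored, and a practical policy would repeat that choice for every monitored metric.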
Such systems have the theoretical potential to work, but they have distinct practical disadvantages and limitations. Deciding on a set of meaningful threshold values for hundreds or thousands of metrics is complex. Deciding on specific threshold values for different types of systems is even more difficult. Computer systems come in many different hardware and software configurations and have many different usage profiles. For example, a database server may have a very different load profile from a web server, and the threshold values suitable for raising alarms on each may be very different. Selecting logical conditions for correlating several metrics, perhaps correlating over time, further increases the decisional complexity. Defining a policy that gives early warning of impending problems, rather than giving redundant information that a system is already down, is also very difficult.
Thus, policy-based state managers in the prior art may work well if properly configured. However, when used in complex, networked systems, they are very difficult to configure so as to give meaningful indications.