Resource monitoring systems are systems that monitor other systems for situations that may require corrective action. A resource monitoring system typically includes a collection of rules that determine the situations in which corrective action should be initiated and the type of corrective action to apply in each situation. When a resource monitoring system detects such a situation, it may, for example, alert an operator and/or initiate corrective procedures itself. Resource monitoring systems are used to monitor a wide variety of software and hardware systems, such as computers, application programs, servers, and industrial systems and equipment, and may greatly expand the number of different systems and applications that an operator can effectively manage.
Typically, resource monitoring systems operate at least in part by (1) extracting raw statistics (data) at specified time intervals from the application or system that is being monitored, (2) processing those statistics, and (3) alerting operators and/or taking corrective action when the processing logic determines a condition requiring the operator's attention and/or automatic correction (an “alert condition”) has occurred. Resource monitoring systems may generally be classified into one of two types, namely, instantaneous systems and persistent systems.
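The three-step cycle described above can be illustrated with a minimal sketch. The function name `run_monitor`, the threshold of 90.0, and the sample values are hypothetical and chosen only for illustration; the sketch also replaces real timed polling with iteration over a prepared list of samples.

```python
from typing import Callable, Iterable, List, Tuple


def run_monitor(
    samples: Iterable[float],
    is_alert: Callable[[float], bool],
    on_alert: Callable[[int, float], None],
) -> None:
    """Run one pass of the extract / process / alert cycle.

    `samples` stands in for raw statistics read at fixed time intervals.
    """
    for tick, raw in enumerate(samples):  # (1) extract a raw statistic
        if is_alert(raw):                 # (2) process it against a rule
            on_alert(tick, raw)           # (3) alert or take corrective action


# Hypothetical example: CPU utilization in percent, alerting above 90.
alerts: List[Tuple[int, float]] = []
cpu_samples = [42.0, 95.5, 88.0, 97.1]
run_monitor(
    cpu_samples,
    is_alert=lambda value: value > 90.0,
    on_alert=lambda tick, value: alerts.append((tick, value)),
)
print(alerts)  # [(1, 95.5), (3, 97.1)]
```

Because each sample is judged on its own, this sketch behaves as an instantaneous system: every interval that exceeds the threshold raises an alert.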
Instantaneous resource monitoring systems use current statistics (and, in some instances, the last prior measured statistic and the elapsed time between the current and last prior measurements) to determine whether an alert condition has occurred. In contrast, persistent systems keep track of the past k values of the measured statistics to impose a “situational persistence” requirement; that is, a situation must persist for at least a certain amount of time before an alert is raised and/or before corrective action is taken. For example, in a persistent resource monitoring system an alert condition may be deemed to have occurred only if an instantaneous alert condition persists for at least three consecutive time intervals. More sophisticated persistent systems may look at both how many times a condition occurs and how many times it does not occur (such non-occurrences are referred to as “holes”) during the past k measurements, so that an alert is raised only when the condition occurs at least a certain number of times with no more than a specified number of non-occurrences during that period. Thus, for example, a persistent resource monitoring system may specify that an alert condition is deemed to have occurred only if the instantaneous alert condition persists for at least ten time intervals with no more than two non-occurrences or “holes” during that period. Alternatively, a persistent resource monitoring system may specify that an alert condition is deemed to have occurred only if the instantaneous alert condition appears ten times, with no more than two non-occurrences or “holes” between each pair of successive occurrences.
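One possible reading of the persistence requirement with holes can be sketched as a sliding window over the past k instantaneous results. The class name `PersistenceFilter` and its parameters are hypothetical. The sketch assumes that an alert is raised only once k results have been collected and that at most `max_holes` of them are non-occurrences; setting `max_holes` to zero gives the simpler "k consecutive intervals" rule.

```python
from collections import deque
from typing import Deque


class PersistenceFilter:
    """Raise an alert only when a condition persists over the past k checks."""

    def __init__(self, window: int, max_holes: int = 0) -> None:
        if window < 1 or max_holes < 0:
            raise ValueError("window must be >= 1 and max_holes must be >= 0")
        self.window = window
        self.max_holes = max_holes
        # Keeps only the most recent `window` instantaneous results.
        self.history: Deque[bool] = deque(maxlen=window)

    def update(self, condition: bool) -> bool:
        """Record one instantaneous result; return True if an alert is due."""
        self.history.append(condition)
        if len(self.history) < self.window:
            return False  # not enough history yet
        holes = self.history.count(False)
        return holes <= self.max_holes


readings = [True, True, False, True, True, True]

strict = PersistenceFilter(window=3)  # three consecutive occurrences
strict_out = [strict.update(r) for r in readings]
print(strict_out)    # [False, False, False, False, False, True]

tolerant = PersistenceFilter(window=3, max_holes=1)  # one hole allowed
tolerant_out = [tolerant.update(r) for r in readings]
print(tolerant_out)  # [False, False, True, True, True, True]
```

The alternative rule, in which holes are counted between successive occurrences rather than within a fixed window, would need a different counter and is not covered by this sketch.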
Unfortunately, resource monitoring systems are typically very labor-intensive to construct and test. The difficulties associated with generating such models serve to severely limit their application, both in resource monitoring systems that are developed and provided by outside vendors and in ad hoc, end-user-constructed resource monitoring systems that are designed to detect specific alertable conditions.