Network management programs need the ability to specify a "threshold" number of occurrences for different network and system failures, so that their automation processing can make better decisions about which automated action to perform in each situation.
The threshold is a specified number of occurrences of an event within a specified time period, which triggers the performance of particular actions. For example, if the same network device has failed more than five times within a one hour time period, a different recovery procedure could be attempted than would be used if the device had failed only once in that period. The threshold in this case is the occurrence of more than five failures within a one hour period.
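The threshold definition above can be illustrated with a minimal sketch. It only restates the "more than five failures within a one hour period" example and is not the mechanism of the present invention; the function names, the procedure labels, and the use of timestamps in seconds are assumptions made for illustration.

```python
def threshold_exceeded(failure_times, now, limit=5, window_seconds=3600):
    """Return True if more than `limit` failures occurred in the last window.

    `failure_times` holds failure timestamps in seconds (hypothetical input).
    """
    window_start = now - window_seconds
    recent = [t for t in failure_times if window_start < t <= now]
    return len(recent) > limit


def choose_recovery(failure_times, now):
    """Pick a recovery procedure; the two labels are hypothetical placeholders."""
    if threshold_exceeded(failure_times, now):
        return "alternate_recovery"
    return "standard_recovery"
```

With six failures inside the hour the alternate procedure is selected; with a single failure the standard procedure is kept.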
Enabling an automation process in the network management program to determine whether a threshold has been reached for the frequency of an identical or similar system or network problem within a specified time period allows the program to make complex decisions about which automation action to perform. There are numerous scenarios in which system and network problems result in messages or alerts that must be considered as a group rather than individually. For example, if a network device fails and every device connected to it sends an alert to the network management program, one indication of the failing device is all that is required for recovery; all other alerts can be ignored, thereby saving valuable system processing time. If the burst of alerts resulting from a major outage is not filtered based on a threshold, the burst can slow network management processing to the point that resolving and recovering from the outage takes a long time.
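The alert-burst scenario above can be sketched as follows. This is a hedged illustration only: the alert tuple layout, the `filter_burst` name, and the 60 second suppression window are assumptions, and the input is assumed to be sorted by time.

```python
def filter_burst(alerts, window_seconds=60):
    """Forward one alert per failed resource per window and drop the rest.

    Each alert is a (timestamp, failed_resource, reporter) tuple; this layout
    is a hypothetical one chosen for the example.
    """
    last_forwarded = {}  # failed_resource -> timestamp of last forwarded alert
    forwarded = []
    for timestamp, failed_resource, reporter in alerts:
        previous = last_forwarded.get(failed_resource)
        if previous is None or timestamp - previous >= window_seconds:
            forwarded.append((timestamp, failed_resource, reporter))
            last_forwarded[failed_resource] = timestamp
        # Otherwise the alert repeats a failure already reported; ignore it.
    return forwarded
```

Three near-simultaneous alerts naming the same failed device yield a single forwarded alert, so recovery is attempted once rather than three times.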
Providing a function in the network management program to determine a threshold level for system and network events (specifically, messages and alerts) allows the automation capability of the program to make an additional determination when deciding how a system or network problem can be solved automatically. Automation thresholding allows system and network automation to make more human-like decisions.
Event counting and thresholding are well known in the computing and networking arts. In U.S. Pat. No. 4,080,589, the occurrence of an error triggers a timer and begins a counting interval. Subsequent errors occurring during the interval are counted until a predetermined threshold is reached or the timing interval expires. An alarm is signalled and the timer is reset if the threshold is reached before expiration of the time interval. In U.S. Pat. No. 4,291,403, if the error count exceeds the predetermined threshold during an established time period, an alarm is generated and a second threshold is established to measure subsequent error rates. In U.S. Pat. No. 4,339,657, a variable time interval is established that is measured by the occurrence of a predetermined number of operations. The arrangement counts errors occurring during the operations and also counts the number of times that the error count crosses a predetermined threshold.
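The timer-and-counter approach described for U.S. Pat. No. 4,080,589 can be sketched roughly as below. This is an interpretation of the summary given here, not the implementation disclosed in that patent; the class name is hypothetical, and counting the triggering error toward the threshold is an assumption.

```python
class IntervalErrorCounter:
    """Counts errors within an interval started by the first error (sketch)."""

    def __init__(self, threshold, interval_seconds):
        self.threshold = threshold
        self.interval = interval_seconds
        self.start = None  # start time of the active interval, if any
        self.count = 0

    def record_error(self, now):
        """Record an error at time `now`; return True if an alarm is raised."""
        if self.start is None or now - self.start >= self.interval:
            # No active interval, or the previous one expired: begin a new one.
            self.start = now
            self.count = 1  # assumption: the triggering error is counted
        else:
            self.count += 1
        if self.count >= self.threshold:
            # Threshold reached before expiration: signal and reset the timer.
            self.start = None
            self.count = 0
            return True
        return False
```

Under this scheme each monitored condition needs its own timer state and count, which is the bookkeeping cost that grows with the number of threshold conditions.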
U.S. Pat. No. 5,223,827, having the same assignee as the present invention, improves on the other prior art by providing a mechanism for managing network event counters that enables the accumulation of information that can be manipulated to provide a variety of performance measurements. It uses an event counter, a sliding event threshold counter and a sliding interval counter to detect an event threshold that requires some type of action in response. Each time the event counter exceeds the threshold established by the sliding event threshold counter, the sliding event threshold counter is incremented by the contents of an offset event value, and the sliding interval counter is updated to the sum of the offset time value and the present time. The disclosure of this patent is incorporated by reference herein.
While the known art is useful in many instances, it does not provide a system or method for determining whether a specified threshold condition has been reached without the use of timers or counters, and without the need to recalculate time intervals for event occurrence. If a large number of threshold conditions have been specified, prior art techniques make it necessary to provide a multiplicity of timers or counters, and to recalculate time intervals for each of the threshold conditions.
The present invention improves on prior art techniques by eliminating the need to maintain multiple timers or counters and to recalculate time intervals in order to determine whether prescribed threshold conditions, associated with a plurality of events for a plurality of devices (resources) in a communications network, have been detected.