A generally accepted goal for network service providers is "five nines" availability, that is, 99.999% network up time. To achieve this goal, service providers must minimize network outages. This requires proactive monitoring of network resources so that issues may be identified and addressed before they cause network outages. Often, proactive monitoring is provided by having network management system (NMS) software periodically poll each network device in the network and retrieve data corresponding to particular, predetermined resource attributes. By monitoring and evaluating the resource attributes, certain errors may be avoided and network up time may be increased.
To prevent NMS polling from consuming too much network bandwidth, NMS polling is typically limited to, for example, once every 15–20 minutes (i.e., the polling interval). The longer the polling interval, the less burden NMS polling places on the network and on each network device. Unfortunately, errors and issues may arise during the polling interval and go undetected by the NMS. Thus, the longer the polling interval, the higher the likelihood that undetected errors may lead to network outages.
To address the polling interval gap, many network devices include some type of internal network device monitoring, which continuously samples certain resource attributes even during the polling interval. If the values of resource attributes exceed or fall below expected levels (i.e., thresholds), a notice (e.g., an SNMP trap) is sent to the NMS. For example, an SNMP trap may be sent each time the number of packets received by a network device during a sample period exceeds a predetermined threshold value (i.e., a measure of utilization). With this information, a network manager may choose to increase the capacity of the network or move a portion of the services handled by that network device to another network device. This is a simple threshold monitoring mechanism. Pushing data from the network devices to the NMS in accordance with thresholds provides increased scalability over NMS polling techniques. Unfortunately, scalability is still an issue, since a notice is sent to the NMS each time a resource attribute value exceeds or falls below a threshold, and simple threshold monitoring may therefore result in an excessive number of notices.
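By way of illustration only, the simple threshold mechanism described above may be sketched as follows. The function name `simple_threshold_notices`, the threshold of 100 packets and the sample values are hypothetical and are not taken from any particular device. Appending to a list stands in for sending an SNMP trap.

```python
from typing import Iterable, List, Tuple


def simple_threshold_notices(samples: Iterable[int], high: int) -> List[Tuple[int, int]]:
    """Return (sample_index, value) for every sample that exceeds `high`.

    Hypothetical sketch: each returned tuple stands in for one notice
    (e.g., one SNMP trap) pushed from the device to the NMS.
    """
    notices = []
    for index, value in enumerate(samples):
        if value > high:
            # One notice per offending sample, with no suppression.
            notices.append((index, value))
    return notices


# Assumed packet counts for seven consecutive sample periods.
packets_per_period = [80, 120, 130, 125, 90, 110, 140]
notices = simple_threshold_notices(packets_per_period, high=100)
print(len(notices))  # 5 notices for 7 samples
```

In this sketch, five of the seven samples produce a notice, which suggests how a resource attribute that hovers near its threshold may flood the NMS.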
Other network devices include a more advanced monitoring system that implements the Remote Monitoring (RMON) specification. RMON is a set of Simple Network Management Protocol (SNMP)-based Management Information Bases (MIBs) that define groups of Ethernet and Token Ring diagnostics. Basically, through RMON, the resource attribute data gathered within the network device is evaluated against a fixed expression including both a rising and a falling threshold. That is, instead of sending a notice each time an attribute value is above or below a threshold, notices are only sent in accordance with the expression after both thresholds have been crossed. RMON is designed to suppress false alarms and limit the amount of reporting and, thus, to increase scalability over the simple threshold mechanism.
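The effect of pairing a rising threshold with a falling threshold may be sketched as follows. This is a simplified illustration of the hysteresis idea only, not an implementation of the RMON MIB. The function name `hysteresis_events`, the threshold values and the sample data are hypothetical, and the sketch assumes that both events are armed when monitoring starts.

```python
from typing import Iterable, List, Tuple


def hysteresis_events(samples: Iterable[int], rising: int, falling: int) -> List[Tuple[int, str]]:
    """Return (sample_index, kind) events using a rising/falling threshold pair.

    Hypothetical sketch: after a "rising" event, no further "rising" event is
    reported until the value has dropped to or below `falling`, and vice versa.
    """
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")

    events = []
    rising_armed = True   # assumption: both events armed at start-up
    falling_armed = True
    for index, value in enumerate(samples):
        if value >= rising and rising_armed:
            events.append((index, "rising"))
            rising_armed = False   # suppress repeats until the value falls back
            falling_armed = True
        elif value <= falling and falling_armed:
            events.append((index, "falling"))
            falling_armed = False  # suppress repeats until the value rises again
            rising_armed = True
    return events


# Assumed packet counts for nine consecutive sample periods.
packets_per_period = [80, 120, 130, 125, 90, 110, 140, 40, 105]
events = hysteresis_events(packets_per_period, rising=100, falling=50)
print(events)  # [(1, 'rising'), (7, 'falling'), (8, 'rising')]
```

Six of the nine samples are at or above the rising threshold, yet only two rising notices are reported, because the second is held back until the value has first dropped to the falling threshold.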
The sampling frequency, the threshold values and the resource attributes checked against those threshold values are often predetermined and fixed in software. Thus, changing a sampling frequency or a threshold value, or checking a different resource attribute against a threshold, requires a change to and re-release of the software. To provide some flexibility, many current systems allow users to input sampling frequencies and threshold values to override the predetermined values provided in software. A user may, therefore, customize sampling frequencies and threshold values without having to change or re-release the software. Still other systems monitor the predetermined resource attributes for a certain initial period of time and then automatically set the threshold values based on the data gathered during that initial period.
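The two forms of flexibility described above, user overrides and automatically set thresholds, may be sketched as follows. The text does not state how thresholds are derived from the initial data; the rule used here (mean plus or minus `k` standard deviations) is purely an assumption chosen for illustration. The names `learn_thresholds` and `effective_value` and all numeric values are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple


def learn_thresholds(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive (low, high) thresholds from samples gathered in an initial period.

    Assumption for illustration: thresholds are set at mean +/- k population
    standard deviations. Real systems may use a different rule.
    """
    mean = statistics.fmean(baseline)
    spread = statistics.pstdev(baseline)
    return mean - k * spread, mean + k * spread


def effective_value(default: float, user_override: Optional[float] = None) -> float:
    """Return the user-supplied value if one was given, else the built-in default."""
    return default if user_override is None else user_override


# Assumed packet counts observed during the initial learning period.
baseline = [100, 110, 90, 105, 95]
low, high = learn_thresholds(baseline, k=2.0)

# Assumed built-in sampling period of 60 seconds, overridden by the user to 30.
sampling_period_s = effective_value(60.0, user_override=30.0)
```

In both cases only the numeric values change. The set of resource attributes being checked is still whatever the software was built to monitor.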
Within current systems, however, the resource attributes that may be checked against threshold values remain fixed in software, and adding a different resource attribute requires a software modification and a re-release. In addition, the thresholds remain simple high and/or low values or a fixed expression as in RMON. Thus, network managers have no flexibility in determining when reports representing resource attribute values are made.