Efficient network management requires reliable information about the managed system. The only way to maintain such information at the management station is continuous monitoring of the system parameters that affect management decisions. The increasing complexity of managed systems, and of the services they provide, creates a need to monitor more and more parameters. If the managed system is a network, the same links are often used to transfer both the payload and the monitoring data. In this case, the volume of monitoring data being transferred directly impacts the performance of the managed system. Therefore, minimizing the amount of monitoring-related traffic in such networks is an important goal.
One can distinguish between two types of monitoring: statistical monitoring and reactive monitoring. In statistical monitoring, the management station derives statistical properties from the “raw” data, often in order to predict future trends. This means that all of the “raw” data has to be transferred to the management station, so there is little potential for reducing the monitoring traffic.
With reactive monitoring, the management station needs information about the network state in order to react (in real or semi-real time) to certain alarm conditions that may develop in the network. Such conditions usually indicate either a fault or some anomalous behavior which may cause a fault later on. In this case, there is a good chance of finding a mechanism which minimizes the amount of data transferred to the management station.
Two basic techniques are used for reactive network monitoring: polling and event reporting (see William Stallings, SNMP, SNMPv2, SNMPv3, RMON1 and 2, Addison Wesley, 1998). Polling is a process in which the management station sends requests to network elements in order to obtain state information. Typically, polling is done periodically, at a fixed frequency determined by the time window within which the alarm condition has to be detected. Event reporting is a process in which a local event in a network element triggers a report that is sent by that element to the management station. In many practical network management applications, asynchronous traps can be defined on network elements so that event reporting can be used instead of explicit polling. This can be more efficient, since an event is generated only when the value of a state variable of a network element reaches a certain threshold. However, in many cases there is a need to monitor a global system parameter that is defined as a function of local properties of different network elements. In order to monitor such global parameters using event reporting, local traps have to be emitted continuously at a fixed frequency, which makes event reporting as expensive as periodic polling.
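The trade-off between the two techniques can be illustrated with a small sketch. It is not taken from the cited reference: the function names, the message-counting model (one request plus one response per polled element, one message per trap), and the choice of a sum as the global parameter are all assumptions made for illustration.

```python
from typing import List

# samples[t][i] is the value of the state variable of element i at tick t.
Samples = List[List[int]]


def polling_messages(samples: Samples) -> int:
    """Periodic polling: assume one request and one response per element per tick."""
    return sum(2 * len(tick) for tick in samples)


def trap_messages(samples: Samples, local_threshold: int) -> int:
    """Event reporting: assume one trap for each sample at or above the local threshold."""
    return sum(1 for tick in samples for value in tick if value >= local_threshold)


def global_alarm_ticks(samples: Samples, global_threshold: int) -> List[int]:
    """Ticks at which the (assumed) global parameter, the sum of local values, raises an alarm."""
    return [t for t, tick in enumerate(samples) if sum(tick) >= global_threshold]


if __name__ == "__main__":
    samples = [[3, 4, 2], [5, 5, 4], [6, 6, 5]]
    print(polling_messages(samples))        # messages used by periodic polling
    print(trap_messages(samples, 8))        # traps fired by purely local thresholds
    print(global_alarm_ticks(samples, 15))  # ticks where the global sum is in alarm
```

In this hypothetical trace no element ever reaches its local threshold of 8, so no trap fires, yet the sum of the local values reaches the global threshold of 15 at the last tick. Detecting that condition with event reporting alone would require every element to report at every tick, which is the cost problem described above.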
Recently, a new theoretical framework for minimizing polling in the case of reactive monitoring was described in an article by Jia Jiao, Shamim Naqvi, Danny Raz, and Binay Sugla, entitled “Toward efficient monitoring”, IEEE Journal on Selected Areas in Communications, 18 (5):723-732, May 2000. The approach described by Jiao et al. is based on the fact that the evolution of state variables is usually restricted by some constraints. Taking those constraints into account allows the management station to predict the future state based on the past information and perform polling aperiodically, only when there is a possibility of an alarm condition. The framework in Jiao et al. deals only with polling. Accordingly, that technique is not able to realize the efficiency needed to successfully manage a real network with a large number of elements.
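The idea of constraint-driven aperiodic polling can be sketched as follows. This is not the algorithm of Jiao et al.; it is a minimal illustration under one assumed constraint, namely that the monitored variable can grow by at most `max_rate` per tick. The names `aperiodic_poll_times`, `read_value`, `max_rate`, and `horizon` are hypothetical.

```python
import math
from typing import Callable, List


def aperiodic_poll_times(
    read_value: Callable[[int], float],
    threshold: float,
    max_rate: float,
    horizon: int,
) -> List[int]:
    """Return the ticks at which the station polls a single element.

    Assumed constraint: the value increases by at most max_rate per tick,
    so after polling value v the alarm (value >= threshold) cannot occur
    earlier than ceil((threshold - v) / max_rate) ticks later.
    """
    polls: List[int] = []
    t = 0
    while t < horizon:
        value = read_value(t)
        polls.append(t)
        if value >= threshold:
            # Alarm condition holds: fall back to polling every tick.
            t += 1
            continue
        # Skip all ticks at which an alarm is impossible under the constraint.
        safe_ticks = math.ceil((threshold - value) / max_rate)
        t += max(1, safe_ticks)
    return polls


if __name__ == "__main__":
    trace = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values, one per tick
    print(aperiodic_poll_times(lambda t: trace[t], threshold=10, max_rate=2, horizon=10))
```

With this hypothetical trace the station polls at ticks 0, 5, 8, and 9 rather than at all ten ticks, and the polls become more frequent only as the value approaches the threshold.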