Nowadays, as information systems become ubiquitous and companies and organizations in all sectors become increasingly dependent on their computing resources, the requirement for availability of the components of an information technology (IT) network is increasing, while the complexity of IT networks is growing. An IT network often comprises a diversity of devices, such as bridges, switches, routers, workstations, and servers, and the connections between them. IT networks are typically not static: existing devices or connections are often removed, and new devices and new connections between devices are added, so the network topology changes dynamically. Consequently, monitoring and managing IT networks is becoming increasingly important, not only for large organizations but also for medium-sized and small ones.
Topology changes of an IT network can be handled by available network-discovery software tools, which may run in an IT network as a background job, collect information about the devices within the network, and provide, as a result, a data representation of the network topology. If such discovery software runs on a scheduled basis, the data representation is updated automatically. For example, Hewlett-Packard offers such network discovery software under the name “hp asset”, which enables the discovery of network elements at the routing layer (e.g. routers) as well as at the switching layer (e.g. switches). A monitoring system with a discovery functionality is also disclosed in European patent application EP 1 118 952 A2.
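A discovery run can be thought of as producing a snapshot of devices and connections; comparing two scheduled runs then reveals how the topology has changed. The following is only an illustrative sketch: `Topology`, `make_topology` and `diff` are hypothetical names, not the data model of “hp asset” or of EP 1 118 952 A2, and connections are assumed to be undirected pairs of device names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical result of one discovery run (not a real product's model)."""
    devices: frozenset  # device names
    links: frozenset    # undirected connections, each a frozenset of two names


def make_topology(links, devices=()):
    """Build a Topology from (device_a, device_b) pairs plus isolated devices."""
    link_set = frozenset(frozenset(pair) for pair in links)
    names = {name for link in link_set for name in link} | set(devices)
    return Topology(frozenset(names), link_set)


def _as_pairs(links):
    # Sort each connection and the resulting list so the report is stable.
    return sorted(tuple(sorted(link)) for link in links)


def diff(old, new):
    """Report devices and connections added or removed between two runs."""
    return {
        "added_devices": sorted(new.devices - old.devices),
        "removed_devices": sorted(old.devices - new.devices),
        "added_links": _as_pairs(new.links - old.links),
        "removed_links": _as_pairs(old.links - new.links),
    }


# Two hypothetical scheduled runs: server1 was replaced by server2.
run1 = make_topology([("router1", "switch1"), ("switch1", "server1")])
run2 = make_topology([("router1", "switch1"), ("switch1", "server2")])
changes = diff(run1, run2)
print(changes)
```

Keeping each snapshot immutable makes it straightforward to retain earlier runs and compare any two of them.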
Known network monitoring and management systems continuously monitor the devices of a network and provide alert messages to an operator if faults of network devices are detected. Typically, an alert message is generated if a network device that normally sends “alive” messages (either actively or upon request of a monitoring system) stops sending such alive messages, or if a device sends a message expressly indicating the occurrence of a fault.
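The alive-message rule can be sketched as a simple timeout check. This is a minimal sketch under assumptions that the text does not state: the `AliveMonitor` class and the 30-second timeout are hypothetical, and timestamps are passed in explicitly rather than read from a clock so that the behaviour is deterministic.

```python
class AliveMonitor:
    """Hypothetical monitor that raises alerts when alive messages stop."""

    def __init__(self, timeout):
        self.timeout = timeout  # assumed silence threshold, in seconds
        self.last_seen = {}     # device name -> time of last alive message

    def alive(self, device, now):
        """Record an alive message (pushed by the device or returned on poll)."""
        self.last_seen[device] = now

    def check(self, now):
        """Return alert messages for devices silent longer than the timeout."""
        return [
            f"ALERT: {device} stopped sending alive messages"
            for device, seen in sorted(self.last_seen.items())
            if now - seen > self.timeout
        ]

    def fault(self, device, description):
        """Alert for a device that explicitly reports a fault."""
        return f"ALERT: {device} reported fault: {description}"


mon = AliveMonitor(timeout=30)
mon.alive("router1", now=0)
mon.alive("server1", now=0)
mon.alive("server1", now=25)
alerts = mon.check(now=40)  # router1 silent for 40 s, server1 for only 15 s
print(alerts)
```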
Network monitoring systems often issue not just one alert message in the case of a fault of a network device, but a number of related messages. For example, if a network router goes down, none of the devices beyond the router can be reached. As a consequence, the monitoring system will issue not only messages indicating that the router is down, but also a large number of related messages indicating that the devices beyond the router are unavailable. Due to the number of network devices and the complex interactions between them, it is difficult for the operator to resolve the dependencies among the generated messages and to find the origin of the problem. If the operator is flooded with messages, it may even be difficult to detect important alert messages and distinguish them from less important ones, so there is a risk that the operator will overlook relevant messages.
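The reason a single router fault produces many messages can be seen by computing which devices the monitoring system can still reach. The sketch assumes a hypothetical topology stored as an undirected adjacency map in which every device is a key, and a monitoring station named `monitor`. It merely reproduces the flood of related messages and is not an event correlation technique from the applications cited.

```python
from collections import deque


def unreachable_after_failure(adjacency, monitor, failed):
    """Return devices the monitor cannot reach once `failed` is down.

    `adjacency` is a hypothetical undirected topology: every device is a key
    and maps to the list of devices it is directly connected to.
    """
    seen = {monitor}
    queue = deque([monitor])
    # Breadth-first search from the monitoring station, treating the failed
    # device as removed from the network.
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(set(adjacency) - seen - {failed})


# Hypothetical network: the monitor sits in front of router1; router2 connects
# a second segment (switch2, pc1, pc2).
network = {
    "monitor": ["router1"],
    "router1": ["monitor", "switch1", "router2"],
    "switch1": ["router1", "server1"],
    "server1": ["switch1"],
    "router2": ["router1", "switch2"],
    "switch2": ["router2", "pc1", "pc2"],
    "pc1": ["switch2"],
    "pc2": ["switch2"],
}

failed = "router2"
unreachable = unreachable_after_failure(network, "monitor", failed)
# One message for the failed router plus one for every device beyond it.
messages = [f"{failed} is down"] + [f"{d} is not available" for d in unreachable]
for message in messages:
    print(message)
```

Here one failed router yields four messages, and only the first of them points to the origin of the problem.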
In order to reduce the number of related messages, the use of event correlation filter techniques has been proposed, for example in unexamined U.S. patent applications 2002/0138638 A1 and 2002/0169870 A1.
Typically (but not necessarily), non-public IT networks, also called intranets, are constructed using the technology of the Internet, which is a global public network implementing the TCP/IP protocol suite. (Regarding the meaning of the term “TCP/IP protocol suite”, see W. Richard Stevens: TCP/IP Illustrated, Volume 1, The Protocols, 1994, pages 1-2.) The TCP/IP protocol suite includes the Ping program (see Stevens, pages 85-96), the Traceroute program (see Stevens, pages 97-110), and SNMP, the Simple Network Management Protocol (see Stevens, pages 359-388).