1. Field of the Invention
The present invention relates generally to an apparatus and method for the management of a network, and more particularly to a network management apparatus and method which monitors a network, and generates Events when certain types of conditions are detected.
2. Description of the Related Art
The following description is concerned with a data communications network, and in particular a local area network (LAN). It will be appreciated, however, that the invention has more widespread applicability to other managed communications systems including wide area networks (WANs) or wireless communications systems. Networks typically comprise a plurality of computers, peripherals and other electronic devices capable of communicating with each other by sending and receiving data packets in accordance with a predefined network protocol. Each computer or other device on the network is connected by a port to the network media, which in the case of a LAN may be coaxial cable, twisted pair cable or fibre optic cable. A network is generally configured with core devices having a plurality of ports, which can be used to interconnect a plurality of media links on the network. Such devices include hubs, routers and switches, each of which passes data packets received at one port to one or more of its other ports, depending upon the type of device. Such core devices can be managed or unmanaged.
A managed device is capable of monitoring data packets passing through its ports and obtaining data relevant for network management. Managed devices additionally have the capability of communicating this data using a management protocol such as the SNMP (Simple Network Management Protocol), as described in more detail below. The skilled person will appreciate that the invention is not limited to use with SNMP, but can be applied to managed networks using other network management protocols.
SNMP defines agents, managers and MIBs (where MIB is Management Information Base), as well as various predefined messages and commands for data communication. An agent is present in each managed network device and stores management data and responds to requests from the manager. A manager is present within the network management station of a network and automatically interrogates the agents of managed devices on the network using various SNMP commands, to obtain information suitable for use by the network administrator, whose function is described below. A MIB is a managed “object” database which stores management data obtained by managed devices, and is accessible to agents for network management applications.
It is becoming increasingly common for an individual, called the “network administrator”, to be responsible for network management, and his or her computer system or workstation is typically designated the network management station. The network management station incorporates the manager, as defined in the SNMP protocol, i.e. the necessary hardware, and network management software applications to retrieve data from MIBs by sending standard SNMP requests to the agents of managed devices on the network.
A part of the network administrator's function is to identify and resolve problems occurring on the network, such as device or link malfunction or failure. In order to provide the network administrator with the necessary information to identify such problems, the network management application monitors the devices on the network. An example of such monitoring is described in co-pending UK Patent Application No 9917993.9 entitled “Management System and Method for Monitoring Stress in a Network” in the name of the present applicant. In the system and method described in UK Patent Application No 9917993.9, the SNMP manager in the network management station requests the agents of managed network devices on the network to retrieve selected MIB data indicative of device and link operation, and performs tests for device activity and service availability. Such MIB data may relate to characteristics such as traffic activity or errors occurring at a particular port in the relevant network device. Tests may include sending ICMP Ping requests to each device on the network, or sending selected requests for services such as SMTP, NFS and DNS to servers, and monitoring the time taken to receive a response. The monitored parameters or characteristics are referred to herein as “stress metrics”.
The network management application compares, for each stress metric, the retrieved data or test results against a corresponding threshold level for the stress metric. The threshold level is the level above which performance is considered to be unacceptable.
Each time a threshold is exceeded, the application generates and logs an “event” in memory. An “event log” stores each event, and includes information such as the date and time of the event, the identity of the device affected and the nature of the event. The event list thus provides a history of events which have occurred on the network, and the network administrator can review the event list to identify problems on the network.
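By way of illustration only, the threshold comparison and event logging described above can be sketched as follows. This is a minimal sketch and not the implementation of any particular network management application: the names `Event` and `check_metrics`, the field names, and the example metric names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Event:
    """One entry in the event log (field names are illustrative assumptions)."""
    timestamp: datetime   # date and time of the event
    device: str           # identity of the device affected
    metric: str           # nature of the event: which stress metric was exceeded
    value: float          # retrieved data or test result
    threshold: float      # threshold level that was exceeded


def check_metrics(device: str,
                  readings: Dict[str, float],
                  thresholds: Dict[str, float],
                  event_log: List[Event],
                  now: Optional[datetime] = None) -> List[Event]:
    """Compare each stress metric against its threshold and log an event
    each time the threshold is exceeded. Returns the events generated."""
    now = now or datetime.now()
    generated = []
    for metric, value in readings.items():
        threshold = thresholds.get(metric)
        # Metrics without a configured threshold are not tested.
        if threshold is not None and value > threshold:
            event = Event(now, device, metric, value, threshold)
            event_log.append(event)
            generated.append(event)
    return generated
```

In this sketch a reading equal to its threshold does not generate an event, consistent with the threshold being the level above which performance is considered unacceptable.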
In addition to events resulting from the monitoring of stress metrics, events may also be generated by the network management application when other types of condition are detected. For example, a network management application may receive an asynchronous Trap, for example an SNMP Trap from a managed network device. An SNMP Trap is automatically sent by an SNMP agent to the SNMP manager when certain conditions are detected by the agent in the managed device. Examples of conditions which cause SNMP Traps to be sent include “link up” and “link down”. When an SNMP Trap is received by the network management station, the management application may log an event.
An example of a known network management software application capable of monitoring the stress of a network is the 3Com® Network Supervisor available from 3Com Corporation of Santa Clara, Calif. U.S.A. This application, and similar applications, uses SNMP commands to retrieve relevant management data from managed network devices, and processes the data as described below.
The event log is the main source of information used by the network administrator in order to identify problems on the network. Accordingly, it will be appreciated that the manner of presentation of events to the network administrator in the event list is important. The network administrator needs to be able to identify problems easily and without having to review a long list of insignificant events.
Some network management applications present each event in the event log with a “severity” indication. The severity indication is dependent on the nature of the event and other factors such as the degree to which a stress metric threshold is exceeded. For example, if the threshold for a stress metric is exceeded by a small amount, the severity indication may be “Warning”; if the amount by which the threshold is exceeded is more significant, the severity indication may be “High”; and if the threshold is exceeded by even larger amounts, the indication may be “Critical”.
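A severity indication of this kind can be sketched, by way of example only, as a mapping from the degree by which a threshold is exceeded to a label. The label names are taken from the description above; their ascending order of urgency (Warning, High, Critical), the function name `severity`, and the cut-off ratios of 1.5 and 2.0 are assumptions chosen purely for illustration.

```python
from typing import Optional


def severity(value: float,
             threshold: float,
             high_ratio: float = 1.5,
             critical_ratio: float = 2.0) -> Optional[str]:
    """Return a severity label based on how far `value` exceeds `threshold`.

    Returns None when the threshold is not exceeded (no event is generated).
    The ratio cut-offs are hypothetical defaults, not values from any
    particular network management application.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if value <= threshold:
        return None
    ratio = value / threshold
    if ratio >= critical_ratio:
        return "Critical"
    if ratio >= high_ratio:
        return "High"
    return "Warning"
```

A real application may also take the nature of the event into account, which this sketch omits.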
Thus, the severity indication enables the network administrator to determine the events which need the most urgent attention. However, this does not address the problem of large numbers of events appearing in the event log.
Another problem encountered by the network administrator in reviewing events in the event log is associated with events generated as a result of an intermittent or “recurring” problem on the network. For example, a problem, such as congestion on a particular link, may occur at certain times of heavy network traffic throughout the day. Each time the link becomes congested, having previously been operating normally, an event is generated. This leads to the event list displaying a large number of identical/equivalent events showing congestion on the link, interspersed between other unconnected events. This can make it difficult for the network administrator to identify and determine that the events indicating congestion on the link are indicative of a single recurring network problem (i.e. a recurring problem on a specific, single network device or link). In addition, the inclusion of a separate event on each occurrence of a recurring problem may obscure other unrelated, yet more significant events.
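The difficulty can be illustrated with a short sketch that counts how often the same device and the same nature of event recur in an event log. It is offered only to make the problem concrete and is not the method of the invention; the dictionary keys `device` and `nature` and the function name `recurring_problems` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def recurring_problems(events: Iterable[Dict[str, str]],
                       min_occurrences: int = 2) -> Dict[Tuple[str, str], int]:
    """Count events per (device, nature) pair and return the pairs that
    occur at least `min_occurrences` times, i.e. candidate recurring problems."""
    counts = Counter((e["device"], e["nature"]) for e in events)
    return {key: n for key, n in counts.items() if n >= min_occurrences}


# Congestion events on one link, interspersed between unconnected events.
log = [
    {"device": "link-7", "nature": "congestion"},
    {"device": "switch-2", "nature": "link down"},
    {"device": "link-7", "nature": "congestion"},
    {"device": "server-1", "nature": "slow DNS response"},
    {"device": "link-7", "nature": "congestion"},
]
print(recurring_problems(log))
```

Without such grouping, the administrator sees three separate congestion entries scattered through the list rather than one recurring problem on a single link.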
The present invention seeks to address these problems.
In the aforementioned co-pending U.S. Patent Application entitled “Processing Network Events to Reduce the Number of Events to be Displayed” filed simultaneously herewith, there is described a method and apparatus in which events generated by a network management system are passed through one or more “event processors” to correlate events prior to presentation in an event list. Each processor is adapted to correlate certain types of events which may be generated as a result of certain conditions or problems on the network. This correlation ensures that the number of events presented in the event list is reduced, by avoiding presenting certain types of event which, generally speaking, are less informative about network conditions and problems to the user.
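The general arrangement of passing events through one or more event processors before presentation can be sketched as below. This is a simplified illustration under stated assumptions and does not reproduce the processors of the co-pending application: the processors `collapse_repeats` and `drop_natures`, the function `build_event_list`, and the dictionary keys are hypothetical names introduced here.

```python
from typing import Callable, Dict, List

EventDict = Dict[str, str]
EventProcessor = Callable[[List[EventDict]], List[EventDict]]


def collapse_repeats(events: List[EventDict]) -> List[EventDict]:
    """Hypothetical processor: keep only the first event for each
    (device, nature) pair, dropping later repeats."""
    seen = set()
    kept = []
    for event in events:
        key = (event["device"], event["nature"])
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def drop_natures(*natures: str) -> EventProcessor:
    """Hypothetical processor factory: build a processor that removes
    events whose nature is one of `natures`."""
    def processor(events: List[EventDict]) -> List[EventDict]:
        return [e for e in events if e["nature"] not in natures]
    return processor


def build_event_list(events: List[EventDict],
                     processors: List[EventProcessor]) -> List[EventDict]:
    """Pass the generated events through each processor in turn and
    return the reduced list for presentation."""
    for processor in processors:
        events = processor(events)
    return list(events)
```

For example, `build_event_list(raw_events, [collapse_repeats, drop_natures("link up")])` applies both hypothetical processors in order, so that the presented list is shorter than the list of events generated.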