The Simple Network Management Protocol (SNMP) and Common Management Information Protocol (CMIP) are network management protocols that provide a generic mechanism by which different manufacturers' equipment can be monitored and controlled from a management system, such as a UNIX server. A network component on a managed network can be monitored and controlled using a management protocol to communicate management information between network components on the network. Network components include networked personal computers, workstations, servers, routers, and bridges. There exist several key areas of network management including fault management, configuration management, security management, performance management, and accounting management. With the ability to instruct a network component to report events and the ability to start processes on a network component, the network can be manipulated to suit changing conditions within a system.
A key mechanism by which various network devices communicate with a management system is via SNMP traps or CMIP events. Hereafter "events" will be used to refer to either SNMP traps or CMIP events. Events allow for unsolicited notifications to be sent from one network device to another. This same mechanism can be used for communication between various cooperating software components within the management system. This form of communication is especially valuable when the information in the events might be of value to multiple consumers and the producer may be unaware of precisely who is interested in the information.
To facilitate the communication of these events between the various interested parties, a central broker may be provided which receives all such events from the producers and forwards them on to various parties that have registered their interest.
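By way of illustration only, the register-and-forward behavior of such a central broker can be sketched as follows. The class name EventBroker, the methods register and publish, and the dictionary layout of an event are hypothetical and are not taken from any particular implementation; the sketch assumes that interest is registered per event type.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical event representation: a plain dictionary with a "type" key.
Event = Dict[str, object]
Consumer = Callable[[Event], None]


class EventBroker:
    """Receives events from producers and forwards them to registered consumers."""

    def __init__(self) -> None:
        # Maps an event type to the consumers that registered interest in it.
        self._consumers: Dict[object, List[Consumer]] = defaultdict(list)

    def register(self, event_type: str, consumer: Consumer) -> None:
        """Record that `consumer` wants every event of `event_type`."""
        self._consumers[event_type].append(consumer)

    def publish(self, event: Event) -> int:
        """Forward `event` to each interested consumer; return how many received it."""
        consumers = self._consumers.get(event["type"], [])
        for consumer in consumers:
            consumer(event)
        return len(consumers)
```

In this sketch the producer calls publish without knowing which consumers, if any, have registered, which mirrors the decoupling of producers and consumers described above.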
These events also have historical value when attempting to recreate and understand a problem scenario, typically in an effort to devise mechanisms to prevent such scenarios in the future. To this end, it is important to be able to log the events for later retrieval and then to be able to reconstruct the timeline of events.
In the prior art, events are received by a central broker referred to hereafter as the Postmaster Daemon (pmd). The pmd writes all events to a log file called trapd.log. The event data is written to trapd.log in a human readable ASCII textual fixed language format and contains a subset of the known information about an event. This trapd.log file is accessible by any event browser or other application that wishes to view or analyze the event data. The trapd.log file may be configured to grow until it reaches a maximum size, at which time the contents of the trapd.log file are moved to a backup file trapd.log.old and a new trapd.log file is generated for any new events. Since the event data written to trapd.log is written in an ASCII textual fixed language format, the event information cannot be easily reformatted into another language. Additionally, since trapd.log contains only a subset of the known information about the event, a significant portion of the data contained in the original event is lost when the event is stored. There is therefore an unmet need in the art to be able to store a complete representation of the original event information such that the entire event information may be retrieved. There is also an unmet need in the art to be able to store the original event information in a non-textual format that can be reformatted for presentation in a different local language.
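The size-limited rollover of trapd.log described above can be sketched, for illustration only, as follows. The function name append_event and the max_bytes parameter are hypothetical; the sketch assumes the size check is made before each write and that a single backup file is kept, and it is not the actual pmd implementation.

```python
import os


def append_event(log_path: str, line: str, max_bytes: int) -> None:
    """Append one ASCII text line to the log, rolling the log over when it is full."""
    # Assumption: once the log has reached max_bytes, its contents are moved
    # to "<log_path>.old", replacing any earlier backup.
    if os.path.exists(log_path) and os.path.getsize(log_path) >= max_bytes:
        os.replace(log_path, log_path + ".old")
    # Opening in append mode creates a new log file after a rollover.
    with open(log_path, "a", encoding="ascii") as log:
        log.write(line + "\n")
```

Because only the formatted text line is stored, any field of the original event that is not part of that line cannot be recovered from the log, which is the loss of information noted above.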
Referring to FIG. 1, a network system 10 that utilizes a trapd.log file 14, according to the prior art, is shown. Postmaster Daemon 12 is responsible for receiving events from a managed environment or management applications and replicating the events simultaneously in ASCII text trapd.log file 14, event browsers 16, and other applications 17. Other applications 18 read the ASCII text trapd.log file 14.
A management activity may be interested in monitoring a subset of all of the events flowing through the Postmaster Daemon 12 which are in some way related to each other. Examples of such relatedness include the following: all of the events related to a particular application, device or network component; all of the events generated from devices of a particular manufacturer; and all of the events that were generated from a particular subset of the managed environment. The Postmaster Daemon 12 can split the events flowing through it into multiple groupings of related events, hereafter referred to as streams of events, with each stream representing events related in some manner. Moreover, it is possible for a single event to be associated with more than one of these streams and for the ordering of events flowing through the multiple streams to vary from stream to stream. It is therefore important that the events flowing from each stream be recorded properly in the log file so that the flow from an individual stream can be reconstructed. However, the current trapd.log file is only suited to a single flow of events. There is therefore an unmet need in the art to be able to store event information from multiple streams such that the events are viewable and ordered by stream.
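For illustration only, the grouping of events into streams, including the case in which a single event belongs to more than one stream, can be sketched as follows. The function name split_into_streams, the use of predicate functions as stream definitions, and the event fields shown in the usage example are hypothetical. The sketch preserves arrival order within each stream and does not model orderings that differ from stream to stream.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical event representation: a dictionary of string fields.
Event = Dict[str, str]


def split_into_streams(
    events: Iterable[Event],
    stream_filters: Dict[str, Callable[[Event], bool]],
) -> Dict[str, List[Event]]:
    """Group events into named streams; one event may land in several streams."""
    streams: Dict[str, List[Event]] = {name: [] for name in stream_filters}
    for event in events:
        for name, matches in stream_filters.items():
            if matches(event):
                # Arrival order is kept separately for each stream.
                streams[name].append(event)
    return streams
```

For example, with one filter selecting a hypothetical "vendor" field and another selecting a hypothetical "subnet" field, an event that satisfies both filters appears in both resulting lists. A single flat log such as trapd.log records no such per-stream membership.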
The Postmaster Daemon allows events to be correlated with one another. For instance, events indicating communication failures for several network components may collectively indicate that a communication link has gone down. In the prior art, this correlation information is not recorded. There is therefore an unmet need in the art to be able to store event correlation relationship information.
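The link-failure example above can be sketched, for illustration only, as a simple correlation rule. The function name correlate_link_down, the event fields "id", "type" and "component", and the min_components threshold are all hypothetical assumptions and do not describe how the Postmaster Daemon actually correlates events.

```python
from typing import Dict, List, Optional

# Hypothetical event representation with "id", "type" and "component" fields.
Event = Dict[str, str]


def correlate_link_down(
    events: List[Event], min_components: int = 2
) -> Optional[Dict[str, object]]:
    """Derive a link-down event when enough distinct components report failures."""
    failures = [e for e in events if e["type"] == "comm_failure"]
    components = {e["component"] for e in failures}
    if len(components) < min_components:
        return None
    # The derived event keeps the identifiers of the events it was built from,
    # which is the relationship information a prior-art log does not record.
    return {"type": "link_down", "correlated_ids": [e["id"] for e in failures]}
```

Retaining the list of contributing identifiers alongside the derived event is one way the correlation relationship could later be reconstructed from stored data.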