This application relates generally to the field of network management. More specifically, the disclosure provided herein relates to the management of fault events generated by devices on a network, where that management is based on a statistical correlation of the events.
A critical challenge for network management systems is dealing with the massive volume of information about network problems or faults that may be generated in a large, complex network. In many cases, this information arrives as a stream of “traps” or fault events from the myriad devices that make up the network infrastructure. To manage the network efficiently, it is essential to discriminate between events that are important enough to present to network operations personnel as trouble tickets for diagnosis of the network fault and events that are redundant and can be safely discarded.
The important or significant events are those that may indicate a root cause of the network fault, while the redundant events are merely secondary results of the root cause and are thus conditioned on the significant event. However, under certain network conditions, redundant events may become significant events, and vice versa. Thus, determining which fault events are significant and which are redundant normally requires detailed knowledge and thorough analysis of the topology of the network and of the types of interconnected devices from which it is constructed.