This invention relates to managing network faults.
Proper detection, reporting, and interpretation of faults are important activities in keeping networks working properly. As shown in FIG. 1, a network management station 10 can track fault notifications to help operators identify failure conditions within the network elements 12, 14, 16, 18, such as routers, switches, radio nodes, and radio network controllers. When a fault (i.e., an event that adversely affects the proper functioning of the network element) or other noteworthy event occurs at a network element, the element notifies the management station by “sending a trap” 20. In a large network, one management station may serve hundreds or even thousands of network elements.
A fault in one component 11 of a network element may trigger cascading faults in other components 13, 15, 17 that rely on the faulty component for proper functioning. If a network element suffers a bout of cascading faults and sends a trap to the management station for each fault, the avalanche of traps may overload the management station. The operator 22 at the management station may likewise be overwhelmed by the volume of information carried in the traps and consequently be distracted from the key traps that report the root causes of the faults.
One way to reduce the number of traps processed at a management station is “filtering” (i.e., purposefully ignoring traps that satisfy certain simple criteria). Alternatively, operators may instruct network elements not to send traps for certain classes of faults. With either approach, however, the operator may inadvertently suppress traps that report the root causes of faults.
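The risk of simple filtering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Trap` record, its field names, the severity labels, the element name `router-1`, and the choice of severity as the filter criterion do not come from any particular management station. The component numbers reuse reference numerals 11, 13, 15, 17 from FIG. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    """Hypothetical trap record; all field names are illustrative assumptions."""
    element: str         # network element that sent the trap
    component: int       # component within the element that faulted
    severity: str        # assumed severity label carried in the trap
    is_root_cause: bool  # known only in hindsight; used to show what is lost


def filter_traps(traps, ignored_severities):
    """Ignore every trap whose severity matches the simple filter criterion."""
    return [t for t in traps if t.severity not in ignored_severities]


traps = [
    Trap("router-1", 11, "minor", True),   # root cause, reported at low severity
    Trap("router-1", 13, "major", False),  # cascading fault
    Trap("router-1", 15, "major", False),  # cascading fault
    Trap("router-1", 17, "major", False),  # cascading fault
]

# A filter meant to cut noise by ignoring minor traps.
kept = filter_traps(traps, {"minor"})
root_cause_kept = any(t.is_root_cause for t in kept)
```

In this assumed scenario the filter passes the three cascading-fault traps and discards the only trap that reports the root cause, which is the failure mode described above.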
Other approaches, which run on management stations, examine previously logged fault records to correlate faults, apply data-mining techniques to logged faults to identify patterns that may point to the root causes of faults, and/or use expert-system techniques combined with externally specified rules to correlate logged faults.
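As a rough illustration of rule-based correlation over logged faults, the sketch below treats the externally specified rules as a plain dependency map and reports as root causes those logged faults whose dependency did not itself fault. This is a deliberately simplified stand-in, not an actual expert system; the function name `find_root_causes` and the map `depends_on` are hypothetical, and the component numbers again reuse the reference numerals of FIG. 1.

```python
def find_root_causes(logged_faults, depends_on):
    """Return logged faulty components whose dependency is not also faulty.

    logged_faults: component identifiers read back from a fault log.
    depends_on: assumed externally specified rules, mapping a component
                to the single component it relies on.
    """
    faulty = set(logged_faults)
    roots = []
    for component in logged_faults:
        parent = depends_on.get(component)  # None if no dependency is listed
        if parent not in faulty:
            roots.append(component)
    return roots


# Assumed rules: components 13, 15, and 17 each rely on component 11.
depends_on = {13: 11, 15: 11, 17: 11}

# Faults in the order they happened to be logged.
logged = [13, 11, 15, 17]

roots = find_root_causes(logged, depends_on)
```

Because the correlation works on records that have already been logged, every trap must still be received and stored by the management station before the root cause can be singled out.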