This invention relates to a method and apparatus for network fault management in busy periods.
When a fault occurs in a network, it can be difficult for administrators and management to understand the relevant network conditions at the moment of the fault because of the large number of factors involved. For instance, each network device has multiple hierarchical relationships with other network devices, and all of these relationships are potentially relevant to the fault. Furthermore, each point in the network potentially has a different set of network conditions. Analysis of a single fault by a fault management system can take input from tens, hundreds or more factors, and this approach has limitations when scaled further.
Network performance management systems evaluate large numbers of complex network conditions by consolidating network events and by interrogating network devices directly for performance factors. Such systems use the consolidated indicators and the directly acquired performance factors to calculate critical periods corresponding to maximum demand on a network or on clusters of devices within a network. Typically, network performance management takes a macro or top-down approach to network management, whereas fault management takes a micro or bottom-up approach. Network performance management is concerned with maximising the efficient throughput of network traffic, whereas network fault management is concerned with understanding which individual events are relevant to problems in the network. Therefore, there is a need in the art to address the aforementioned problem.
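By way of illustration only, the calculation of a critical period corresponding to maximum demand, as mentioned above, might be sketched as follows. The sketch assumes that performance factors are available as timestamped load samples and that a critical period is taken to be the clock hour with the highest mean load. The function name `busy_hour`, the sample format and the hourly granularity are hypothetical and are not taken from any particular performance management system.

```python
from collections import defaultdict
from datetime import datetime


def busy_hour(samples):
    """Return (hour_start, mean_load) for the clock hour with the highest mean load.

    `samples` is an iterable of (timestamp, load) pairs. Both the pair format
    and the one-hour bucket size are assumptions made for this illustration.
    """
    buckets = defaultdict(list)
    for timestamp, load in samples:
        # Truncate each timestamp to the start of its clock hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(load)

    # Mean load per hour, then pick the hour with the largest mean.
    means = {hour: sum(loads) / len(loads) for hour, loads in buckets.items()}
    peak = max(means, key=means.get)
    return peak, means[peak]


# Hypothetical utilisation samples (percent) polled from a device.
samples = [
    (datetime(2024, 1, 1, 9, 0), 40.0),
    (datetime(2024, 1, 1, 9, 30), 60.0),
    (datetime(2024, 1, 1, 10, 0), 80.0),
    (datetime(2024, 1, 1, 10, 30), 90.0),
    (datetime(2024, 1, 1, 11, 0), 30.0),
]

peak_hour, peak_load = busy_hour(samples)
print(peak_hour, peak_load)
```

A consolidated indicator of this kind summarises demand across the network or a cluster of devices; it does not by itself identify which individual events are relevant to a given fault.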