An optical network is subject to intermittent faults that may raise alarms in the system. However, a single fault can give rise to multiple alarms detected at multiple points in the network. Finding the root cause alarm, that is, the alarm corresponding to the fault that triggered all of these alarms, is important for fault isolation and repair.
In the absence of an automatic fault isolation system, the network operator has to go through the list of alarms manually and identify the root cause alarm triggered by the fault that needs to be corrected. This can be a long and arduous task in large networks. It can not only overwhelm even an experienced network operator but also increase the time needed to detect the failure. This, in turn, can significantly increase the time required to restore service in the network.
Alarm correlation has been addressed in the prior art. U.S. Pat. No. 6,707,795 B1 to Noorhooseini et al. issued Mar. 16, 2004 describes an alarm correlation method for use in a network management device. Using a hierarchical network model, the method correlates the root cause alarm with other alarms raised by network elements that satisfy particular relationships with the network element that produced the root cause alarm.
Another method and apparatus for incremental alarm correlation is described in U.S. Pat. No. 6,604,208 B1 to Gosselin et al. issued Aug. 5, 2003. The method partitions the alarms into correlation sets such that the alarms within a set have a high probability of being caused by the same network fault.
Alarms are also partitioned by the invention described in U.S. Pat. No. 6,253,339 B1 to Tse et al. issued Jun. 26, 2001. This patent provides a method and system for correlating alarms for a number of network elements. The system uses an alarm correlator that partitions the alarms into correlated alarm clusters. The clusters are constructed such that the alarms in a given cluster have a high probability of being caused by the same network fault.
U.S. Pat. No. 6,356,885 B2 to Ross et al. issued Mar. 12, 2002 concerns a method for processing data such as alarms. The method performs alarm correlation for a set of managed units. When one of the managed units is notified of an event such as an alarm, the cause of the alarm is determined by using a virtual model. The model comprises the managed units corresponding to the network entities. Each unit contains information about the services its entity offers to and receives from other entities. Each unit uses this information, together with its knowledge-based reasoning capability, to adapt the model.
Yet another method and apparatus for fault correlation in a networking system is described in U.S. Pat. No. 6,006,016 to Faigon et al. issued Dec. 21, 1999. In this patent, fault occurrences are detected and correlated by using a set of rules based on the number of times a specific fault event is generated within a time threshold.
A number of algorithms for alarm correlation and for determining the possible location of faults in a large communication network are presented in U.S. Pat. No. 5,309,448 to Bouloutas et al. issued May 3, 1994. The techniques described in this patent differ in their accuracy of fault location and in their algorithmic complexity.
Fault correlation in packet-switched networks is considered in U.S. Pat. No. 5,949,759 to Cretegny et al. issued Sep. 7, 1999. It describes a method that registers a failure in a high-speed packet-switched network so that the failure information can be retrieved by the network management system.
Notification of faults and load balancing of the data traffic among multiple paths in an overlay mesh network is described in U.S. Pat. No. 6,725,401 B1 to Lindhorst-Ko issued Apr. 20, 2004.
The prior art cited above shows that there have been multiple attempts to solve the problem of identifying faults. However, the industry still needs an efficient method and system for identifying and isolating faults in a network.