The invention relates generally to the processing and management of alarms in communications networks, and more particularly to an alarm correlation method and system.
Any large telecommunications network is subject to occasional and/or frequent faults which result in alarms being raised. Finding the original cause of a particular fault can be an arduous task. Naturally, the time spent finding and fixing a fault depends on how the alarms occur and also on the level of experience of an assigned operator. If the fault is particularly complex then the resulting time loss can be significant.
In order to quickly diagnose a problem that occurs in a network, a network operator must be knowledgeable with respect to alarm reporting mechanisms, network element operations, and connection and configuration dependencies.
Even if the operator is experienced with the above, some difficulty in analysing network faults will still exist due to the manner in which alarms are reported in the network. For example, alarm flooding may occur, in which case one fault causes many alarms to occur at once, which can suddenly overwhelm the network operator. The network operator has to manually filter the flood of alarm reports to find the direct failure alarm that is hidden within it. In another example, referred to as Alarm Toggling (Alarm Streaming), alarms are constantly raised and cleared because of an intermittent fault. Alarms related to the fault can also toggle, and such alarm toggling may become confusing to the network operator. As an example, if the alarms are rapidly toggling (e.g. raising and clearing every second) the operator may have to take a snapshot of the alarms at an instant in time to understand what may be happening in the network. If alarms are slowly toggling (e.g. raising and clearing every 5 minutes), the operator may miss a diagnosis if the alarm is currently in a clear state.
The biggest problem in network diagnosis is the time needed to locate a fault's point of origin in the network. If the network operator can quickly locate a failure, services can be restored more quickly and the chances are reduced that a small failure will develop into a bigger network problem.
In order to help the network operator view faults in a network, root-cause analysis systems have been developed. Some such systems may show an alarm correlation by presenting alarms that have been correlated into groups consisting of a direct detected alarm together with symptomatic alarm messages. This correlation greatly reduces the amount of time that the network operator has to spend in manually filtering the alarm messages. In addition, such systems may provide the customer with a view of problems found in the network. This will shift the network operator's attention from viewing alarms to viewing problems in the network. Furthermore, some such systems are capable of providing a brief probable cause description of the problem and providing a reference that can be used to help identify the problem.
The correlation methods used in existing tools rely on an exhaustive search of the network to find symptomatic alarms for the root-cause alarm. This means that every alarm on every network element in the network is examined in order to perform correlation. This is very expensive in terms of computing power and execution time. To overcome this problem, such methods have limited themselves to certain types of alarms. From a flood of alarms, they select a certain type of alarm, reject the rest, and perform the exhaustive search for alarms of the selected type only. The selected alarms are usually the alarms raised at the line layer, and these typically constitute roughly 20% of the total alarms. Although these limited correlation capabilities are practical for small networks, applying them would not be practical for larger, more complicated networks.
It is an object of the invention to obviate or mitigate one or more of the above identified disadvantages.
The invention is composed of two elements, namely a network modelling scheme and a correlation process. The network modelling scheme models a set of network elements in a network as a hierarchy of TTPs (transport termination points) and creates several layers of connected TTPs. In the new correlation process, the network of connected TTPs is traversed once a root-cause alarm is raised and a problem object is created. A traversed TTP keeps its association with the problem object. In this manner a symptomatic alarm raised on the TTP is correlated with the associated problem(s) without the need for a repeated search of the network.
More specifically, in the new network modelling scheme provided by the invention, a network element is modelled as a hierarchy of virtual server-client TTPs. A TTP at a lower layer is served by a TTP at a higher layer. The whole network is then modelled by establishing connections between these TTPs. Since the TTPs are arranged in a hierarchy, the whole network will conform to a hierarchy. The connectivity of TTPs at the highest layer models the connectivity of network elements themselves. The connectivity of TTPs at lower layers represents the network at various topology/termination layers (e.g. optical, section, line, path, etc.). A network at a lower layer is served by a network at the higher layer. The alarms in the new model are considered to be raised on TTPs and not on network elements.
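By way of illustration only, the modelling scheme described above might be sketched as follows. This is a minimal sketch and not the implementation of the invention: the class name `TTP`, the helper functions `build_element`, `ttp_at` and `connect`, the element names, and the assumption of exactly one TTP per layer per network element are all hypothetical choices made for the example. The layer names follow the examples given above (optical, section, line, path).

```python
from dataclasses import dataclass, field
from typing import Optional

# Layers ordered from highest (server side) to lowest (client side).
# The names follow the examples in the text; the ordering is an assumption.
LAYERS = ["optical", "section", "line", "path"]


@dataclass(eq=False)
class TTP:
    """A transport termination point in the server-client hierarchy."""
    name: str
    layer: str
    server: Optional["TTP"] = None               # TTP at the next higher layer
    clients: list = field(default_factory=list)  # TTPs served at the next lower layer
    peers: list = field(default_factory=list)    # connected TTPs at the same layer
    alarms: list = field(default_factory=list)   # alarms are raised on TTPs

    def add_client(self, client: "TTP") -> "TTP":
        """Register `client` as served by this TTP and return it."""
        client.server = self
        self.clients.append(client)
        return client


def build_element(name: str) -> TTP:
    """Model one network element as a chain of TTPs, one per layer (hypothetical)."""
    top = TTP(f"{name}/{LAYERS[0]}", LAYERS[0])
    current = top
    for layer in LAYERS[1:]:
        current = current.add_client(TTP(f"{name}/{layer}", layer))
    return top


def ttp_at(top: TTP, layer: str) -> TTP:
    """Walk down the chain from the highest-layer TTP to the TTP at `layer`."""
    node = top
    while node.layer != layer:
        node = node.clients[0]
    return node


def connect(a: TTP, b: TTP) -> None:
    """Connect two TTPs; connections exist only between TTPs of the same layer."""
    if a.layer != b.layer:
        raise ValueError("TTPs must be at the same layer to be connected")
    a.peers.append(b)
    b.peers.append(a)


# Two elements connected at the highest layer and at the line layer.
ne_a = build_element("NE-A")
ne_b = build_element("NE-B")
connect(ne_a, ne_b)
connect(ttp_at(ne_a, "line"), ttp_at(ne_b, "line"))
```

In this sketch the connections at the highest layer stand for the connectivity of the network elements themselves, while the connections at the line layer form one of the lower-layer networks served by the layers above it.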
The correlation process is devised in harmony with the network modelling scheme. The correlation process determines a new alarm to be either a root-cause alarm or a symptomatic alarm. If it is a root-cause alarm then it is associated with a problem object with a generic attribute called correlation state. The correlation state of the problem is used to correlate symptomatic alarms to the problem. Once a problem (and hence the correlation state) is created on a TTP at a certain layer, the directly connected TTPs at the same layer and all the client TTPs at the lower layers served by the problem's TTP are traversed in search of correlatable symptomatic alarms. On each traversal the symptomatic alarms on the TTPs are examined by an inference engine and added to the problem if correlatable. More generally, all the alarms on TTPs which satisfy certain predetermined criteria are considered for correlation. Any TTP traversed will keep its association with the correlation state. Therefore, when a symptomatic alarm arrives later on that TTP, it is readily examined against any associated correlation state(s). This method of correlation alleviates the need for searching the network every time an alarm arrives. It greatly reduces the processing time of correlation, since the majority of alarms are of symptomatic types and traversal of the network is performed only upon arrival of a root-cause alarm.
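The flow just described can be illustrated with the following minimal sketch, given under stated assumptions rather than as the implementation of the invention. The names `Problem`, `traversal_set`, `raise_problem`, `handle_alarm` and `correlatable` are hypothetical. The set `ROOT_CAUSE_TYPES` and the alarm labels used in the example are placeholders, and `correlatable` is a trivial stand-in for the inference engine that accepts every symptomatic alarm. The traversal follows one literal reading of the text: the problem's TTP, its directly connected TTPs at the same layer, and all client TTPs beneath the problem's TTP.

```python
from dataclasses import dataclass, field

# Placeholder classification of root-cause alarm types (an assumption).
ROOT_CAUSE_TYPES = {"LOS"}


@dataclass(eq=False)
class TTP:
    name: str
    clients: list = field(default_factory=list)  # served TTPs at lower layers
    peers: list = field(default_factory=list)    # connected TTPs at the same layer
    alarms: list = field(default_factory=list)   # symptomatic alarms seen on this TTP
    states: list = field(default_factory=list)   # associated correlation states


@dataclass(eq=False)
class Problem:
    """Problem object; it doubles as the correlation state in this sketch."""
    root_alarm: str
    ttp: TTP
    symptoms: list = field(default_factory=list)


def correlatable(problem: Problem, alarm: str) -> bool:
    """Stand-in for the inference engine: accept any symptomatic alarm."""
    return alarm not in ROOT_CAUSE_TYPES


def traversal_set(ttp: TTP) -> list:
    """The problem's TTP, its direct same-layer peers, and all client TTPs below it."""
    visited = [ttp] + list(ttp.peers)
    stack = list(ttp.clients)
    while stack:
        node = stack.pop()
        visited.append(node)
        stack.extend(node.clients)
    return visited


def raise_problem(ttp: TTP, alarm: str) -> Problem:
    """Create a problem and traverse once, leaving an association on each TTP."""
    problem = Problem(alarm, ttp)
    for node in traversal_set(ttp):
        node.states.append(problem)  # the traversed TTP keeps its association
        for pending in node.alarms:
            if correlatable(problem, pending):
                problem.symptoms.append((node.name, pending))
    return problem


def handle_alarm(ttp: TTP, alarm: str):
    """Classify a new alarm; only a root-cause alarm triggers a traversal."""
    if alarm in ROOT_CAUSE_TYPES:
        return raise_problem(ttp, alarm)
    ttp.alarms.append(alarm)
    for problem in ttp.states:  # local check only, no search of the network
        if correlatable(problem, alarm):
            problem.symptoms.append((ttp.name, alarm))
    return None


# Example: one symptomatic alarm arrives before the root cause, one after.
line_a, path_a, line_b = TTP("A/line"), TTP("A/path"), TTP("B/line")
line_a.clients.append(path_a)
line_a.peers.append(line_b)
line_b.peers.append(line_a)

handle_alarm(path_a, "AIS")            # stored on the TTP, nothing to correlate yet
problem = handle_alarm(line_a, "LOS")  # root cause: traverse and associate
handle_alarm(line_b, "RDI")            # correlated via the stored association
```

In the example, the alarm that arrived before the root cause is picked up during the single traversal, and the alarm that arrived afterwards is correlated by consulting only the states already stored on its own TTP.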
The correlation method presented here is technology independent and allows the correlation of alarms at all layers of the network with a significantly improved performance.
Using this method, the number of alarms considered for correlation is significantly increased due to participation of all the layers in the correlation process. The performance of the correlation process is also improved dramatically by eliminating the need for searching the network for a root-cause alarm when a symptomatic alarm is raised.