The present invention relates to communications networks. More particularly, and not by way of limitation, the present invention is directed to a system and method providing poll-based alarm handling in a communications network.
The management and control of performance within a communications network are becoming increasingly complex. Various factors contribute to this complexity, such as the increased complexity and diversity of the technologies implemented in a network, the spread of highly advanced services with distinct requirements, and the heightened expectations of the users being served.
Within these complex networks, a single network fault may generate a large number of alarms over space and time. In large, complex networks, simultaneous network faults may occur, causing the network operator to be flooded with a high volume of alarms. The high volume of alarms greatly inhibits the ability to identify and locate the responsible network faults.
In order to mitigate the high volume of alarms, existing fault management systems correlate events into alarms. These existing systems reduce the number of alarms by attaching events to an existing alarm if they belong to the same flow or have the same key. In these systems, all alarms reach the Network Elements (NEs) since the network alarms are all correlated at these lower levels. An example of an existing system, often referred to as “sympathetic alarms,” is disclosed in U.S. Patent Application Publication Number 2004/0223461 to Scrandis et al. International Publication Number WO 00/25527 to Tse et al. also discloses an alarm aggregation method.
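The key-based correlation described above can be illustrated with a minimal sketch. It is not taken from any of the cited publications; the names Event, Alarm, and correlate, and the example keys and messages, are hypothetical and chosen only for illustration. The sketch assumes that each event carries a single correlation key (for example, a flow identifier) and that events sharing a key are attached to the same alarm.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    key: str      # hypothetical correlation key, e.g. a flow identifier
    message: str


@dataclass
class Alarm:
    key: str
    events: list = field(default_factory=list)


def correlate(events):
    """Attach each event to the alarm with the same key, creating it if needed."""
    alarms = {}
    for event in events:
        alarm = alarms.get(event.key)
        if alarm is None:
            # First event seen for this key: open a new alarm.
            alarm = Alarm(key=event.key)
            alarms[event.key] = alarm
        alarm.events.append(event)
    # Dicts preserve insertion order, so alarms come back in first-seen order.
    return list(alarms.values())


events = [
    Event("flow-1", "link down"),
    Event("flow-1", "loss of signal"),
    Event("flow-2", "high latency"),
]
alarms = correlate(events)
```

In this sketch three events collapse into two alarms, which shows how correlation reduces the number of alarms presented to the operator without discarding the underlying events.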
In addition, there are various existing systems which provide even more advanced event correlation processes, but which require the collection of all events and alarms for the correlation process to run. GB 2318479A1 to Niall discloses a knowledge-based alarm correlation system. European Patent Publication Number EP 0 549 937 A1 to Bouloutas discloses correlating alarms even if they may hold unreliable or missing information.
Several existing fault management systems also distribute management tasks closer to the network elements in order to reduce the number of alarm messages. Event correlation on the distributed nodes can be done for locally emitted alarms, and only a subset of events needs to be propagated upwards to the central management system. This method is effective in suppressing alarms that can be correlated from the point of view of the distributed management node. However, these fault management systems are not effective in suppressing alarms if correlating the alarms requires a network view that spans several nodes or domains. U.S. Pat. No. 6,665,262 to Lindskog et al. discloses a management system which collects alarms on a domain level and also performs solutions on the domain level. Any inter-domain problems are propagated upwards in such a system. U.S. Pat. No. 6,000,046 to Passmore discloses a multi-layer system that also correlates events on multiple layers from the bottom level upward and only propagates alarms that cannot be correlated within the domain.
U.S. Pat. No. 5,949,759 to Cretegny (Cretegny) discloses suppressing logical alarms and storing these alarms in the network elements. Only physical alarms are sent to the access nodes, together with topology and correlation information. The access nodes then send the physical alarm to the management system, which accesses the logical alarms on demand using a correlation key.
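The general pattern of suppressing logical alarms and retrieving them on demand can be sketched as follows. This is a simplified illustration of the behavior summarized above and not the implementation disclosed by Cretegny. The class and method names (NetworkElement, ManagementSystem, fetch_logical, and so on) are hypothetical, the access-node hop and the topology information are omitted, and the correlation key is assumed to be a plain string.

```python
class NetworkElement:
    """Hypothetical network element that keeps logical alarms local."""

    def __init__(self, name):
        self.name = name
        self.logical_alarms = {}  # correlation key -> stored logical alarms

    def raise_logical(self, key, text):
        # Logical alarms are suppressed: stored locally, never transmitted.
        self.logical_alarms.setdefault(key, []).append(text)

    def raise_physical(self, key, text):
        # Physical alarms are forwarded together with the correlation key.
        return {"source": self.name, "key": key, "text": text}

    def fetch_logical(self, key):
        # On-demand lookup by the management system.
        return list(self.logical_alarms.get(key, []))


class ManagementSystem:
    """Hypothetical central node that pulls logical alarms only when needed."""

    def __init__(self, elements):
        self.elements = elements

    def handle_physical(self, alarm):
        related = []
        for element in self.elements:
            related.extend(element.fetch_logical(alarm["key"]))
        return related


ne_a = NetworkElement("ne-a")
ne_b = NetworkElement("ne-b")
ne_a.raise_logical("trunk-7", "virtual circuit 12 down")
ne_b.raise_logical("trunk-7", "virtual circuit 31 down")
ne_b.raise_logical("trunk-9", "session timeout")

manager = ManagementSystem([ne_a, ne_b])
physical = ne_a.raise_physical("trunk-7", "loss of signal")
related = manager.handle_physical(physical)
```

Only the single physical alarm travels upward in this sketch; the two logical alarms sharing its key are fetched afterwards, and the unrelated logical alarm stays in its network element.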
As discussed above, in a fault situation, the number of alarms may be very large and difficult to process. Many solutions filter and correlate alarms on the network level, which disadvantageously requires sending a large number of alarms to the central node. This may be similar in effect to a network storm attack and could cause adverse effects on the network. In some existing solutions, the number of alarms is limited by placing the correlation logic closer to the network elements, but such solutions are limited because they cannot correlate events when the problem spans several distributed domains. In such cases, the alarms have to be sent to the central node. In modern telecommunication networks, the evaluation of the severity of an alarm is typically hard to conduct below the network layer.
U.S. Pat. No. 5,949,759 to Cretegny highlights the problem of sending too many alarms. Cretegny discloses first discovering correlation keys and then suppressing the transmission of logical alarms. While this solution is effective in suppressing related alarms, it is still based on a bottom-up approach, because a low-level physical alarm needs to trigger the alarm correlation process. Furthermore, low-level physical alarms are not equal from the service or business perspective. For example, on the network element level, an alarm cannot be easily categorized unless it is severe.
All of the existing fault management systems use a bottom-up approach in which alarms are propagated and aggregated from the network elements toward the central management node. The common limitation of this bottom-up approach is that high-priority problems, such as a non-functioning service, typically appear on the network level, and lower-layer alarms may not hold sufficient information to tell whether an alarm is actually important. It is only on the network level that such correlation is possible.