In network systems, announcements (e.g., alarms) about abnormal functioning of the system or about failures are typically transmitted or signaled to the network management system. Network systems, such as communication networks, typically comprise a plurality of network elements, and the network elements in turn comprise a plurality of physical and logical devices/resources/functions. These are called “managed objects” in the context of network management. The network management system may comprise one or more network management devices.
As an example, the International Telecommunication Union (ITU) has developed a standard, X.733, on an alarm reporting function which provides a user, e.g., an operator, with the ability to transmit and clear alarms. Alarms are events that indicate changes in the networking or system environment which are of concern to network management.
Based on the above standard X.733, the Third Generation Partnership Project (3GPP) has developed a set of technical specifications (TS 32.111-x) for fault management (FM) in 3G systems. In addition to detecting failures in the network system and reporting them, fault management includes associated features in an operations system (OS), such as the administration of the alarm list, the presentation of operational state information of physical and logical devices/resources/functions, and the provision and analysis of the alarm and state history of the network. TS 32.111-2 V6.8.0 defines an Alarm Integration Reference Point (IRP) Information Service (IS), which addresses the alarm surveillance aspects of FM. The purpose of the Alarm IRP is to define an interface through which a “system” (typically a network element manager or a network element) can communicate alarm information relating to its managed objects to one or several manager systems (typically network management systems). The Alarm IRP IS defines, in a protocol-neutral way, the semantics of alarms and the interactions visible across the reference point, as well as the semantics of the operations and notifications visible in the IRP.
A large communication network, such as a nationwide mobile access network, can generate a huge number of alarms from its thousands of network elements, which comprise numerous network element computer units and logical resource instances (managed objects). Alarms are created, for example, by autonomous self-check circuits and procedures within the network element, or by an element/network manager, and are reported to the network management system by sending alarm notifications. A root cause of a fault state can be, for example, a software problem, a hardware failure, an erroneous operator action, or sometimes even a radical change in the environment, such as rain or fog hindering signal transmission to a receiver, excessive temperature causing processor errors, external RF noise hindering signal reception at a receiver, or humidity causing electrical leakage.
The network management device(s) of a network management system typically provide tools for monitoring (visualizing) alarms which are received from the network. FIG. 6 illustrates an example of such an alarm monitoring tool. For a network operator managing the network, it is difficult to prioritize corrective actions for failures because of the huge number of alarms (which can be thousands per day) as well as the number of network entity instances to which the alarms relate. Alarms may indicate the same relative fault severity (e.g., critical, major, minor) although they are detected in network entity instances which are not equally critical from a network operation point of view. The importance of a network entity instance may also vary, for example daily or hourly (e.g., during busy hours).
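The prioritization difficulty described above can be illustrated with a minimal sketch. It assumes a simplified alarm record carrying only the three severity levels named in the text; the `Alarm` class, the `sort_by_severity` helper, and the managed object names are hypothetical and are not taken from X.733 or TS 32.111-2.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Only the severity levels named in the text; a lower value is more severe.
    CRITICAL = 0
    MAJOR = 1
    MINOR = 2


@dataclass(frozen=True)
class Alarm:
    managed_object: str  # hypothetical identifier of the network entity instance
    severity: Severity
    text: str


def sort_by_severity(alarms):
    """Order alarms most severe first; ties keep their arrival order."""
    return sorted(alarms, key=lambda a: a.severity)


alarms = [
    Alarm("cell-rural-17", Severity.CRITICAL, "link down"),
    Alarm("cell-downtown-03", Severity.CRITICAL, "link down"),
    Alarm("cell-suburb-09", Severity.MINOR, "fan speed low"),
]

ranked = sort_by_severity(alarms)
for alarm in ranked:
    print(alarm.severity.name, alarm.managed_object, alarm.text)
```

Both critical alarms tie under this ordering, so severity alone gives the operator no basis for deciding which of the two network entity instances to repair first.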
The network may also report its performance to the network management system. Typically, measurement reports are received by the network management system periodically, for example once per quarter hour, hour, or day, i.e., in a non-real-time manner. The reports contain a large number of indicators which give information about the performance of the network, the network elements, and their resources. The amount of measurement data received daily by the network management system can be huge (hundreds of megabytes or gigabytes per day).
An operator's network management system may comprise a post-processing tool for analyzing the alarms and their correlations to each other, in an attempt to find the root cause of the failures. The post-processing analysis may also take into account network performance data and other parameters, such as the network structure. The post-processing analysis may help the operator to prioritize corrective actions (e.g., the order in which the faults in NEs should be corrected), but the analysis takes time and requires a lot of processing capacity because of the huge amount of received (and stored) data in the network management system. Moreover, the result of such an analysis cannot be available in real time, because a significant part of the data in the network management system is received in non-real time.
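As a rough illustration of what such post-processing correlation might involve, the following sketch groups stored alarms by the network element that raised them and picks the element with the most alarms as a candidate for closer inspection. This is an assumption made for illustration only, not the analysis performed by any particular tool; the `NE/resource` naming scheme, the function name, and the sample data are hypothetical.

```python
from collections import Counter


def alarms_per_network_element(alarm_sources):
    """Count alarms per network element.

    Each source is a hypothetical 'NE/resource' path; the part before the
    first '/' is taken as the network element name.
    """
    return Counter(src.split("/", 1)[0] for src in alarm_sources)


# Alarm sources as they might sit in the stored alarm history (made-up data).
stored_alarms = [
    "bts-12/trx-1",
    "bts-12/trx-2",
    "bts-12/power-unit",
    "bts-40/trx-1",
]

counts = alarms_per_network_element(stored_alarms)
suspect, alarm_count = counts.most_common(1)[0]
print(suspect, alarm_count)
```

Even this simple grouping has to scan the whole stored alarm history, which hints at why an analysis that also considers performance data and network structure takes time and processing capacity.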