A system and method for counting events of a plurality of managed network elements (NEs) in a large communication network may be summarized as follows.
Network events are collected by event collectors (ECs) from event reporters (ERs). An EC then partitions the events into clusters based on partition criteria. The EC counts the number of events in each cluster and reports the outcome according to so-called report notification trigger criteria. Receivers of the reports may query the EC for the details of the events if it is in their interest to do so.
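The summarized method can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each event carries a reporter identity, a severity and a probable cause, that the partition criterion is a key function such as (severity, probable cause), and that the report notification trigger criterion is a per-cluster count threshold. The names EventCollector, collect and query are hypothetical and are not defined by this application or by any standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical, reduced event fields; a real alarm notification carries ~20 parameters.
    reporter: str         # identity of the event reporter (ER)
    severity: str
    probable_cause: str


class EventCollector:
    """Hypothetical EC: partitions events into clusters, counts them, reports counts."""

    def __init__(self, partition: Callable[[Event], Hashable], threshold: int):
        self.partition = partition    # partition criterion: event -> cluster key
        self.threshold = threshold    # assumed trigger criterion: count threshold
        self.clusters: Dict[Hashable, List[Event]] = defaultdict(list)
        self.reports: List[Tuple[Hashable, int]] = []  # summary reports sent upwards

    def collect(self, event: Event) -> None:
        key = self.partition(event)
        self.clusters[key].append(event)
        count = len(self.clusters[key])
        # Report only the count (not the detailed events) each time the
        # cluster's count reaches a multiple of the threshold.
        if count % self.threshold == 0:
            self.reports.append((key, count))

    def query(self, key: Hashable) -> List[Event]:
        # Receivers fetch the detailed events of a cluster only when they want them.
        return list(self.clusters.get(key, []))


ec = EventCollector(partition=lambda e: (e.severity, e.probable_cause), threshold=3)
for i in range(7):
    ec.collect(Event(f"NE-{i}", "major", "linkDown"))
ec.collect(Event("NE-9", "minor", "highTemperature"))
print(ec.reports)  # [(('major', 'linkDown'), 3), (('major', 'linkDown'), 6)]
```

In this sketch the detailed events stay at the EC, and only (cluster, count) pairs travel upwards; a receiver that needs more calls query for the cluster of interest.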
A communication network consists of nodes and links. These are subject to network management and are called network elements (NEs) in this context. A large communication network comprises thousands of such NEs, which transport information among the users/subscribers of the network. These NEs may have faults, and the faults are reported as alarms. In the context of this application, NEs having an alarm condition are termed event reporters (ERs).
An alarm notification carries many information parameters. For example, it carries the identity of the NE in the alarmed state, the severity of the alarm, the probable cause of the alarm, the time of the alarm condition, the suggested remedial action, etc. In 3GPP/3GPP2, the alarm notification contains about 20 parameters.
In a communication network, there are nodes or systems that are not responsible for transporting information for the users/subscribers of the network. These nodes or systems, e.g. an element manager (EM) and the domain manager (DM) of FIG. 2, manage the network. They install, configure and supervise the nodes and links. They monitor the network, and if the network performance falls below some planned threshold, they initiate a recovery plan involving, for example, reconfiguring the nodes, reconfiguring user call routes, deactivating faulty nodes and activating backup nodes. In the context of this application, these nodes are termed event collectors (ECs).
The network management architecture, i.e. the organization of the DMs and NEs and the distribution of network management functionalities among them, as well as the protocols and network management services, e.g. fault management and configuration management services such as the FCAPS network management services defined by ITU-T, are the subject of work by various international standardization bodies and organizations such as ITU-T, 3GPP, 3GPP2 and IETF. All these network management architectures share one basic principle: the entities are organized in a hierarchy, as shown in FIG. 2.
With reference to FIG. 2, the network elements (NEs) generate alarms, and these are collected by their respective domain managers (DMs). The network managers (NMs) collect the alarms from several DMs. Each DM may collect alarms from several thousand NEs. Given that each NE may generate hundreds or thousands of alarms per day, a large volume of data propagates upwards from the NEs. The information contained in the alarms at NE level is used to identify the faulty equipment or function, and to identify the appropriate remedial actions to be taken by an operator to correct the fault condition. This information is used by systems or operators at the NE and DM level, and occasionally at the NM level.
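To give a sense of scale, the following back-of-the-envelope calculation uses assumed figures chosen from within the ranges stated above (3,000 NEs per DM, 500 alarms per NE per day) together with the approximately 20 parameters per alarm notification. The figures are illustrative only and do not describe any particular network.

```python
# Assumed, illustrative figures (not taken from any specific network).
NES_PER_DM = 3_000            # "several thousand NEs" per DM
ALARMS_PER_NE_PER_DAY = 500   # "hundreds or thousands of alarms per day"
PARAMS_PER_ALARM = 20         # approximate 3GPP/3GPP2 parameter count
SECONDS_PER_DAY = 24 * 60 * 60

alarms_per_day = NES_PER_DM * ALARMS_PER_NE_PER_DAY
params_per_day = alarms_per_day * PARAMS_PER_ALARM
mean_rate = alarms_per_day / SECONDS_PER_DAY  # a long-run average; alarm storms are far burstier

print(f"{alarms_per_day:,} alarms/day, {params_per_day:,} parameter values/day, "
      f"~{mean_rate:.1f} alarms/s on average into one DM")
```

Even under these moderate assumptions, a single DM receives on the order of a million detailed alarms per day, before any aggregation towards the NM level.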
Today, there exists an operation in the 3GPP Alarm IRP to request summary information for all or part of the network being managed by an Alarm IRP Agent, namely getAlarmCount( ). 3GPP2 has defined a similar function.
There also exists an operation in the 3GPP Alarm IRP to request all the current detailed alarm information for all or part of the network being managed by an Alarm IRP Agent, namely getAlarmList( ). 3GPP2 has defined a similar function.
There further exists an operation in the 3GPP Notification IRP to request all new and changed alarm information for all or part of the network being managed by an Alarm IRP Agent, namely subscribe( ). 3GPP2 has defined a similar function. A similar notification mechanism exists in ITU-T Recommendation X.734, Event Report Management Function.
This current prior-art paradigm for transporting alarm information has several limitations and problems.
The prior art uses the “publish-subscribe” paradigm. The ER “publishes” the detailed information regardless of whether there are subscribers wanting it. Subscribers subscribe to receive the information. The detailed information is therefore always transferred, filling up the channel, whether or not any subscriber wants it. Two problems arise. Firstly, channel capacity may be used up for no reason, i.e. when no subscriber wants the information. Secondly, when a subscriber does not want the information, either the subscriber process or the ER reporting to that subscriber needs to filter/discard it. Thus, subscriber or ER CPU cycles are wasted.
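The two problems can be made concrete with a simplified Python model, assuming that the ER pushes every detailed notification to every subscriber and that each subscriber applies its own filter after reception. The function publish_all, the subscriber names and the alarm mix are hypothetical and serve only to count transferred versus discarded notifications.

```python
from typing import Callable, Dict, List

Notification = Dict[str, str]


def publish_all(notifications: List[Notification],
                subscribers: Dict[str, Callable[[Notification], bool]]) -> Dict[str, int]:
    """Push every detailed notification to every subscriber (prior-art style).

    Returns how many notifications crossed the channel, how many were kept,
    and how many were discarded after arrival (wasted capacity and CPU cycles).
    """
    sent = kept = 0
    for n in notifications:
        for wants in subscribers.values():
            sent += 1            # channel capacity is consumed regardless of interest
            if wants(n):
                kept += 1
            # otherwise the subscriber spends CPU cycles only to discard n
    return {"sent": sent, "kept": kept, "discarded": sent - kept}


alarms = [{"severity": s} for s in ["critical"] * 2 + ["minor"] * 98]
subs = {
    "nm": lambda n: n["severity"] == "critical",  # wants only critical alarms
    "audit": lambda n: False,                     # currently wants nothing
}
result = publish_all(alarms, subs)
print(result)  # {'sent': 200, 'kept': 2, 'discarded': 198}
```

In this model, 198 of the 200 transfers carry information that nobody wants, which corresponds to both the wasted channel capacity and the wasted filtering effort described above.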
At each aggregation point, additional network load is incurred to send the detailed alarm information to the next level in the information-flow chain or hierarchy. For example, the capacity of the communication channel between the “higher level” and the “lower level” is used for transporting alarms in competition with other needs, such as transporting configuration management and performance management data. Detailed alarm information is not always required, and summary information is often sufficient.
At each aggregation point, for example at the DM, additional database/memory load is incurred to store the detailed alarm information. A larger database/memory means, for example, that it takes longer for a DM user to search for and retrieve relevant alarm data when needed. Detailed alarm information is not always required, and summary information is often sufficient.
There is today no method to subscribe to summary alarm information from all or part of the network being managed by a DM.
With growing network size and increasing network and network element complexity, it is increasingly difficult to keep detailed network alarm information at one level in sync, and in real time, with the alarm information at a lower level.
Even if the synchronization problem mentioned above were solved with an acceptable level of reliability and performance, maintaining detailed network alarm information at the DM in real time, “just in case a DM user needs the information”, is an expensive proposition and runs contrary to the “just-in-time inventory” concept. Its implementation would use up critical DM-NE channel capacity that is shared with the transport of other types of information, such as configuration management and performance management information.
Some NEs are actively carrying traffic in the network, while others are in the process of being commissioned or decommissioned. For NEs whose role in the network is less critical, and which may not be fully configured, it may not be relevant for the DM to receive and store all detailed alarm information. However, a summary of the alarm situation may be required, so that if a large change in alarm volume occurs, or if some alarms of high severity occur, management intervention by the DM is still possible.
The communication facilities used between a) NMs and DMs and b) DMs and NEs may be shared rather than dedicated resources. It is uneconomical to dimension a communication facility to handle alarm storms or peak alarm rates, since such a facility would not be filled to capacity most of the time. A summary of the alarm situation would reduce the potentially large volume of alarm notification emissions, so that the network operator need not dimension its communication facility for the peak alarm rate.
The communication facilities used between a) NMs and DMs and b) DMs and NEs may be provided by the Internet, i.e. a public facility that is not dedicated to the operators of the NMs, DMs and NEs. This is the case when the NEs are home devices such as 3GPP Home eNBs, TV set-top boxes, etc. It is virtually impossible to dimension such a communication facility to handle alarm storms or peak alarm rates. A summary of the alarm situation would reduce the potentially large volume of alarm notification emissions, so that the network operator may avoid this dimensioning problem.
Also, equal priority is given to the raising of major and minor alarms, and equal priority is given to alarm ceasing and alarm raising, so that in a period of high alarm volume it may take considerable time to report, receive and process the roughly 20 parameters of each alarm before the true nature of the alarm status becomes clear. Summary information can describe in one notification what might otherwise take hundreds of notifications.
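A minimal Python sketch of that last point follows. It assumes, for illustration only, a burst of 300 alarm events described by (severity, state) pairs and a hypothetical summary format consisting of one counter per (severity, state) combination; neither the burst nor the summary format is defined by any standard.

```python
from collections import Counter
from typing import Dict, List, Tuple

PARAMS_PER_ALARM = 20  # approximate parameter count of a 3GPP/3GPP2 alarm notification


def summarize(alarms: List[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Collapse (severity, state) alarm events into one hypothetical summary notification."""
    return dict(Counter(alarms))


burst = ([("major", "raised")] * 150
         + [("minor", "raised")] * 120
         + [("major", "ceased")] * 30)
summary = summarize(burst)

print(len(burst), "detailed notifications,", len(burst) * PARAMS_PER_ALARM, "parameters")
print("1 summary notification with", len(summary), "counters:", summary)
```

Here 300 detailed notifications carrying about 6,000 parameter values are replaced by a single notification with three counters, from which a receiver can immediately see that most major alarms remain raised.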
Thus, there is a need for an improved method and arrangement for conveying network event information, such as network alarm management information, in a large communication network, which overcomes at least some of the problems and drawbacks mentioned above.