In a telecom network, multiple base stations provide services to end users on mobile terminals. The base stations can also be partitioned, and each partitioned group of base stations is managed by a network management system. FIG. 1A illustrates a simplified network 100 that includes base stations 110-1, 110-2 and a network management system 120 (or simply “NMS”). The base stations 110-1, 110-2 provide wireless communications services to end users on mobile terminals 130-1, 130-2 and 130-3. The base stations 110-1, 110-2 are connected to a core network (not shown) so that voice and data services (such as VoIP and multimedia streaming) can be provided to the mobile terminals 130-1, 130-2 and 130-3.
During operation, many events are generated and processed in the network 100. An event is a generic term for any type of occurrence within a network entity such as a base station. For example, when the mobile terminal 130-1 enters a service area of the base station 110-1 and a communication link is established between the mobile terminal 130-1 and the base station 110-1, a communication-synchronization event can be generated to mark the process that establishes the communication link. The communication-synchronization event information can include, among other things, the identity of the mobile terminal 130-1 and the time when the communication link was established. This information can be used for billing purposes and also to gather statistics for analysis.
An important category of events comprises those caused by fault states, i.e., abnormal conditions existing in the network. The state that triggers the event may be temporary in the sense that the condition that caused the event ceases to exist without any intervention. For example, the capacity of the base station 110-1 may be exceeded when there are too many mobile terminals requesting connection services. As a result, a capacity-exceeded event may be triggered by the base station 110-1. However, as the mobile terminals 130 leave the area served by the base station 110-1, the demand for services will fall below the capacity threshold.
Some fault states may be corrected automatically, or at least mitigated automatically. As an example, the base station 110-1 may provide VoIP services through two boards, a primary board and a backup board, each capable of handling the VoIP data traffic. If the primary board stops functioning, an event related to the failure of the primary board is triggered. However, because of the redundancy provided by the backup board, the VoIP service can be restored automatically. In another situation, both boards may be used to provide the VoIP services. When one board fails, the VoIP services can still be provided, but at a reduced capacity (50% in this instance).
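The two redundancy arrangements in the preceding example can be illustrated with a minimal sketch. The function name `voip_capacity` and the mode labels `"standby"` and `"shared"` are hypothetical and are not part of the network 100; the sketch also assumes that all boards have equal capacity.

```python
def voip_capacity(working_boards: int, total_boards: int, mode: str) -> float:
    """Return the fraction of nominal VoIP capacity that is still available.

    mode "standby": one board carries the traffic, the rest are backups
                    (hypothetical label for the primary/backup arrangement).
    mode "shared":  all boards carry traffic together
                    (hypothetical label for the load-sharing arrangement).
    Assumes every board has the same capacity.
    """
    if total_boards <= 0 or not 0 <= working_boards <= total_boards:
        raise ValueError("invalid board counts")
    if mode == "standby":
        # Full service is available as long as at least one board works.
        return 1.0 if working_boards >= 1 else 0.0
    if mode == "shared":
        # Capacity scales with the number of boards still working.
        return working_boards / total_boards
    raise ValueError(f"unknown mode: {mode}")
```

With two boards and one failure, the standby arrangement would keep full capacity, while the shared arrangement would drop to one half, matching the 50% figure in the example.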
In extreme circumstances, resolving the fault states requires manual intervention. For example, if there is only a single board providing the VoIP services and the board fails (board-failure event), the base station 110-1 can no longer provide the service until the board is replaced or repaired.
When an event is generated and/or detected by the base stations 110-1 and/or 110-2, the event is filtered. Filtering is a process in which a decision is made on whether or not to raise an alarm corresponding to the event. In the example above where the base station 110-1 generates the capacity-exceeded event due to too many mobile terminals requesting services, the condition may last only a short time such as 30 seconds. In this instance, the base station 110-1 may decide not to raise an alarm to the next level; that is, the base station 110-1 may decide not to notify the network management system 120.
However, if the condition lasts a significant amount of time such as over 5 minutes, the base station 110-1 may raise the alarm to the network management system 120 so that load balancing procedures may be carried out to establish an acceptable service level for the network. If the event indicates a complete service disruption, then the base station 110-1 raises an alarm to the network management system 120. The network management system 120 in turn may automatically notify a technician so that the situation can be investigated and corrected as necessary.
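The filtering decision described above can be sketched as follows. This is only an illustration under stated assumptions: the `Event` fields, the function name `should_raise_alarm`, and the fixed threshold are hypothetical, with the threshold taken from the "over 5 minutes" example rather than from any defined requirement.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the "over 5 minutes" example in the text.
RAISE_AFTER_SECONDS = 5 * 60


@dataclass
class Event:
    kind: str                        # e.g. "capacity-exceeded"
    duration_seconds: float          # how long the condition has lasted
    service_disrupted: bool = False  # True for a complete service disruption


def should_raise_alarm(event: Event) -> bool:
    """Decide whether the base station notifies the network management system."""
    if event.service_disrupted:
        # A complete service disruption is raised regardless of duration.
        return True
    # Short-lived conditions are withheld; long-lasting ones are raised.
    return event.duration_seconds > RAISE_AFTER_SECONDS
```

Under these assumptions, a capacity-exceeded condition lasting 30 seconds would be withheld, while the same condition lasting longer than 5 minutes would be raised.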
To process the alarms and events, the base stations as well as the network management system include alarm and event handling functions as illustrated in FIG. 1B, which provides a functional view of the network 100 illustrated in FIG. 1A. In FIG. 1B, the base stations 110-1 and 110-2 include RBS event handlers 115-1 and 115-2, respectively, for performing the event handling functions. The network management system 120 includes an NMS alarm handler 125. Each RBS event handler 115-1, 115-2 is connected to the NMS alarm handler 125. The mobile terminals are omitted from FIG. 1B to minimize clutter and make the figure easier to understand.
Focusing on the RBS event handler 115-1, when an event occurs, the RBS event handler 115-1 decides whether an alarm should be raised to the network management level. If so, the RBS event handler 115-1 raises the alarm by notifying the NMS alarm handler 125 and identifying the malfunctioning subject.
FIG. 1C illustrates a conventional method M100 of handling events performed by the RBS event handler 115-1. In method M100, the RBS event handler 115-1 detects an event in act A110. In act A120, the RBS event handler 115-1 determines if the event is severe enough to be raised as an alarm to the next level—i.e., the event is filtered. If so, then the RBS event handler 115-1 raises an alarm corresponding to the event to the network management system 120 in act A130.
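The flow of method M100 can be sketched as below. The class name `RbsEventHandler`, the dictionary-based event representation, and the two callbacks are hypothetical choices made for illustration; the severity test is passed in as a parameter because the text does not specify how severity is determined.

```python
from typing import Any, Callable, Dict

EventRecord = Dict[str, Any]  # hypothetical event representation


class RbsEventHandler:
    """Illustrative per-node handler following acts A110 to A130."""

    def __init__(
        self,
        node_id: str,
        is_severe: Callable[[EventRecord], bool],
        notify_nms: Callable[[str, EventRecord], None],
    ) -> None:
        self.node_id = node_id
        self.is_severe = is_severe    # filtering rule, supplied by the caller
        self.notify_nms = notify_nms  # stand-in for the link to the NMS alarm handler

    def handle(self, event: EventRecord) -> bool:
        """Process one detected event (A110); return True if an alarm was raised."""
        # A120: filter the event. Events that are not severe enough stop here,
        # so the network management system never sees them.
        if not self.is_severe(event):
            return False
        # A130: raise the alarm, identifying the node that reported it.
        self.notify_nms(self.node_id, event)
        return True
```

Note that in this sketch a filtered event leaves no trace outside the node, which is the property of the conventional method that the remainder of this section discusses.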
The reason that conventional event handlers such as the RBS event handler 115-1 filter events is explained as follows. In conventional networks, supervision of nodes is performed from centralized operational centers. The network management system 120 illustrated in FIG. 1A is one such operational center. When an event occurs that can have an impact on the performance of the network, a network node (e.g., the base station) raises an alarm to notify the operational center. A typical network has many nodes and each node may detect many events. Since the number of events can be substantial, each node filters the events so that an operator working at the operational center is not inundated with insubstantial or low-priority events that may take focus away from more severe faults.
The conventional event handlers perform satisfactorily in filtering the events on a node-by-node level. However, the conventional event handlers are inadequate in that they over-filter events that should properly be raised as alarms to the operational centers. An event that is individually trivial or low priority to a single node can, in aggregate, indicate a severe fault requiring attention if the event occurs across multiple nodes in an area. As an example, assume that both base stations 110-1 and 110-2 illustrated in FIG. 1A are each outfitted with five boards providing VoIP services. Also assume that one VoIP board from each base station 110-1, 110-2 restarts so that the VoIP service capacity is momentarily reduced to 80% for each. Since each base station 110-1, 110-2 can still provide the VoIP services, neither event is raised to the network management system 120.
However, it may be that both boards subject to the restart are from a particular product line of a vendor and that the restarts are due to flaws that are particular to that product line. The flaw may be in the board's firmware, on-board processor version, thermal tolerance, etc. If such information is known, then preventive actions may be taken (e.g., not installing the same type of hardware boards in other base stations) and the product vendor may be notified so that the issues with the boards are addressed. But since the conventional event handler withholds the information (with good intentions), it is difficult to analyze the situation and to take corrective actions.
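The over-filtering in the five-board example can be made concrete with a short sketch. The event records, the product-line label `"PL-A"`, and the per-node rule `raised_by_node` are hypothetical; they only illustrate which information reaches the network management system 120 and which is withheld.

```python
from collections import defaultdict

# Hypothetical records for the two board restarts in the example:
# each base station has five VoIP boards and one of them restarts.
events = [
    {"node": "110-1", "kind": "board-restart", "product_line": "PL-A",
     "working": 4, "total": 5},
    {"node": "110-2", "kind": "board-restart", "product_line": "PL-A",
     "working": 4, "total": 5},
]


def raised_by_node(event: dict) -> bool:
    # Assumed per-node rule: raise only if no board is left to provide service.
    return event["working"] == 0


# What the operational center sees under per-node filtering.
visible_to_nms = [e for e in events if raised_by_node(e)]

# What stays inside the nodes.
withheld = [e for e in events if not raised_by_node(e)]

# Grouping the withheld events by product line shows the pattern that
# per-node filtering hides: the same product line restarting on two nodes.
nodes_per_product_line = defaultdict(set)
for e in withheld:
    nodes_per_product_line[e["product_line"]].add(e["node"])
```

In this sketch each node still has 4 of 5 boards working (80% capacity), so `visible_to_nms` is empty, even though the withheld events share a product line across both base stations.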
As another example, a base station may be subjected to multiple failed attach requests when a mobile terminal tries to register itself to the mobile network. This normally will not cause the base station to raise an alarm, as it is expected from time to time when mobile terminals try to attach under poor radio conditions. However, if multiple neighboring base stations experience the same multiple failed attach requests, this common experience can indicate a malfunctioning mobile terminal or an environmental disturbance that needs to be addressed. But again, the conventional event handler withholds the information and corrective actions are not taken as a result.