1. Field of the Invention
The present invention relates to a method and system for managing a fault alarm storm in a network communication system, and more particularly to a system and method for managing a fault alarm storm by identifying the alarms and processing the alarms to maintain network performance.
2. Brief Description of the Related Art
Network services, such as IPTV, VoIP, and high-speed Internet, require high-performance network equipment in complex networks. Outages can occur due to physical problems and/or logical errors. Both planned maintenance, such as software upgrades performed by network operators, and unplanned failures can significantly impact customers through lengthy downtime. Due to interconnections between network components, when one component fails, many elements may be affected. Therefore, one failure can lead to multiple alarms being generated. In particular, a burst of alarms may be generated due to device hardware/software failures or network-wide communication breakdowns.
For example, when a network encounters an abnormal situation, such as multiple cable cuts, a network management system may be overwhelmed with alarms. The excessive alarms may cause the central processing units of the upstream alarming and ticketing systems to become a bottleneck that impacts network center operations. When failures and the resulting alarms occur in large quantities, the result is what is known as an alarm storm. The alarm storm may be so severe that the processing power needed to process the alarms outstrips the processing capacity of the network, and network performance is severely degraded. In extreme cases, network operations as a whole may crash due to an alarm storm.
Alarm handling systems known in the art typically try to process alarms as fast as they can and eventually run out of processing capacity or memory. Some prior art systems try to correlate all of the alarms, but because so many alarms arrive so quickly, the system may run out of processing power, and the desired results cannot be generated in time for troubleshooting. One way to handle this problem is to upgrade to more powerful machines. However, this can be expensive and may provide only a short-term solution to the problem.
Accordingly, it would be desirable to have an alarm fault management system with the ability to detect an alarm storm before a network management system is impacted, thereby increasing the capacity of a fault management system and maintaining overall network performance.