This disclosure relates generally to the field of system management and troubleshooting. More specifically, the disclosure provided herein relates to strategies for reducing the number of alarms requiring investigation in a production network environment or other complex system.
A major cost driver in the operation of a large, complex system of networked devices or components is having sufficient support personnel to address the large number of problems or faults that may occur in such a system. In many cases, these problems must be identified by analyzing a stream of “alarms” or fault events that are generated by the myriad devices and components that make up the system infrastructure. To manage the system efficiently, a strategy may be employed to reduce the total number of alarms that must be presented to support personnel for diagnosis and troubleshooting.
One element of such an alarm reduction strategy may be to identify and reduce redundant alarms, that is, alarms having the same root cause. This allows support personnel to concentrate on solving the underlying problem rather than spending time investigating duplicate notifications. However, identifying redundant alarms normally requires detailed knowledge and thorough analysis of the types of interconnected devices and components from which the system is constructed.