1. Field of the Invention
Embodiments of the disclosure relate in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, the disclosure relates to the automated management of threshold crossing alarms.
2. Description of the Related Art
Threshold alarms are commonly used in resource management systems to ensure that resources are operating at their optimum capacity. Setting threshold crossing alarms is simple. Setting them correctly and effectively often proves to be much more difficult. If they are set too low, alarms are created even though there is no problem. If they are set too high, there may be a problem long before the threshold is reached. Thresholds may also be dependent upon individual system components and, in certain cases, even the current release of the operating system and application software running on the system. Another issue is architectural design. As an example, some computers only use memory as they need it. Others seize all available memory and then allocate it to processes as each process requests it.
In addition, threshold alarms are useful for capacity planning. For instance, if utilization on a particular Wide Area Network (WAN) interface consistently exceeds 75%, then it might be a good time to increase the bandwidth or decrease the amount of traffic crossing the interface. More commonly, threshold alarms are used for fault management and fault isolation. For instance, if the CPU utilization of a router consistently exceeds 90%, yet there is no recognized processing pattern, there is most likely a problem with the router.
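As a minimal illustration of the two examples above, the following sketch treats "consistently exceeds" as a run of consecutive samples above a fixed threshold. The function name, the sample values, and the run lengths are assumptions chosen for illustration only. They are not taken from the disclosure, which does not define how many samples constitute a consistent crossing.

```python
from typing import Iterable


def consistently_exceeds(
    samples: Iterable[float], threshold: float, min_consecutive: int
) -> bool:
    """Return True if `min_consecutive` samples in a row are strictly above `threshold`.

    Hypothetical helper: the run-length rule is an assumed reading of
    "consistently exceeds", not a definition from the source text.
    """
    run = 0
    for value in samples:
        # Extend the current run on a crossing; reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Illustrative utilization samples in percent (made-up values).
wan_utilization = [62.0, 78.5, 80.1, 77.3, 79.9, 70.2]
cpu_utilization = [91.0, 88.0, 93.5, 92.1, 89.9, 95.0]

# Capacity planning example: WAN interface against a 75% threshold.
wan_alarm = consistently_exceeds(wan_utilization, threshold=75.0, min_consecutive=4)

# Fault management example: router CPU against a 90% threshold.
cpu_alarm = consistently_exceeds(cpu_utilization, threshold=90.0, min_consecutive=3)
```

With these sample values, the WAN series contains four consecutive readings above 75%, so `wan_alarm` is true, while the CPU series never stays above 90% for three readings in a row, so `cpu_alarm` is false. Requiring a run of crossings rather than a single crossing is one simple way to avoid alarming on momentary spikes.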
Furthermore, modern-day distributed application platforms are often deployed in clusters for redundancy or performance. Today, it is not unusual for a single application instance to span tens of servers. As a result, accurately setting thresholds becomes both more complex and more necessary.
In view of the foregoing, it will be apparent that thresholds can play a key role in quickly isolating system faults when they are properly set. When they are not, a flood of events and alarms may be generated, masking the underlying system issues or processing faults. There are various statistical approaches to dealing with the threshold setting issues, but such approaches are static and fail to dynamically adjust to changes in available resources and transaction processing volumes. More commonly, the setting of threshold levels is performed manually, which is tedious, time-consuming, and error-prone.