Operations monitoring is a critical component of any large-scale software system, including enterprise-level information technology systems. Monitoring the operations of such systems enables administrators to perform various diagnostic procedures, such as determining whether the system is functioning properly and automatically initiating repair procedures when it is not. Monitoring services and/or applications are complex and typically require the collection of numerous operational metrics, as well as the continuous aggregation, interpretation, and reporting of the collected operational metric data.
A major challenge in the design and implementation of such monitoring services is ensuring that the operational metrics being collected accurately identify operational issues within the system. Stated differently, the monitored operational metrics must accurately reflect the behavior of the system and must not falsely indicate that the system is behaving improperly when it is actually behaving as intended. Monitoring services that are too sensitive, static and inflexible, and/or improperly configured can produce such false indications.
It is with these concepts in mind, among others, that aspects of the present disclosure were conceived.