Monitoring of system operations is a critical component of any large-scale software system, including enterprise-level information technology systems. Monitoring the operations of such systems enables administrators to perform various diagnostic procedures, such as determining whether the system is functioning properly and automatically initiating repair procedures when it is not. Monitoring is complex and typically requires collecting numerous operational metrics and continuously aggregating, interpreting, and reporting the collected metric data.
A major challenge in the design and implementation of such monitoring services is ensuring that the operational metrics being collected accurately identify operational issues within the system. Stated differently, the monitored operational metrics must accurately reflect the behavior of the system and must not falsely indicate that the system is behaving improperly when it is actually behaving as intended. Monitoring services that are too sensitive, static and inflexible, and/or improperly configured cause such false indications. Moreover, monitoring is often a manual process that, particularly for enterprise systems, is prone to error and to overlooking some metrics, is limited by the difficulty humans sometimes have in processing large, disparate data sets, and is slow to adapt to changes in topology, configuration, and other aspects of the system.
It is with these concepts in mind, among others, that aspects of the present disclosure were conceived.