Software telemetry involves the collection of various metrics that may be useful for analyzing performance degradation within a software system. For example, metrics that identify processor usage, memory bandwidth, input/output (I/O) operations, and other resource utilization rates may be indicative of an underlying root cause of performance degradation. In many organizations, administrators are responsible for analyzing these metrics to determine which software or hardware components, if any, are contributing to degradation. However, the root cause of the problem may not always be readily apparent from the raw metric data.
In some cases, an administrator may leverage root cause analytical models to help address performance degradation in complex systems. One example approach is described in U.S. Pat. No. 8,612,377, which is incorporated by reference herein as if set forth in its entirety. According to this approach, an aggregate analytic model is built by linking a plurality of diagnostic models based on the topology of the system. One deficiency of this approach is that the aggregate model is tightly coupled to the underlying topology of the system being analyzed: any change in the topology of the system would also result in a change in the aggregate model. In addition, the aggregate analytic model relies on an inferred system state to produce diagnostic information. If there are unknown variables and states within a system, then the inferred state, and any diagnostic information derived from it, may be unreliable.
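The topology coupling noted above can be illustrated with a minimal sketch. It is not the method of the cited patent; it is a simplified illustration under stated assumptions. The topology dictionaries, the component names, and the function `build_aggregate_model` are all hypothetical, and the aggregate model is reduced to the set of links between per-component diagnostic models.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical topology: each component maps to the components it depends on.
Topology = Dict[str, List[str]]


def build_aggregate_model(topology: Topology) -> FrozenSet[Tuple[str, str]]:
    """Link per-component diagnostic models along each dependency edge.

    Assumption: the aggregate model is represented only by its links,
    which is enough to show how it tracks the topology.
    """
    return frozenset(
        (component, dependency)
        for component, dependencies in topology.items()
        for dependency in dependencies
    )


# Hypothetical system before and after a cache component is added.
before = {"app": ["db"], "db": ["disk"], "disk": []}
after = {"app": ["db", "cache"], "db": ["disk"], "cache": [], "disk": []}

model_before = build_aggregate_model(before)
model_after = build_aggregate_model(after)

# A single topology change alters the aggregate model's structure,
# so the aggregate would have to be rebuilt.
added_links = sorted(model_after - model_before)
print(added_links)  # [('app', 'cache')]
```

In this sketch, adding one component produces a different aggregate model, which is the kind of dependence on topology described above.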
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.