The importance of application performance monitoring has steadily increased over time, as even short and minor performance degradations or application outages can cause substantial losses of revenue for organizations operating those applications. Service-oriented application architectures that build complex applications from a network of loosely connected, interacting services provide great flexibility to application developers. In addition, virtualization technologies provide a more flexible, load-adaptive assignment of hardware resources to applications. While those techniques increase the flexibility and scalability of applications, which enables application developers and operators to react with more agility to changed requirements, they also increase the complexity of application architectures and application execution environments.
Monitoring systems exist that provide data describing application performance in the form of, e.g., transaction trace data or service response times, or hardware resource utilization in the form of, e.g., CPU or memory usage of concrete or virtual hardware. Some of those monitoring systems also provide monitoring data describing the resource utilization of virtualization infrastructure, such as hypervisors, that may be used to manage and execute multiple virtual computer systems. Although those monitoring systems provide valuable data that allows the identification of undesired or abnormal operating conditions of individual software or hardware entities involved in the execution of an application, they lack the ability to determine the impact that the detected abnormal conditions may have on other components of the application or on the overall performance of the application. Components required to perform the functionality of an application typically depend on each other, and an abnormal operating condition of one of those components most likely causes abnormal operating conditions in one or more of the components that directly or indirectly depend on it. Knowing the dependencies along which the causes of detected abnormal operating conditions may propagate can greatly improve the efficiency of countermeasures to repair those abnormal operating conditions. However, those dependencies are caused by, e.g., communicating software services and components or by shared virtualization infrastructure, and documentation describing them is often not available, or manually analyzing this dependency documentation is too time consuming for the fast decisions required to identify appropriate countermeasures.
Consequently, an integrated system and method are required that identify and monitor software and hardware components involved in the execution of a monitored application, that detect dependencies between those components, and that use the gathered structural, performance and resource utilization related data to identify abnormal operating conditions of components and to identify causal relationships between different identified abnormal operating conditions. In case multiple, causally dependent abnormal operating conditions are detected, the system may further determine which one or more of the detected conditions are the root cause of the other causally dependent conditions.
This section provides background information related to the present disclosure which is not necessarily prior art.