As computer systems and the applications they run grow larger, monitoring all of the applications on a system becomes more difficult. In particular, some systems may be distributed geographically (for example, in cloud computing), and multiple applications may run on multiple processors within a single computer system.
Further, these computer systems may be dynamically configured, with applications moving between processors as necessary. Additionally, the physical computer system may be dynamically configured, with additional processors brought online as needed by the various applications. Monitoring such systems is extremely complex, and it is difficult to configure monitoring systems such that they sufficiently monitor all of the various applications, provide a user with sufficient and easily understandable alerts, and possibly repair some application problems automatically.
Overview
In an embodiment, an application performance management system including a communication interface and a processing system is provided. The communication interface is configured to communicate with an agent deployed within a target computing system.
The processing system is coupled with the communication interface, and is configured to receive first metrics associated with a first operational element executing within a first target computing system from a first agent, and to receive second metrics associated with a second operational element executing within a second target computing system from a second agent, wherein the second operational element is in a class of operational elements including the first operational element.
The processing system is also configured to process the first and second metrics to determine a performance relevant pattern related to the class of operational elements including the first and second operational elements, and to modify an algorithm for detecting performance issues within the class of operational elements based on the performance relevant pattern. The processing system is further configured to receive third metrics associated with the first operational element, and to apply the modified algorithm for detecting performance issues to the third metrics to detect performance issues for the first operational element.
In another embodiment, a method of managing operational elements executing within a target computing system is provided. The method includes receiving first metrics associated with a first operational element executing within a first target computing system from a first agent, and receiving second metrics associated with a second operational element executing within a second target computing system from a second agent, wherein the second operational element is in a class of operational elements including the first operational element.
The method also includes processing the first and second metrics to determine a performance relevant pattern related to the class of operational elements including the first and second operational elements, and modifying an algorithm for detecting performance issues within the class of operational elements based on the performance relevant pattern. The method further includes receiving third metrics associated with the first operational element, and applying the modified algorithm for detecting performance issues to the third metrics to detect performance issues for the first operational element.
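By way of illustration only, the steps of the method may be sketched as follows. The sketch assumes that the algorithm for detecting performance issues is a simple threshold test and that the performance relevant pattern is the pooled mean and standard deviation of one metric; neither assumption comes from the embodiments above. The names `ThresholdDetector`, `derive_pattern`, and `modify_detector`, and the multiplier `k`, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class ThresholdDetector:
    """Hypothetical detection algorithm shared by one class of operational elements."""
    threshold: float

    def detect(self, samples: List[float]) -> List[float]:
        # Return the samples that exceed the current threshold.
        return [s for s in samples if s > self.threshold]


def derive_pattern(first: List[float], second: List[float]) -> Dict[str, float]:
    """Pool metrics from two elements of the same class (assumed pattern: mean and spread)."""
    pooled = first + second
    return {"mean": mean(pooled), "stdev": pstdev(pooled)}


def modify_detector(detector: ThresholdDetector,
                    pattern: Dict[str, float],
                    k: float = 3.0) -> None:
    """Modify the algorithm in place: threshold becomes mean + k * stdev (assumed policy)."""
    detector.threshold = pattern["mean"] + k * pattern["stdev"]
```

In use, metrics reported by the first and second agents would be passed to `derive_pattern`, the result would be passed to `modify_detector`, and later (third) metrics for the first operational element would be passed to `detect`. Pooling across both elements means a newly observed element may benefit from a threshold learned from other members of its class.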
In a further embodiment, one or more non-transitory computer-readable media having stored thereon program instructions to operate an application performance management system are provided. The program instructions, when executed by processing circuitry, direct the processing circuitry to at least receive first metrics associated with a first operational element executing within a first target computing system from a first agent, and to receive second metrics associated with a second operational element executing within a second target computing system from a second agent, wherein the second operational element is in a class of operational elements including the first operational element.
The program instructions also direct the processing circuitry to at least process the first and second metrics to determine a performance relevant pattern related to the class of operational elements including the first and second operational elements, and to modify an algorithm for detecting performance issues within the class of operational elements based on the performance relevant pattern. The program instructions further direct the processing circuitry to at least receive third metrics associated with the first operational element, and to apply the modified algorithm for detecting performance issues to the third metrics to detect performance issues for the first operational element.
In another embodiment, a method of identifying a status for an operational element includes collecting a first plurality of metrics associated with a first operational element running on a first host. A second plurality of metrics associated with a second operational element running on a second host is also collected. An expert rule is modified based on the first plurality of metrics and the second plurality of metrics. The modified expert rule is applied to determine a selected status for the first operational element.
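As a non-limiting illustration, the expert-rule embodiment may be sketched as follows. The sketch assumes that an expert rule maps a single metric value to one of three statuses using two thresholds, and that the rule is modified by raising its thresholds to reflect the peak value observed on both hosts. The rule shape, the status names, the `ExpertRule` and `modify_rule` names, and the 1.25 margin are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExpertRule:
    """Hypothetical expert rule: two thresholds that select a status."""
    warn_above: float
    critical_above: float

    def status(self, value: float) -> str:
        if value > self.critical_above:
            return "critical"
        if value > self.warn_above:
            return "warning"
        return "ok"


def modify_rule(rule: ExpertRule,
                first_metrics: Sequence[float],
                second_metrics: Sequence[float]) -> ExpertRule:
    """Return a modified rule informed by metrics from both hosts (assumed policy)."""
    peak = max(max(first_metrics), max(second_metrics))
    # Never lower the thresholds; only raise them when both hosts show
    # that higher values occur in this class of operational element.
    warn = max(rule.warn_above, peak)
    critical = max(rule.critical_above, peak * 1.25)
    return ExpertRule(warn_above=warn, critical_above=critical)
```

Applying the modified rule to a value from the first operational element then yields the selected status. Under this assumed policy, a value that the unmodified rule would report as a warning may be reported as normal once metrics from the second host show that such values are typical.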