Advances in software and hardware technologies and recent trends towards virtualization and standardization are rapidly adding to the complexity of the execution stack. As a result, performance tuning is becoming an increasingly challenging task for software developers. Complex interactions among execution layers need to be understood in order to properly diagnose and eliminate performance bottlenecks. The necessary foundation for assisting in, and ultimately automating, the challenging task of performance tuning is an infrastructure for monitoring performance events across the execution layers of a system.
Performance events occur during normal operation in every execution layer of a computer system. The processing of performance events can result in performance bottlenecks.
A typical approach to detecting and understanding performance bottlenecks is to monitor the frequency and timing of performance events through a monitoring infrastructure. The monitoring infrastructure may be interactive, allowing it to be configured dynamically. It may also include a Graphical User Interface (GUI) to enable this configuration as well as to process and display the performance monitoring data. The monitoring infrastructure may also provide an application programming interface (API) to enable the programming of tools that generate, consume, and process the monitoring information automatically. The API acts as an interface between the execution layers that emit event notifications to the monitoring infrastructure and the tools that consume and process the event information for analysis. The monitoring infrastructure API may provide specific protocols that let tool developers customize the monitoring activities to the needs of their tools. The execution layers emitting events to the monitoring infrastructure must obey the protocol specified by the API. The tools that use the monitoring infrastructure must also obey that protocol in order to consume the event information. There is thus a need for a flexible API that allows tool developers to enable and disable monitoring and to specify the amount and type of monitoring information needed during each enabled time interval. Finally, there is also a need for a monitoring API that supports both offline and online monitoring and processing of event information. Offline processing refers to stand-alone tools that post-analyze an event stream that was generated and gathered during execution; the analysis occurs after the monitoring data has been stored. Online processing refers to tools that process events as they occur, without storing them, so that the results can immediately drive online bottleneck detection and online performance tuning and optimization tools.
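The roles described above (execution layers that emit events, tools that consume them, per-event enabling and disabling, and online versus offline processing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names Event, Monitor, enable, disable, subscribe, and notify are hypothetical and are not taken from any existing monitoring API or from the interface of the invention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Event:
    """Hypothetical event record emitted by an execution layer."""
    layer: str      # e.g. "hardware", "os", "jvm", "application"
    kind: str       # e.g. "cache_miss", "page_fault"
    timestamp: int  # time value chosen by the emitting layer


class Monitor:
    """Hypothetical API between event producers and consuming tools."""

    def __init__(self) -> None:
        self._enabled: Set[str] = set()
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}
        self.log: List[Event] = []  # stored stream for offline analysis

    def enable(self, kind: str) -> None:
        """Start accepting events of the given kind."""
        self._enabled.add(kind)

    def disable(self, kind: str) -> None:
        """Stop accepting events of the given kind."""
        self._enabled.discard(kind)

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        """Register an online consumer that sees events as they occur."""
        self._handlers.setdefault(kind, []).append(handler)

    def notify(self, event: Event, store: bool = True) -> bool:
        """Called by a producer; returns False if the event was dropped."""
        if event.kind not in self._enabled:
            return False
        for handler in self._handlers.get(event.kind, []):
            handler(event)          # online processing, no storage needed
        if store:
            self.log.append(event)  # kept for later offline processing
        return True


# Usage: one online consumer, monitoring enabled for one interval only.
monitor = Monitor()
seen_online: List[Event] = []
monitor.subscribe("page_fault", seen_online.append)

monitor.enable("page_fault")
monitor.notify(Event("os", "page_fault", 1))
monitor.disable("page_fault")
monitor.notify(Event("os", "page_fault", 2))  # dropped: monitoring disabled
```

In this sketch, the online consumer receives each event inside notify, whereas an offline tool would read monitor.log after execution has finished. Passing store=False models purely online processing in which events are never stored.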
In prior art, performance monitoring infrastructure and its APIs have been focused on monitoring a single computer component or a single execution layer. Examples of performance monitoring APIs specific to an execution layer can be found across the execution layers. For the hardware layer, interfaces such as PAPI have been developed for programming hardware performance counters in a consistent manner across different architectures. See Performance Application Programming Interface (PAPI), http://icl.cs.utk.edu/papi/.
For the operating system layer, customized interfaces have been developed, such as the interface for the trace facility in IBM's AIX, rtmon in SGI's IRIX, and LTT and oprofile in Linux. For the Java Virtual Machine layer, JVMPI has been developed by Sun Microsystems as a standard API for monitoring a Java Virtual Machine. See Sun Microsystems, Java Virtual Machine Profiler Interface (JVMPI), http://java.sun.com/j2se/1.4.2/docs/guide/jvmpi/.
For enterprise software layers, the ARM (Application Response Measurement) Standard has been developed as a uniform interface to calculate and measure the response time and status of work processed by enterprise software applications. See the ARM Standard, which can be found on the web at http://www.opengroup.org/tech/management/arm/.
Other examples include an API for monitoring data warehouse activity and the usage of a qualification mask in periodic trace sampling of the application layer. See U.S. Pat. No. 6,363,391, “Application programming interface for monitoring data warehouse activity occurring through a client/server open database connectivity interface”. Assignee: Bull, 2002; and U.S. Pat. No. 6,728,949, “Method and system for periodic trace sampling using a mask to qualify trace data.”
Characteristic of such prior art is the focus on the performance events that are relevant to a single layer in the execution stack. The invention described here distinguishes itself from prior art in that it explicitly targets the integration and interaction across execution layers. Integration across execution layers involves (i) the ability to control and monitor events simultaneously from all layers, in order to correlate events from different execution layers; and (ii) the monitoring of events that result from the interactions among different execution layers.
Sun's DTrace provides a language, called “D”, to program specific actions taken at selected instrumentation points. See Bryan M. Cantrill, Michael W. Shapiro, and Adam H. Leventhal, “Dynamic Instrumentation of Production Systems”, Proceedings of the 2004 Annual Technical Conference USENIX'04, 2004. DTrace can analyze an event it receives from any portion of the execution stack, but, unlike the prior art mentioned above, DTrace is not itself an event-based monitoring infrastructure. Rather, it is a basic code instrumentation facility, and the D language provided by DTrace could be used as a foundation on which to build an event monitoring infrastructure.
In summary, the prior art in developing performance monitoring APIs has not yet provided complete integration across all execution layers of the computer system. Integration across execution layers requires the abstraction of basic monitoring functionality such as event counting and event processing to be uniformly applicable to events from all execution layers. This invention is the first approach to develop a rich set of uniform monitoring abstractions across all execution layers of a computer system.
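To make the idea of a uniform abstraction concrete, the following minimal sketch applies a single counting routine to events from several execution layers. It is an illustration under stated assumptions only: the function names count_events and per_layer_totals, the (layer, kind) tuple representation, and the sample layer and event names are hypothetical and do not describe the interface of the invention.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event representation: (execution layer, event kind).
LayerEvent = Tuple[str, str]


def count_events(events: Iterable[LayerEvent]) -> Counter:
    """Count occurrences of each (layer, kind) pair.

    The same routine is used for every layer; nothing in it is
    specific to hardware, operating system, or virtual machine events.
    """
    return Counter(events)


def per_layer_totals(counts: Counter) -> Counter:
    """Aggregate the per-event counts into one total per layer."""
    totals: Counter = Counter()
    for (layer, _kind), n in counts.items():
        totals[layer] += n
    return totals


# Usage: a mixed stream of events from three layers.
stream = [
    ("hardware", "cache_miss"),
    ("os", "page_fault"),
    ("hardware", "cache_miss"),
    ("jvm", "gc_start"),
]
counts = count_events(stream)
totals = per_layer_totals(counts)
```

Because the counter treats the layer as ordinary data rather than as a separate interface, adding a new execution layer in this sketch requires no new counting code.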