The present disclosure relates generally to computer systems, and more specifically to monitoring performance events in a computer system.
Processing of performance events, which occur during normal operation in every execution layer of a computer system, can result in performance bottlenecks and/or delays. As software systems become more advanced, tuning their performance becomes increasingly challenging for software developers. The interactions among different execution layers must be understood in order to identify and eliminate the performance bottlenecks and delays that may occur. An infrastructure for monitoring performance events across the execution layers of a system can be used to identify such bottlenecks and delays.
One method of detecting performance bottlenecks and delays is to monitor the frequency and timing of the events. Monitoring of the events may be interactive, which allows dynamic configuration of a monitoring infrastructure. The method of monitoring may provide an Application Programming Interface (API) that enables a tool to be programmed to generate and process the monitoring information automatically. The API may function as an interface between different execution layers, indicating the occurrence of events so that the tool can process the event information for analysis. The monitoring and processing of event information may be performed both offline and online. With offline processing, a stand-alone tool analyzes an event stream that was generated during execution, after the monitoring data has been stored. With online processing, a tool processes events as they occur, without storing them, so that bottlenecks and delays can be identified immediately.
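By way of a non-limiting illustration, the distinction between online and offline processing described above might be sketched as follows. The names `Event`, `Monitor`, `subscribe`, and `notify` are hypothetical and are not taken from any existing monitoring API; the sketch assumes that each execution layer reports an event by calling `notify`.

```python
import time
from collections import Counter
from typing import Callable, List, NamedTuple, Optional


class Event(NamedTuple):
    layer: str        # execution layer that raised the event (hypothetical labels)
    name: str         # event type
    timestamp: float  # time of occurrence, in seconds


class Monitor:
    """Hypothetical event interface shared by all execution layers."""

    def __init__(self) -> None:
        self.log: List[Event] = []                           # stored stream for offline tools
        self.listeners: List[Callable[[Event], None]] = []   # online tools

    def subscribe(self, listener: Callable[[Event], None]) -> None:
        self.listeners.append(listener)

    def notify(self, layer: str, name: str,
               timestamp: Optional[float] = None, store: bool = True) -> Event:
        event = Event(layer, name, time.time() if timestamp is None else timestamp)
        if store:
            self.log.append(event)       # kept for later, offline analysis
        for listener in self.listeners:
            listener(event)              # processed immediately, online
        return event


monitor = Monitor()

# Online processing: count events as they occur.
online_counts: Counter = Counter()
monitor.subscribe(lambda ev: online_counts.update([ev.name]))

monitor.notify("os", "page_fault", timestamp=0.0)
monitor.notify("runtime", "gc_start", timestamp=1.0)
monitor.notify("os", "page_fault", timestamp=2.5)

# Offline processing: analyze the stored event stream after execution.
offline_counts = Counter(ev.name for ev in monitor.log)
```

In this sketch an online tool that must not store events could call `notify` with `store=False`; both counters above hold the same event frequencies.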
Prior art performance monitoring focused on monitoring a single computer component or a single execution layer. For the hardware layer, interfaces have been developed for programming hardware performance counters across different architectures.
For enterprise software layers, an Application Response Measurement (ARM) standard has been developed as a uniform interface to calculate and measure response time and status of work processed by an enterprise software application.
The prior art computes either “computational wait time,” which is the time a computational resource R spends waiting for another resource, or “resource wait time,” which is the time during which a system resource T has computational resources waiting for it. For example, take the case where there are three program threads (computational resources), each waiting for a lock on a database (a system resource). Current systems would profile the performance of this system in one of two ways: by recording the total time each of the threads spends waiting for the lock (the computational wait time), or by recording the total time spent waiting for the database lock (the resource wait time).
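The two prior-art measures can be illustrated with a minimal sketch, under stated assumptions. The wait intervals below are invented for illustration, the function names are hypothetical, and the resource wait time is interpreted here as the total time during which at least one thread is waiting for the lock; other interpretations are possible.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) of one wait, in seconds


def computational_wait_time(waits: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Total time each thread spent waiting, summed over its wait intervals."""
    return {thread: sum(end - start for start, end in intervals)
            for thread, intervals in waits.items()}


def resource_wait_time(waits: Dict[str, List[Interval]]) -> float:
    """Time during which at least one thread was waiting (union of intervals)."""
    intervals = sorted(iv for ivs in waits.values() for iv in ivs)
    total = 0.0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical wait intervals of three threads for the database lock.
lock_waits = {
    "thread1": [(0.0, 4.0)],
    "thread2": [(1.0, 5.0)],
    "thread3": [(2.0, 6.0)],
}

per_thread = computational_wait_time(lock_waits)  # 4.0 seconds for each thread
lock_total = resource_wait_time(lock_waits)       # 6.0 seconds
```

Neither value depends on whether a waiting thread could actually have run had the lock been granted.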
What neither of these approaches measures is the time spent waiting that could have actually been spent computing. Suppose in the above example that for any thread to run, there had to be a free core (e.g., a free processor in a parallel computing machine). Suppose now that at one point, all three threads are waiting for the lock, but only threads 1 and 2 have access to a free core. Since thread 3 does not have access to a core, it could not run even if it were granted the lock. Despite this, the wait times for all three threads are measured identically.
What is needed is a system and method that computes critical waiting time, which is the time a given computational resource is waiting for one and only one other resource; for example, the time a thread could have been computing had it been granted the given lock, because it needed no other resource, such as a core.
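As a non-limiting sketch of this definition, the critical waiting time of one thread for one resource can be computed from the intervals during which the thread lacks each resource it needs. The data layout, the function name `critical_wait_time`, and the interval values are assumptions made for illustration only.

```python
from typing import List, Tuple

Wait = Tuple[str, float, float]  # (resource name, start, end) of one wait


def critical_wait_time(waits: List[Wait], resource: str) -> float:
    """Time during which `resource` is the one and only resource awaited."""
    # Every interval boundary splits time into elementary segments; within a
    # segment the set of awaited resources does not change.
    points = sorted({t for _, start, end in waits for t in (start, end)})
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        missing = {r for r, start, end in waits if start <= lo and hi <= end}
        if missing == {resource}:
            total += hi - lo
    return total


# Thread 1 waits only for the lock; a core is available to it throughout.
thread1 = [("lock", 0.0, 4.0)]
# Thread 3 waits for the lock but also has no free core until t = 5.
thread3 = [("lock", 2.0, 6.0), ("core", 0.0, 5.0)]

critical_1 = critical_wait_time(thread1, "lock")  # 4.0
critical_3 = critical_wait_time(thread3, "lock")  # 1.0, only the span [5, 6)
```

Both threads in this example wait four seconds for the lock, but only one of those seconds is critical for thread 3, because before t = 5 it could not have run even if it had been granted the lock.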
Brief Summary
Wait time for one or more resources is computed, for instance, for detecting delays in a computer system. A method for computing wait time, in one aspect, may include detecting a request for a given resource by one or more resources; and computing, using a processor, a requesting critical wait time of the given resource, the requesting critical wait time being time spent by the one or more resources waiting for the given resource, wherein at least one of the resources waiting for the given resource can proceed if access to the given resource is granted.
A system for computing wait time, in one aspect, may include a monitoring module operable to detect a request for a given resource by one or more resources. The monitoring module further may be operable to determine a requesting critical wait time of the given resource, the requesting critical wait time being time spent by the one or more resources waiting for the given resource, wherein at least one of the resources waiting for the given resource can proceed if access to the given resource is granted.
A computer readable storage medium storing a program of instructions executable by a machine to perform one or more methods described herein also may be provided.
Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.