This invention relates to the field of computer systems. More particularly, a system and methods are provided for monitoring and measuring resource usage bottlenecks while traversing or executing code segments.
Performance monitoring and analysis tools are indispensable aids in the design of efficient software programs. These tools are particularly useful when designing programs for parallel computing systems, because programming such systems is far more difficult than programming sequential systems. A parallel program contains multiple threads of execution that interact with one another to accomplish a common goal cooperatively. Managing parallelism, communication, and synchronization among threads is extremely complex and inevitably error-prone.
Data collection is a critical problem when measuring the performance of a parallel program. Meaningful measurements require collecting data for full-sized data sets running on a large number of processors. However, collecting large amounts of data can significantly slow the program's execution and thereby distort the very data being collected.
Various approaches have been tried to improve the efficiency with which performance data are collected. Two common approaches are event tracing and statistical sampling. However, both techniques have limitations, either in the volume of data they gather or in the granularity of the data collected. Event tracing, for example, can collect detailed information about interesting events during a program's execution, but recording those events generates vast amounts of data that are difficult to manage. Statistical sampling greatly reduces the volume of performance data by summarizing interesting information as counts and times that are reported at the end of program execution. Such summaries, however, lose important temporal information about usage patterns and about the relationships between different components.
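The trade-off between a trace and a summary can be illustrated with a minimal sketch, assuming a hypothetical stream of (timestamp, event name, duration) tuples. The `trace` function keeps one record per event, while `summarize` keeps only a count and a total time per event name. Note that `summarize` aggregates every event rather than sampling; it is meant only to show how a count-and-time summary discards ordering.

```python
"""Contrast event tracing with count/time summary collection.

A minimal, hypothetical sketch: the event names and the
(timestamp, name, duration_ms) tuples are illustrative assumptions.
"""
from collections import defaultdict

# Hypothetical event stream: (timestamp, event name, duration in ms).
EVENTS = [
    (0, "send", 2), (2, "recv", 5), (7, "send", 2),
    (9, "barrier", 10), (19, "recv", 5), (24, "send", 2),
]


def trace(events):
    """Event tracing: keep every event, so storage grows with event count."""
    return list(events)


def summarize(events):
    """Summary collection: keep only a count and total time per event name."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for _, name, duration in events:
        counts[name] += 1
        totals[name] += duration
    return {name: (counts[name], totals[name]) for name in counts}


if __name__ == "__main__":
    full = trace(EVENTS)
    summary = summarize(EVENTS)
    print(len(full), "trace records")       # one record per event
    print(len(summary), "summary records")  # one record per event name
    print(summary)
    # The trace preserves the order of events; the summary cannot tell
    # whether "barrier" happened before or after the first "recv".
    print([name for _, name, _ in full])
```

Here the trace holds six records and the summary holds three. The gap grows with the run length, because the summary's size depends only on the number of distinct event names.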
Dynamic instrumentation is a new approach to data collection that overcomes the limits of tracing and sampling by allowing instrumentation code to be inserted and altered dynamically during program execution. The instrumentation measures only the performance data that are of interest to a particular user. By targeting just the necessary or requested data, dynamic instrumentation can greatly reduce the amount of information collected without losing the detail available with event tracing, and thereby help users manage large, long-running applications on large-scale parallel computers.
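The idea of inserting and later removing a probe while a program runs can be sketched as follows. This is an analogy, not an implementation of binary-level dynamic instrumentation. It assumes that rebinding a Python attribute stands in for patching machine code. The `insert_probe` and `remove_probe` helpers are hypothetical. The probe collects only the data requested (a call count) and costs nothing once removed.

```python
"""Insert and remove a counting probe while a program is running.

A minimal sketch, assuming Python attribute rebinding as a stand-in for
binary patching; insert_probe/remove_probe are hypothetical helpers.
"""
import functools
import types

# Hypothetical "program" whose functions can be probed at run time.
app = types.SimpleNamespace()


def compute(x):
    return x * x


app.compute = compute

_originals = {}   # (namespace id, name) -> uninstrumented function
call_counts = {}  # name -> calls observed while the probe was inserted


def insert_probe(namespace, name):
    """Replace namespace.name with a wrapper that counts calls."""
    original = getattr(namespace, name)
    _originals[(id(namespace), name)] = original
    call_counts[name] = 0

    @functools.wraps(original)
    def probe(*args, **kwargs):
        call_counts[name] += 1  # the only data this probe collects
        return original(*args, **kwargs)

    setattr(namespace, name, probe)


def remove_probe(namespace, name):
    """Restore the uninstrumented function; later calls are not counted."""
    setattr(namespace, name, _originals.pop((id(namespace), name)))


if __name__ == "__main__":
    app.compute(1)                # not measured: no probe yet
    insert_probe(app, "compute")  # user requests call counts for compute
    for i in range(5):
        app.compute(i)
    remove_probe(app, "compute")  # measurement no longer needed
    app.compute(2)                # not measured
    print(call_counts)            # {'compute': 5}
```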
However, current implementations of dynamic instrumentation are limited in their ability to identify all of the bottlenecks that may occur within large, complex programs. The standard method, for example, is to divide a program into a number of segments and then insert instrumentation code at the beginning and end of each segment. Measurements taken at segment boundaries describe each segment in isolation, so this method cannot indicate whether a particular segment is a bottleneck, nor how the resources used by that segment compare with those used by other resource consumers.
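A minimal sketch of the boundary instrumentation described above follows. It assumes wall-clock time as the measured resource. The `segment` context manager and the segment names are hypothetical. A timestamp is taken on entry to each segment and another on exit, and the difference is accumulated per segment.

```python
"""Instrument the beginning and end of each program segment.

A minimal sketch of segment-boundary instrumentation; the segment names
and the segment() helper are hypothetical.
"""
import time
from collections import defaultdict
from contextlib import contextmanager

elapsed = defaultdict(float)  # segment name -> accumulated seconds


@contextmanager
def segment(name):
    start = time.perf_counter()  # instrumentation at the segment's start
    try:
        yield
    finally:
        # instrumentation at the segment's end
        elapsed[name] += time.perf_counter() - start


def program():
    with segment("load"):
        data = list(range(100_000))
    with segment("reduce"):
        total = sum(data)
    return total


if __name__ == "__main__":
    program()
    for name, seconds in elapsed.items():
        print(f"{name}: {seconds:.6f} s")
```

Each entry reports only the time spent between that segment's own boundaries. The figures alone do not reveal whether a segment was slow because of its own work or because it was waiting on a resource held by another consumer. This is the limitation noted for the standard method.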
Therefore, what is needed is a system and method for measuring performance or resource usage of segments of a computer program in order to identify choke points or bottlenecks in the program, wherein later measurements may be fine-tuned based on earlier measurements.