1. Field of the Invention
The present invention relates generally to performance monitoring in computer systems.
2. Background Art
Computer systems, for example computer processors such as central processing units (CPUs) and graphics processing units (GPUs), can execute increasing numbers of processing threads in parallel. The parallel execution of numerous threads can yield substantial increases in the performance and overall efficiency of the computer system.
Debugging computer applications is complex, and the complexity increases when the application executes in an environment having multiple threads or processes. Multiple simultaneously executing threads can cause processing delays due to issues such as thread synchronization, resource sharing, and resource contention. For example, designers of a GPU having multiple execution units may expect a particular level of performance based on the number of execution units, but some applications having a large number of parallel threads may yield a much lower level of performance due to thread interaction issues.
Conventionally, most processor and application designers have debugged issues such as thread interaction using instrumented code and/or performance counters. Instrumenting the code generally involves inserting additional statements in the code before and/or after selected processing steps. The additional statements are usually directed to steps such as incrementing or decrementing performance counters, or writing debug messages. Such additional statements generally increase the size of the executable code and slow processing because of the additional steps and output requirements. Therefore, although instrumenting the code allows many debugging issues to be resolved, the additional processing steps can themselves change the behavior of the application, so many complex issues involving multiple threads may go undetected.
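As a minimal illustration of the instrumentation described above, the following Python sketch wraps a processing step with statements that update counters and write debug messages. The function names, the counter dictionary, and the use of a single shared lock are hypothetical choices made only for illustration. The sketch shows that each instrumented step executes extra statements, and that the shared lock protecting the counters introduces a synchronization point between threads that the uninstrumented code did not have, which is one way instrumentation can alter thread interaction.

```python
import threading

# Hypothetical software counters updated by the inserted instrumentation statements.
counters = {"steps_started": 0, "steps_completed": 0}
counter_lock = threading.Lock()  # shared lock: itself a new point of thread contention
debug_log = []


def process_step(x):
    """Original, uninstrumented processing step."""
    return x * x


def process_step_instrumented(x):
    """Same step with statements inserted before and after it."""
    with counter_lock:
        counters["steps_started"] += 1        # inserted: increment counter
        debug_log.append(f"start step {x}")  # inserted: debug message
    result = process_step(x)
    with counter_lock:
        counters["steps_completed"] += 1
        debug_log.append(f"end step {x}")
    return result


def run(worker, n_threads=4, steps_per_thread=100):
    """Run `worker` over distinct inputs from several threads and collect the results."""
    results = []
    results_lock = threading.Lock()

    def body(tid):
        local = [worker(tid * steps_per_thread + i) for i in range(steps_per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=body, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    plain = run(process_step)
    instrumented = run(process_step_instrumented)
    # The computed results match; only the instrumented run did extra counting,
    # logging and locking on every step.
    assert sorted(plain) == sorted(instrumented)
    print(counters, len(debug_log))
```

In this toy setting the extra work is small, but every step now contends for the same lock, which is the kind of side effect that can mask or create the very thread interactions being debugged.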
Performance counters are implemented by instrumenting the code and/or using hardware-based probes to increment and decrement a set of software counters or registers. Performance counters count the occurrences of each of a predetermined set of events. Unlike instrumented code, hardware-based probes can be inserted so as not to impact the general processing flow of the system.
In many computer systems, numerous performance counters are available. For example, performance counters may provide a count of the number of threads executing at a given time, the highest number of threads that were executing in parallel at any point during the execution of an application, and/or the highest level of memory usage during the execution of an application. However, performance counters, even when implemented using hardware-based probes, can provide only a view of system performance that is aggregated over defined time intervals. Performance counters cannot reveal the interactions between any two threads that happen to be executing simultaneously.
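The limitation of interval-aggregated counters can be shown with a toy example. The sketch below assumes, purely for illustration, that each event is recorded as a (timestamp, thread id) pair; the trace format and the function names are hypothetical. It computes per-interval and per-thread event counts the way a simple event counter would, and separately counts how often consecutive events in time order come from different threads. Two hypothetical traces, one in which threads A and B interleave and one in which they run back to back, produce identical counter values but different interleaving.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each event is (timestamp, thread_id); the format is a hypothetical simplification.
Event = Tuple[int, str]


def interval_counts(events: List[Event], interval: int) -> Dict[int, int]:
    """Aggregate events per fixed time interval, as an event counter would."""
    return dict(Counter(t // interval for t, _ in events))


def per_thread_counts(events: List[Event]) -> Dict[str, int]:
    """Aggregate events per thread over the whole trace."""
    return dict(Counter(tid for _, tid in events))


def thread_switches(events: List[Event]) -> int:
    """Count adjacent events (in time order) issued by different threads."""
    ordered = sorted(events)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a[1] != b[1])


interleaved = [(0, "A"), (1, "B"), (2, "A"), (3, "B")]
sequential = [(0, "A"), (1, "A"), (2, "B"), (3, "B")]

for name, trace in (("interleaved", interleaved), ("sequential", sequential)):
    print(name, interval_counts(trace, 4), per_thread_counts(trace), thread_switches(trace))
```

Both traces yield one interval containing four events and two events per thread, so the aggregated counters cannot distinguish them; only the time-ordered view shows that the first trace switches between threads three times and the second only once.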
In the case of both instrumented code and performance counters, the user is often left to rely on trial and error to detect application issues while controlling the impact of the additional debugging steps on application performance and interactions. For example, at some debugging levels, so many performance counters may be accessed, or so many debug statements written, that memory input/output increases to a level that impacts the servicing of processing threads.
What is needed, therefore, is a hardware-based dynamic thread performance monitoring system that monitors the performance of the system without impacting the actual performance of applications.