In analyzing and enhancing the performance of a data processing system and the applications executing within it, it is helpful to know which software modules are using system resources. Effective management and enhancement of data processing systems require knowing how and when various system resources are being used. Performance tools are used to monitor and examine resource consumption as various software applications are executing. For example, a performance tool may identify the modules that execute most frequently, allocate the most memory, or perform the most I/O requests.
In analyzing and enhancing performance of a data processing system, a developer may focus on where time is being spent by the processor in executing software code. Such efforts are commonly known in the computer processing arts as locating “hot spots.” Ideally, one would like to isolate such hot spots at the instruction level in order to focus attention on areas that might benefit most from improvements to the code.
For example, isolating such hot spots to the instruction level permits compiler writers to find significant areas of less-than-optimal code generation, on which they may focus their efforts to improve code generation efficiency. Another potential use of instruction-level detail is to provide guidance to the designers of future systems. Such designers employ profiling tools to find threads, modules, functions, code paths, characteristic code sequences, or single instructions that require optimization for a given hardware environment.
Multitasking describes a processor, or set of processors, that begins operating on one process or subprocess before another has completed. The term “process” is sometimes used interchangeably with “task,” “thread,” and other such terms. A multitasking system splits time between processes depending on factors such as input/output (I/O) activity, interrupts, or the expiration of a fixed time interval. Threading is one form of multitasking.
Threading can improve single-application performance by keeping a single processor continuously supplied with instructions. For example, a single-threaded web server stalls in a wait state every time it fetches data from a disk, whereas a multithreaded web server can handle new requests with one thread while another thread waits on the data from the disk. Such a multithreaded arrangement improves performance by allowing the processor to operate continuously rather than wait for a slow operation, such as I/O, to complete. Multiple threads running on a processor can be analyzed to determine how much time the processor spends on each thread.
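The web-server example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two requests and the function name `serve_two_requests` are hypothetical, and `threading.Event` objects stand in for the disk read so that the ordering is deterministic rather than dependent on real I/O timing.

```python
import threading

def serve_two_requests() -> list[str]:
    """Simulate two requests on a multithreaded server. Events stand in for a
    real disk read: request 1 blocks until `disk_data_ready` is set."""
    log: list[str] = []
    blocked_on_disk = threading.Event()   # request 1 has started waiting on the disk
    disk_data_ready = threading.Event()   # the simulated disk read has completed

    def handle_request_1() -> None:
        log.append("request 1: waiting on disk")
        blocked_on_disk.set()
        disk_data_ready.wait()            # this thread blocks; other threads keep running
        log.append("request 1: served")

    def handle_request_2() -> None:
        log.append("request 2: served")

    t1 = threading.Thread(target=handle_request_1)
    t1.start()
    blocked_on_disk.wait()                # request 2 arrives while request 1 is blocked
    t2 = threading.Thread(target=handle_request_2)
    t2.start()
    t2.join()                             # request 2 finishes without waiting for the disk
    disk_data_ready.set()                 # the simulated disk delivers request 1's data
    t1.join()
    return log

print(serve_two_requests())
# ['request 1: waiting on disk', 'request 2: served', 'request 1: served']
```

In a single-threaded server, request 2 could not be served until request 1's disk read finished; here it completes while request 1 is still blocked.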
Process scheduling is the method by which the operating system determines which thread to run on the processor. Threads are sometimes assigned to a class according to their priority. Threads in a lower-priority class often receive only the processor time left over by higher-priority classes. Schedulers may allocate processor time to threads based on class and may interrupt a thread before it has completed. Schedulers may also determine the order in which threads run and how much processor time each thread is allocated while running.
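A class-based scheduler of the kind described above can be sketched as a simple strict-priority policy. This is an illustrative assumption, not any particular operating system's algorithm: lower class numbers mean higher priority, each tick is one time slice, threads within a class take turns round-robin, and the thread names are hypothetical.

```python
from collections import deque

def schedule_by_class(threads: dict[str, tuple[int, int]], ticks: int) -> list[str]:
    """threads maps a (hypothetical) thread name to (priority class, time slices needed).
    A lower class number means higher priority. Returns which thread ran at each tick."""
    classes: dict[int, deque] = {}
    remaining: dict[str, int] = {}
    for name, (prio, work) in threads.items():
        classes.setdefault(prio, deque()).append(name)
        remaining[name] = work
    timeline = []
    for _ in range(ticks):
        for prio in sorted(classes):          # highest-priority class first
            queue = classes[prio]
            if queue:
                name = queue.popleft()
                remaining[name] -= 1          # run for one time slice
                timeline.append(name)
                if remaining[name] > 0:
                    queue.append(name)        # preempted: back of its class's queue
                break
        else:
            timeline.append("idle")           # no runnable thread in any class
    return timeline

print(schedule_by_class({"ui": (0, 2), "net": (0, 1), "batch": (1, 3)}, ticks=7))
# ['ui', 'net', 'ui', 'batch', 'batch', 'batch', 'idle']
```

The `batch` thread runs only after both class-0 threads have finished, which is the "left over" processor time the text describes. Many real schedulers add mechanisms such as priority aging to soften this; the sketch deliberately omits them.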
Sample-based profiling describes a technique of interrupting process execution at regular intervals. At each interruption, a sample is taken that tells a developer which function was executing just before the interruption. Normal processing then resumes. This cycle of interrupting and restarting repeats for a predetermined length of time, for a predetermined number of events of interest, or until an event such as user input occurs.
At each time interval, the processor collects a sample that is then used to determine the function the processor is running. By sampling for many time intervals, a profiler can determine statistically on which functions a processor is spending its time. A profiler can then generate a report summarizing the sampled data.
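The sampling loop and report can be modeled without real interrupts. In this minimal sketch, the execution trace (which function is running at each clock tick) is a hypothetical, precomputed list, and a simulated timer interrupt fires every `interval` ticks; the function names `sample_profile` and `report` are illustrative only.

```python
from collections import Counter

def sample_profile(trace: list[str], interval: int) -> Counter:
    """trace[t] names the function executing at tick t (a hypothetical trace).
    A simulated timer interrupt fires every `interval` ticks and records the
    function that was executing just before it fired."""
    samples = Counter()
    for tick in range(interval, len(trace) + 1, interval):
        samples[trace[tick - 1]] += 1          # function running just before the interrupt
    return samples

def report(samples: Counter) -> list[str]:
    """Summarize the sampled data, most frequently sampled function first."""
    total = sum(samples.values())
    return [f"{fn}: {n}/{total} samples ({100 * n / total:.0f}%)"
            for fn, n in samples.most_common()]

trace = (["A"] * 6 + ["B"] * 3 + ["C"] * 3) * 10   # 120 ticks of synthetic execution
for line in report(sample_profile(trace, interval=3)):
    print(line)
```

Because this synthetic trace is periodic, a sampling interval that lines up with its period can bias the estimate; with `interval=4`, for instance, every function would receive the same number of samples even though A runs twice as long as B or C.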
An example profiler stops an application and samples the program counter of the currently executing thread. The profiler repeatedly stops the processor over many clock cycles to obtain a statistically meaningful quantity of data. The program counter values may be resolved against a load map and symbol table information to determine which function the processor is executing. The profiler then increments a counter for the particular area of code that is executing. Some profilers process information on the fly and create data structures representing an ongoing history of the runtime environment. Other profilers add data to a buffer or file for processing after sampling.
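Resolving sampled program counter values to functions amounts to a range lookup over a sorted address table. The sketch below assumes a hypothetical symbol table of `(start, end, name)` entries with non-overlapping address ranges, as might be derived from a load map; a binary search finds the candidate function, and a counter is incremented for it.

```python
import bisect
from collections import Counter

# Hypothetical symbol table: (start address, end address exclusive, function name),
# sorted by start address, as might be derived from a load map.
SYMBOLS = [
    (0x1000, 0x1080, "parse_input"),
    (0x1080, 0x1200, "compute"),
    (0x1200, 0x1240, "write_output"),
]
STARTS = [start for start, _, _ in SYMBOLS]

def resolve(pc: int) -> str:
    """Map a sampled program-counter value to the function whose range contains it."""
    i = bisect.bisect_right(STARTS, pc) - 1    # last function starting at or below pc
    if i >= 0:
        start, end, name = SYMBOLS[i]
        if start <= pc < end:
            return name
    return "<unknown>"                         # pc falls outside every known function

def attribute(pcs: list[int]) -> Counter:
    counts = Counter()
    for pc in pcs:
        counts[resolve(pc)] += 1               # increment the counter for that code area
    return counts

print(attribute([0x1010, 0x10A4, 0x1100, 0x1238, 0x2000]))
```

Keeping the table sorted makes each lookup logarithmic in the number of symbols, which matters when many thousands of samples must be resolved.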
If profiling were carried out for 100 interrupts, a profile might indicate that the processor was running code from function A during 50 interrupts, from function B during 25 interrupts, and from function C during 25 interrupts. Such data would indicate to the developer that processor time was split among functions A, B, and C in the proportions 50%, 25%, and 25%, respectively. If functions A, B, and C were all designed to receive equal shares of processor time, this profile would suggest that functions B and C are not receiving enough processor time and that function A is processor-bound, consuming too much processor time.
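The arithmetic in this example can be written out directly. This sketch takes the sample counts from the text and, under the stated design assumption that each function should receive an equal share, compares each observed percentage with that share; the function name `compare_to_equal_share` is hypothetical.

```python
import math

def compare_to_equal_share(counts: dict[str, int]) -> dict[str, tuple[float, str]]:
    """Convert sample counts to percentages and compare each with an equal share."""
    total = sum(counts.values())
    expected = 100 / len(counts)          # equal-share assumption from the text
    result = {}
    for fn, n in counts.items():
        pct = 100 * n / total
        if math.isclose(pct, expected):
            verdict = "as expected"
        elif pct > expected:
            verdict = "more than its share"
        else:
            verdict = "less than its share"
        result[fn] = (pct, verdict)
    return result

for fn, (pct, verdict) in compare_to_equal_share({"A": 50, "B": 25, "C": 25}).items():
    print(f"{fn}: {pct:.0f}% ({verdict})")
```

With three functions the equal share is about 33.3%, so A's 50% is well above it while B and C fall below.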
A sample-based profiler may obtain information from the stack of an interrupted thread. A “stack” is a region of reserved memory in which a program or programs store status data, such as procedure and function call addresses, passed parameters, and local variables. A “stack frame” is a portion of a thread's stack that represents local storage (arguments, return addresses, return values, and local variables) for a single function invocation. Every active thread of execution has a portion of system memory allocated for its stack space. A thread's stack consists of a sequence of stack frames, and the set of frames on the stack represents the thread's state of execution at any given time. Many operating systems provide software timer interrupts that profilers can use to sample information from a call stack.
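Sampling the whole call stack, rather than only the program counter, lets a profiler attribute each sample both to the function that was executing and to every caller on the stack. This sketch assumes each sample is a hypothetical list of function names ordered from the outermost frame to the innermost (executing) frame; it computes "self" counts (innermost frame only) and "inclusive" counts (any frame on the stack).

```python
from collections import Counter

def summarize_stacks(stacks: list[list[str]]) -> tuple[Counter, Counter]:
    """Each sampled stack lists function names from the outermost frame to the
    innermost frame, which was executing when the timer interrupt fired."""
    self_counts = Counter()
    inclusive_counts = Counter()
    for frames in stacks:
        if not frames:
            continue                          # ignore empty samples
        self_counts[frames[-1]] += 1          # the innermost frame was executing
        for fn in set(frames):                # count each function once per sample,
            inclusive_counts[fn] += 1         # even if it recurses across several frames
    return self_counts, inclusive_counts

samples = [
    ["main", "handle", "parse"],
    ["main", "handle", "parse"],
    ["main", "handle", "render"],
    ["main", "idle"],
]
self_counts, inclusive_counts = summarize_stacks(samples)
print(self_counts)
print(inclusive_counts)
```

Here `main` appears in every sample (inclusive count 4) but never as the executing frame (self count 0), which is the distinction stack sampling makes visible.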
In a multitasking system, threads can be queued before the threads are executed. One technique for queuing threads is to maintain a single, centralized queue that may be referred to generically as a “run queue.” If a processor becomes available, the next available thread is assigned from the run queue to the processor.
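A single, centralized run queue can be modeled with a queue of waiting threads and a priority queue of processors ordered by the time each becomes available. This is a simplified sketch under assumptions: all threads are runnable at time 0, each runs to completion without preemption, and the run times are hypothetical.

```python
import heapq
from collections import deque

def run_shared_queue(durations: list[int], num_cpus: int) -> list[int]:
    """Return the time at which each processor finishes, given thread run times
    in arrival order. Every processor pulls from one centralized run queue."""
    run_queue = deque(durations)
    free_at = [(0, cpu) for cpu in range(num_cpus)]  # (time processor becomes available, cpu id)
    heapq.heapify(free_at)
    finish = [0] * num_cpus
    while run_queue:
        t, cpu = heapq.heappop(free_at)              # next processor to become available
        t += run_queue.popleft()                     # runs the thread at the head of the queue
        finish[cpu] = t
        heapq.heappush(free_at, (t, cpu))
    return finish

print(run_shared_queue([8, 1, 8, 1, 1, 1], num_cpus=2))
# [10, 10]
```

Because a free processor always takes the next waiting thread, the 20 units of work end up split evenly between the two processors in this example.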
In some multiprocessor systems, threads are queued by maintaining a separate run queue for each processor. For example, when a thread is created, it could be assigned to a processor in round-robin fashion. With such a technique, some processors may become overloaded while other processors are relatively idle. Furthermore, some low-priority threads may become starved, i.e., not given enough processing time, because higher-priority threads are added to the run queue of the processor on which the low-priority threads are waiting.
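The imbalance that round-robin assignment to per-processor queues can cause is easy to reproduce. This sketch assumes, for illustration, that all threads are created at time 0, run to completion without preemption, and can never migrate between processors; the run times and the function name are hypothetical.

```python
def run_per_cpu_queues(durations: list[int], num_cpus: int) -> list[int]:
    """Each thread is placed on a processor's private run queue in round-robin
    order when it is created; a processor only runs threads from its own queue."""
    queues: list[list[int]] = [[] for _ in range(num_cpus)]
    for i, d in enumerate(durations):
        queues[i % num_cpus].append(d)   # round-robin assignment at creation time
    return [sum(q) for q in queues]      # each processor finishes when its own queue drains

print(run_per_cpu_queues([8, 1, 8, 1, 1, 1], num_cpus=2))
# [17, 3]
```

Processor 1 drains its queue at time 3 and then sits idle until time 17, even though two threads are still waiting in processor 0's queue at that point.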
Previous sample-based profiling systems collected data relating to the specific process the processor was executing at each scheduled interruption. Such profilers provided no data, or only limited data, on a process that was runnable but not running when the interruption occurred. A process is runnable but not running when the only resource it is waiting for is the CPU itself. Such previous profiling systems are therefore limited in their ability to determine whether a process is starved of processor time. Thus, there is a need for an apparatus and method for profiling processes that are runnable but not running in a multithreaded environment.