1. Field of the Invention
The present invention relates to information processing systems and, more particularly, to a system and method for efficiently and accurately measuring and monitoring system performance.
2. Description of Related Art
At any given time in a computer system, each central processing unit, or processor, is in one of a number of mutually exclusive states (or is in the process of transitioning from one mutually exclusive state to another). The state of a processor is defined in terms of the nature of the work being performed by the processor. One simple view of processor behavior defines three states: idle, busy, and interrupt.
An operating system controls the allocation of all resources in a symmetrical multi-processor (SMP) system, including the processors. In a thread-based operating system, a component within the operating system, referred to as a scheduler, schedules threads for execution in the processors, while a dispatcher, or dispatching unit, actually dispatches the threads to the processors. A thread is the smallest unit of dispatchable work in a system, and usually consists of one or more lines of code. The code may be kernel code, operating system code, application program code, or any other type of code running in the system. A processor executing a thread from any of these types of code is said to be executing a "user" thread, and the processor is considered to be in a busy state.
If an interrupt occurs, a component of the operating system, referred to as an interrupt handler, assigns a processor to handle the interrupt. The processor executes an interrupt service routine and then returns to the thread that it was executing when the interrupt occurred. If no interrupts are being handled, and no user threads are being executed, a processor is defined to be in an idle state. Typically, there is a specific piece of code, referred to as an idle loop or idle thread, which a processor continuously executes while in an idle state.
For performance reasons, it is often desirable to know how each processor is being utilized. Knowing the percentage of time a processor spends in each state may enable a programmer to enhance overall system performance. For example, while a particular application is running in an SMP environment, one processor may be busy 90% of the time, while another processor is only busy 10% of the time. It may be possible to rewrite the application code to off-load work from the busier processor onto a less busy processor and thus improve overall system performance.
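By way of illustration only, the per-state percentages discussed above can be computed from a record of how long a processor spent in each state. The following is a minimal sketch in Python. The function name `state_percentages`, the trace format, and the sample data are all hypothetical and are not part of any described system.

```python
from collections import defaultdict


def state_percentages(intervals):
    """Return the percentage of total time spent in each state.

    intervals: iterable of (state, duration) pairs for ONE processor,
    with every duration expressed in the same (arbitrary) time unit.
    """
    totals = defaultdict(float)
    for state, duration in intervals:
        totals[state] += duration
    elapsed = sum(totals.values())
    if elapsed == 0:
        return {}  # nothing recorded, so no percentages to report
    return {state: 100.0 * t / elapsed for state, t in totals.items()}


# Hypothetical traces for two processors in an SMP system.
cpu0 = [("busy", 80), ("interrupt", 10), ("busy", 10)]
cpu1 = [("idle", 85), ("busy", 10), ("interrupt", 5)]

print("cpu0:", state_percentages(cpu0))
print("cpu1:", state_percentages(cpu1))
```

With this made-up data, the first processor is busy 90% of the time and the second only 10% of the time, which is the kind of imbalance a programmer might address by moving work from one processor to the other.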
One prior art method for measuring system idle time versus busy time involves the use of a low-priority thread. This thread is assigned a low enough priority so that it only runs when no other threads are running. The amount of time a processor spends executing the low-priority thread is calculated, and is assumed to be equal to system idle time in a system which does not contain the low-priority thread.
However, this prior art method of measuring processor utilization consumes a significant amount of system resources. System overhead is incurred every time one thread stops executing and another thread begins execution. Every time "real work" needs to be done, the low-priority thread must be preempted and the new thread must be dispatched. Many operating systems do not switch between threads efficiently. Thus, the very act of measuring system performance by using a low-priority thread can significantly degrade system performance. This results in an inaccurate view of the resources actually consumed by a given program.
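The effect of this overhead can be illustrated with a toy cost model, sketched below in Python. The model is an assumption made purely for illustration: it supposes that each burst of real work costs two thread switches (one away from the low-priority thread and one back to it) and that switching time is not credited to the low-priority thread. The function names and all numeric values are hypothetical.

```python
def true_idle_fraction(busy_bursts, total_time):
    """Idle fraction of the system if no measuring thread were present."""
    return (total_time - sum(busy_bursts)) / total_time


def measured_idle_fraction(busy_bursts, total_time, switch_cost):
    """Idle fraction reported by the low-priority thread (toy model).

    Assumption: every burst of real work forces two thread switches,
    each costing `switch_cost`, and none of that switching time is
    counted as time spent in the low-priority thread.
    """
    overhead = 2 * switch_cost * len(busy_bursts)
    measured_idle = total_time - sum(busy_bursts) - overhead
    if measured_idle < 0:
        raise ValueError("work plus switching overhead exceeds the window")
    return measured_idle / total_time


# Hypothetical workload: 1000 bursts of 500 microseconds of real work
# inside a one-second window, with a 50 microsecond switch cost.
bursts = [500] * 1000
window = 1_000_000

print("true idle:    ", true_idle_fraction(bursts, window))
print("measured idle:", measured_idle_fraction(bursts, window, switch_cost=50))
```

Under these assumed numbers, the low-priority thread reports less idle time than the system would have had without it, because part of the window is spent switching threads rather than doing work or idling.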
Another prior art method for measuring processor utilization is based on sampling. A thread interrupts the system on a regular basis and interrogates the state of each processor in the system. This can be done, for example, by examining processor queues or thread state variables.
However, there are numerous problems associated with this prior art method as well. This type of sampling tends to miss "small" state changes. Any state that lasts for less than the interval between samples may be missed completely. Small state changes can be observed by increasing the sampling rate, but increasing the sampling rate detrimentally affects the very system performance being measured. This prior art approach is also subject to bias. Depending on the mechanism used to interrupt the system, this bias can severely impact the utility of the measures taken. For example, if the mechanism used to interrupt the system is a low priority interrupt, then system states that occur while higher priority interrupts are being serviced will never be observed. The sampling interrupt is deferred until the higher priority interrupt is completed, thus biasing the measure.
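The tendency of sampling to miss short-lived states can be shown with a small simulation, sketched below in Python. The function `sample_states`, the timeline format, and the timeline data are hypothetical and serve only to illustrate the problem. The sketch models the missed-state problem alone and does not model deferral of the sampling interrupt.

```python
def sample_states(timeline, period):
    """Sample a processor state timeline every `period` time units.

    timeline: list of (state, duration) pairs in chronological order.
    Returns the states observed at t = 0, period, 2 * period, ...
    A state occupying [start, end) is observed only if a sample time
    falls inside that half-open interval.
    """
    end_times = []
    t = 0
    for state, duration in timeline:
        t += duration
        end_times.append((t, state))
    total = t

    observed = []
    idx = 0
    sample_t = 0
    while sample_t < total:
        # Advance to the interval that contains the current sample time.
        while end_times[idx][0] <= sample_t:
            idx += 1
        observed.append(end_times[idx][1])
        sample_t += period
    return observed


# Hypothetical timeline: interrupts occupy 3 of the 20 time units.
timeline = [
    ("busy", 4), ("interrupt", 1), ("busy", 5),
    ("idle", 3), ("interrupt", 2), ("idle", 5),
]

coarse = sample_states(timeline, period=5)
fine = sample_states(timeline, period=1)
print("period 5:", coarse)
print("period 1 interrupt samples:", fine.count("interrupt"), "of", len(fine))
```

In this made-up timeline, sampling every 5 time units never observes the interrupt state even though it accounts for 15% of the elapsed time. Sampling every time unit does observe it, but takes five times as many samples, each of which would interrupt the system being measured.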
Consequently, it would be desirable to have a system and method for efficiently and accurately measuring performance data, including processor utilization, in an information processing system. It would be desirable to measure performance data in a manner which does not create a misleading measure of system performance, and where the act of measuring performance data does not adversely impact system performance. In addition, it would be desirable to have a system and method to measure system performance which requires minimal changes to the operating system and no changes to any application code.