In many data processing or computer systems, various tasks or applications contend for processing time to execute on one or more processors, also referred to as central processing units (CPUs), or similar processing devices. Activity in many highly multi-tasking environments tends to be bursty, with periods of dormancy or inactivity followed by periods of intense processing activity. Accordingly, it is useful to analyze the utilization of processors and similar data processing system devices for a variety of reasons. For example, high processor utilization during periods in which few or no user tasks are scheduled may indicate a virus program or a correctable fault in task scheduling that results in thrashing or other inefficient system behavior.
In theory, processor utilization may be determined by accumulating processor idle time across a sampling interval to determine the percentage of time the processor is inactive. An operating system (OS) may maintain a list of ready-to-run threads or tasks. In the current description, a thread refers to a distinct process executed on a processor, which may be a physical processor or a logical processor. When this ready-to-run list is empty, no task is executed and the processor is idle. At that point, a processor-independent timer is read and the processor is essentially deactivated. The processor may be put in a predefined processor power state, such as the C2 or C3 state defined by the well-known Advanced Configuration and Power Interface Specification, Revision 2.0, Jul. 27, 2000 (ACPI).
In the C2 state, clock signals are removed from the functional units of the processor while the memory subsystem remains active and “snoopable” by other devices. In the C3 state, the clock signal is also removed from the memory subsystem, and hence a so-called “deep sleep” state is entered. When a new task is added to the ready-to-run list, the processor is placed in an active state (such as the ACPI C0 state) and the timer is read again. The difference between the first and second timer reads, multiplied by the timer's period, gives the idle time of the processor. The idle time accumulated across a sampling interval can then be used to determine the processor utilization. Unfortunately, this measure of processor utilization is difficult to calculate outside the OS through a supported application programming interface (API), because the API is generally unaware of the ready-to-run list, which is typically known only to the OS.
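The idle-time bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming the pairs of timer reads taken on entering and leaving each idle period are available (as noted above, normally only inside the OS); the names `IdlePeriod`, `idle_time_seconds`, and `utilization_from_idle` are hypothetical and do not correspond to any OS API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IdlePeriod:
    """Hypothetical record of one idle period, in raw timer ticks."""
    enter_ticks: int  # timer read when the ready-to-run list became empty
    exit_ticks: int   # timer read when a new task was added (back to C0)


def idle_time_seconds(periods: Sequence[IdlePeriod], timer_period_s: float) -> float:
    """Idle time = sum of (second read - first read), times the timer period."""
    idle_ticks = sum(p.exit_ticks - p.enter_ticks for p in periods)
    return idle_ticks * timer_period_s


def utilization_from_idle(periods: Sequence[IdlePeriod],
                          timer_period_s: float,
                          sampling_interval_s: float) -> float:
    """Utilization (%) is the fraction of the sampling interval not spent idle."""
    idle_s = idle_time_seconds(periods, timer_period_s)
    return (1.0 - idle_s / sampling_interval_s) * 100.0


# Made-up example: a 1 microsecond timer period and a 1 s sampling interval.
periods = [IdlePeriod(0, 300_000), IdlePeriod(500_000, 700_000)]
print(utilization_from_idle(periods, 1e-6, 1.0))  # roughly 50
```

Summing the raw tick differences first and multiplying by the timer period once keeps the accumulation in integer arithmetic until the final conversion.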
An existing solution for the above problem is to use on-die performance counter hardware capable of counting clock ticks for which the processor is not in a low power state. The performance counter thus provides a measure of time the processor spent performing useful work. Software can then periodically sample a register of this performance counter, and calculate the processor utilization based on the following formulae:

$$\text{BusyTicks} = \sum_{\text{sampling interval}} \left(\text{CurrentTickCount} - \text{InitialTickCount}\right)$$

$$\text{EffectiveFrequency} = \frac{\text{BusyTicks}}{\text{SamplingInterval (s)}}$$

$$\text{Processor utilization (\%)} = \frac{\text{EffectiveFrequency}}{\text{ActualFrequency}} \times 100\%$$
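A minimal sketch of the three formulae above, assuming software has already collected pairs of (initial, current) readings of the busy-tick counter register. How the register is actually read is hardware-specific and is not shown; the function names and the `counter_bits` wraparound width are hypothetical assumptions, not taken from the text.

```python
from typing import Iterable, Tuple


def busy_ticks(samples: Iterable[Tuple[int, int]], counter_bits: int = 64) -> int:
    """BusyTicks: sum of (CurrentTickCount - InitialTickCount) over the interval.

    Each difference is taken modulo 2**counter_bits so that a counter that
    wrapped around between two readings still yields a non-negative delta
    (the counter width is an assumption for illustration).
    """
    mask = (1 << counter_bits) - 1
    return sum((current - initial) & mask for initial, current in samples)


def effective_frequency_hz(busy: int, sampling_interval_s: float) -> float:
    """EffectiveFrequency = BusyTicks / SamplingInterval (in seconds)."""
    return busy / sampling_interval_s


def utilization_percent(effective_hz: float, actual_hz: float) -> float:
    """Processor utilization (%) = EffectiveFrequency / ActualFrequency * 100."""
    return effective_hz / actual_hz * 100.0


# Made-up example: a 2 GHz processor sampled over a 0.5 s interval.
samples = [(0, 300_000_000), (1_000_000_000, 1_200_000_000)]
busy = busy_ticks(samples)                       # 500_000_000 ticks
freq = effective_frequency_hz(busy, 0.5)         # 1e9 Hz
print(utilization_percent(freq, 2_000_000_000))  # 50.0
```

In this example the processor accumulated half as many busy ticks as its actual frequency would allow over the interval, so the effective frequency is half the actual frequency and utilization is 50%.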
However, the above technique does not apply satisfactorily to a system with simultaneous multi-threading (SMT) technology enabled (hereinafter, an SMT system) or to a multi-processor system, because of at least two issues: one involving the OS scheduler and the other involving the system interrupt mechanism.
Because a multi-processor system or an SMT system presents the OS with multiple physical or logical processors, the OS scheduler can execute on any one of them. The scheduler may preempt the thread that is calculating an effective frequency for determining the processor utilization in the middle of the frequency calculation. When the processor resumes execution of the thread, the sampling interval that the thread uses and the processor clock ticks it has counted may be out of sync, resulting in an incorrect frequency.
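The effect of preemption can be illustrated with a toy model. The sketch below is purely hypothetical: it assumes the measuring thread reads the busy-tick counter at the end of its sampling interval but is preempted before it reads the timer, so the tick count covers only the nominal interval while the measured elapsed time also includes the preemption delay. Integer microsecond timestamps keep the arithmetic exact; `measured_frequency_hz` is an illustrative name only.

```python
def measured_frequency_hz(true_busy_hz: int, interval_us: int, preempt_delay_us: int) -> float:
    """Toy model of a frequency calculation interrupted by preemption.

    Assumptions (hypothetical, for illustration only):
      * the processor accumulates busy ticks at a constant true_busy_hz;
      * counter and timer are read back-to-back at the start of the interval;
      * at the end, the counter is read, then the thread is preempted for
        preempt_delay_us before it reads the timer.
    """
    t0_us, c0 = 0, 0
    # Counter read at the end of the nominal sampling interval.
    c1 = c0 + true_busy_hz * interval_us // 1_000_000
    # Timer read only after the thread is rescheduled.
    t1_us = t0_us + interval_us + preempt_delay_us
    # Ticks per second = ticks / (elapsed microseconds / 1e6).
    return (c1 - c0) * 1_000_000 / (t1_us - t0_us)


# A fully busy 1 GHz processor measured over a 1 s interval.
print(measured_frequency_hz(1_000_000_000, 1_000_000, 0))        # 1000000000.0
print(measured_frequency_hz(1_000_000_000, 1_000_000, 250_000))  # 800000000.0
```

Although the processor in this model is fully busy, a 250 ms preemption between the two reads makes the effective frequency appear 20% low, which would be reported as 80% utilization.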
The second issue involves the system interrupt mechanism, such as System Management Mode (SMM). SMM is a mode shared by all processors in a system. On entry to SMM, the execution states of all the processors are saved and the system enters SMM. When the system exits SMM, the states of the processors are restored and execution resumes from where it stopped. So, if the frequency-calculating thread is executing on one processor and another processor causes a switch to SMM, the frequency-calculating thread is also halted and the system as a whole enters SMM. Upon exit from SMM, the frequency-calculating thread resumes as if nothing had happened. Because the thread cannot tell that it was suspended, this could lead to an incorrect frequency calculation in determining the processor utilization.