1. Field of Invention
The present application is directed to the computing sciences generally, and, more specifically, to an instruction that specifies an application thread performance state.
2. Background
As the performance of processors has increased over the years, so too has their power consumption. Whether they are “green energy” conscious, attempting to minimize battery draw, or simply trying to minimize their utility bills, customers of computing systems are increasingly focused on the power management capabilities of their internal processor(s). As such, most modern processors have built-in power management circuitry. The built-in power management circuitry is typically designed to make fine grained power management adjustments dynamically in hardware, and/or to support coarse grained power management adjustments/directives from software.
Generally, the power consumption of an electronic circuit can be controlled by any of three primary techniques: 1) frequency scaling; 2) clock gating; and, 3) voltage scaling. Each of these techniques takes into account fundamental principles of the power consumption of an electronic circuit. Namely, the faster an electronic circuit operates, the greater its performance and power consumption will be. A review of each of these techniques is provided immediately below.
Frequency scaling is the adjustment of an operating frequency of a block of logic circuitry. Here, when higher performance is desired of the logic block at the expense of increasing its power consumption, the frequency of operation of the logic block is raised. Likewise, when lower power consumption is desired at the expense of lower performance, the frequency of operation of the logic block is lowered. The size and function of the different logic blocks that are frequency scaled can vary depending on the designer's desired granularity.
Clock gating can be viewed as an extreme form of frequency scaling. In the case of clock gating, a clock signal to a block of logic is extinguished so as to reduce both the performance and power consumption of the block to zero. When the block of logic is to be used, the clock signal reemerges to bring the block of logic back to life. Clock gating therefore has the effect of enabling/disabling a block of logic.
Voltage scaling is like frequency scaling except that a power supply voltage is lowered/raised in order to lower/raise a logic block's performance and power consumption. Notably, the higher the power supply voltage received by a logic block, the higher the maximum clock frequency that can be applied to the logic block.
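The relationship among the three techniques can be illustrated with the widely used first-order model of CMOS switching power, P = a * C * V^2 * f. The sketch below is illustrative only: the function name, the capacitance value and the operating points are assumptions chosen for the example, and the model ignores leakage power.

```python
def dynamic_power(c_eff, voltage, freq_hz, activity=1.0):
    """First-order CMOS switching power in watts: activity * C * V^2 * f.

    Leakage (static) power is deliberately ignored in this sketch.
    """
    return activity * c_eff * voltage ** 2 * freq_hz


# Hypothetical effective switched capacitance of a logic block, in farads.
C_EFF = 1e-9

# Baseline operating point: 1.0 V at 2 GHz.
baseline = dynamic_power(C_EFF, 1.0, 2e9)

# Frequency scaling: halve the clock, keep the supply voltage.
freq_scaled = dynamic_power(C_EFF, 1.0, 1e9)

# Voltage scaling on top of frequency scaling: the lower clock
# tolerates a lower supply voltage, and power falls with V squared.
volt_scaled = dynamic_power(C_EFF, 0.8, 1e9)

# Clock gating: the clock is stopped, so switching power is zero.
clock_gated = dynamic_power(C_EFF, 1.0, 0.0)

print(baseline, freq_scaled, volt_scaled, clock_gated)
```

Under this model, halving the frequency halves the switching power, while lowering the voltage as well yields a further reduction proportional to the square of the voltage ratio.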
Processor cores have heretofore been designed with hardware control logic circuitry that can quickly and dynamically adjust frequency scaling, clock gating and/or voltage scaling settings to small, medium and/or large logic blocks of a processor chip in response to detected usage of the processor chip. For example, a floating point execution unit in a pipeline might be disabled/enabled via clock gating depending on whether there are any floating point instructions in an instruction queue. Here, because hardware control can quickly adjust the power management setting of small or medium sized logic blocks, hardware power management control is understood to be capable of “fine grained” control.
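The floating point example above can be sketched as a simple decision function. This is a software illustration of a decision that would be made by hardware control logic; the opcode names and the function name are hypothetical.

```python
# Hypothetical opcode names that require the floating point execution unit.
FP_OPCODES = {"fadd", "fmul", "fdiv"}


def fp_unit_clock_enabled(instruction_queue):
    """Return True if the floating point unit's clock should be running.

    The clock is gated (disabled) whenever no floating point
    instruction is present in the instruction queue.
    """
    return any(op in FP_OPCODES for op in instruction_queue)


# A queue holding a floating point multiply keeps the unit enabled.
print(fp_unit_clock_enabled(["add", "fmul", "load"]))

# A queue of integer and memory operations lets the unit be gated.
print(fp_unit_clock_enabled(["add", "load", "store"]))
```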
That having been said, frequency and voltage scaling are understood to have undesirable latencies associated with their respective state transitions. That is, even if hardware frequency and/or voltage scaling control logic can quickly make a decision that a frequency and/or voltage needs to be changed, implementing the change itself takes time because, generally, the frequency of operation and/or supply voltage of an electronic logic circuit cannot be changed quickly on operational logic without risk of data corruption. Clock gating of medium to large size logic blocks also tends to have similar undesirable latencies when switching between enabled/disabled states. For instance, if an entire processor core is disabled, it generally cannot be brought back to life on a “next” clock cycle.
In this respect it is worthwhile to note that hardware power management control is reactive in that it can only react to the usage of the processor that it observes. The reactive nature of hardware control leads to performance hits or workload imbalances, owing at least partially to the latencies between power management state changes discussed just above, when observed usage of the processor dramatically changes. For instance, if a large multi-core processor is sitting in a low power state with multiple cores having been disabled because of low utilization, and suddenly the processor is presented with a large number of threads for execution, many of the newly presented threads have to undesirably “wait” for cores to be enabled before they can execute.
Some run-time compilers (e.g., OpenMP and MPI) and operating system (OS) schedulers can, however, provide hints to the hardware of upcoming processor usage. With such hints the hardware can prepare itself in advance for upcoming usage changes and, in so doing, ideally avoid performance hits or workload imbalances by beginning to change performance state before the change in usage is actually presented.
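The benefit of such hints can be illustrated with a toy timing model. It assumes a fixed latency to bring a disabled core back to an operational state and a hint that arrives some number of cycles before the threads do; the function name and all cycle counts are hypothetical.

```python
def thread_wait_cycles(wake_latency, hint_lead=0):
    """Cycles a newly presented thread waits for a disabled core.

    wake_latency: cycles needed to enable the core (assumed fixed).
    hint_lead: cycles before thread arrival at which the hardware
               was told to begin enabling the core (0 = no hint,
               i.e. purely reactive hardware control).
    """
    return max(0, wake_latency - hint_lead)


WAKE_LATENCY = 10_000  # hypothetical core enable latency, in cycles

# Purely reactive: the core starts waking only when the thread arrives.
reactive = thread_wait_cycles(WAKE_LATENCY)

# A late hint hides part of the latency.
partial = thread_wait_cycles(WAKE_LATENCY, hint_lead=4_000)

# A sufficiently early hint hides the latency entirely.
hidden = thread_wait_cycles(WAKE_LATENCY, hint_lead=12_000)

print(reactive, partial, hidden)
```

In this model the wait seen by the thread shrinks by exactly the lead time of the hint, and disappears once the hint leads the change in usage by at least the transition latency.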
Software power management control, by contrast, is understood to be mostly if not entirely “coarse grained”. That is, software power management control typically affects medium to large sized logic blocks and, moreover, software controlled power management settings are not rapidly changed (rather, they persist for extended periods of time). As such, to the extent the processor hardware provides software writable control fields or other “hooks” to assist software power management, such fields/hooks do not directly implement any fine grained control.
In addition, many existing software power management control mechanisms rely on, or effectively merely oversee, a specific hardware implementation of power management, such as the P-states of specific processors. As such, true software power management techniques are less portable across different hardware implementations.
The best known possible exception to this perspective is a PAUSE instruction, which causes the thread that executes the instruction to be put into a sleep state.