FIG. 1 shows the architecture of a standard multi-core processor design 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104; and 5) an I/O hub 105. Each of the processing cores contains one or more instruction execution pipelines for executing program code instructions. The interconnection network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as to the other components 103, 104, 105. The last level caching system 103 serves as the last layer of cache in the processor 100 before instructions and/or data are evicted to system memory 108. The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and "I/O" devices (e.g., non-volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Other functional blocks of significance (phase locked loop (PLL) circuitry, power management circuitry, etc.) are not depicted in FIG. 1 for convenience.
As the power consumption of computing systems has become a matter of concern, most present-day systems include sophisticated power management functions. A common framework is to define both "performance" states and "power" states. A processor's performance is its ability to do work over a set time period; the higher a processor's performance state, the more work it can do over that period. A processor's performance can be adjusted during runtime by changing its internal clock speeds and voltage levels. Because higher clock speeds and voltage levels draw more power, a processor's power consumption increases as its performance increases.
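A standard first-order CMOS approximation (general background, not stated in the source) makes the link between clock speed, voltage, and power concrete. Here $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. The worked ratio holds $\alpha$ and $C$ fixed and assumes, purely for illustration, a 20% clock increase that requires a 10% voltage increase:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\text{dyn}}'}{P_{\text{dyn}}}
  = \left(\frac{V'}{V}\right)^{2} \frac{f'}{f}
  = (1.1)^{2} \times 1.2 \approx 1.45
```

Under these assumed numbers, 20% more clock rate costs roughly 45% more dynamic power, which is why higher performance states are disproportionately expensive.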
A processor's different performance states correspond to different clock settings and internal voltage settings, resulting in different performance vs. power consumption tradeoffs. According to the Advanced Configuration and Power Interface (ACPI) standard, the different performance states are labeled with different "P numbers": P0, P1, P2 . . . P_R, where P0 represents the highest performance and power consumption state and P_R represents the lowest level of power consumption at which a processor is able to perform work. The P1 performance state is the maximum guaranteed performance operating state. The P0 state, also called the Turbo state, is any operating point greater than P1. The term "R" in "P_R" represents the fact that different processors may be configured to have different numbers of performance states.
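The P-state ordering can be sketched as a small table. This is a minimal Python illustration, assuming a hypothetical processor with R = 3; the `PState` class, the `is_turbo` and `check_ordering` helpers, and every frequency and voltage value are invented for illustration and are not taken from ACPI or any real part:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    name: str
    freq_mhz: int     # hypothetical clock setting
    voltage_v: float  # hypothetical internal supply voltage


# Hypothetical table for R = 3, ordered from P0 (highest performance and power)
# down to P_R (lowest power at which work is still performed).
P_STATES = [
    PState("P0", 3600, 1.20),  # Turbo: an operating point above P1
    PState("P1", 3000, 1.05),  # maximum guaranteed operating point
    PState("P2", 2200, 0.95),
    PState("P3", 1200, 0.80),  # P_R for this hypothetical part
]


def is_turbo(state: PState, table: list[PState]) -> bool:
    """P0 (Turbo) covers any operating point faster than the guaranteed P1 point."""
    p1 = next(s for s in table if s.name == "P1")
    return state.freq_mhz > p1.freq_mhz


def check_ordering(table: list[PState]) -> None:
    """Higher P numbers must mean lower clock and voltage settings."""
    for faster, slower in zip(table, table[1:]):
        assert faster.freq_mhz > slower.freq_mhz
        assert faster.voltage_v >= slower.voltage_v


if __name__ == "__main__":
    check_ordering(P_STATES)
    r = len(P_STATES) - 1
    print(f"R = {r}; lowest state is {P_STATES[-1].name}")
    print([s.name for s in P_STATES if is_turbo(s, P_STATES)])  # ['P0']
```

Encoding Turbo as "any operating point above P1" rather than as one fixed P0 frequency mirrors the definition in the text.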
In contrast to performance states, power states are largely directed to defining different "sleep modes" of a processor. According to the ACPI standard, the C0 state is the only power state in which the processor can do work. As such, for the processor to enter any of the performance states (P0 through P_R), the processor must be in the C0 power state. When no work is to be done and the processor is to be put to sleep, the processor can be put into any of a number of different power states C1, C2 . . . C_M, where each power state represents a different level of sleep and, correspondingly, a different amount of power savings and a different amount of time needed to transition back to the operable C0 power state.
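The relationship between C-state depth and exit latency can be illustrated in the same style. In this sketch, the `CState` class, the latency and power figures, and the `deepest_sleep_within` selection policy are all hypothetical; the policy only shows how a deeper state trades a longer wake-up time for lower power and is not described in the source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    exit_latency_us: float  # hypothetical time to return to C0
    relative_power: float   # hypothetical power relative to C0


# Hypothetical table with M = 3: a higher C number means deeper sleep,
# lower power, and a longer exit latency.
C_STATES = [
    CState("C0", 0.0, 1.00),  # the only state in which work is done
    CState("C1", 2.0, 0.50),
    CState("C2", 20.0, 0.20),
    CState("C3", 100.0, 0.05),
]


def can_execute(state: CState) -> bool:
    """Work, and therefore any P-state, is only possible in C0."""
    return state.name == "C0"


def deepest_sleep_within(budget_us: float, table: list[CState] = C_STATES) -> CState:
    """Return the deepest state whose exit latency fits the wake-up budget.

    Illustrative policy only. Returns C0 (stay awake) if no sleep state fits.
    Assumes the table is sorted by increasing exit latency.
    """
    best = table[0]
    for state in table[1:]:
        if state.exit_latency_us <= budget_us:
            best = state
    return best


if __name__ == "__main__":
    for budget in (1.0, 25.0, 500.0):
        print(budget, deepest_sleep_within(budget).name)
```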
A deeper level of sleep corresponds to slower internal clock frequencies and/or lower internal supply voltages and, possibly, some blocks of logic being powered off. A higher C number corresponds to a deeper level of sleep and a correspondingly higher latency to exit and return to the awake, or C0, state.
Computing systems designed with processors offered by Intel Corp. of Santa Clara, Calif. include a "package level" P0 state referred to as "Turbo Boost," in which the processor as a whole operates at clock frequencies higher than its rated maximum guaranteed frequency for a limited period of time to achieve greater performance. Here, "package level" means at least one processor chip and possibly other chips, such as one or more other processor chips and one or more system memory (e.g., DRAM) chips. Thus, at a minimum, according to the present state of the Turbo Boost technology, when the package level P0 state is entered, all the cores 101_1 through 101_N of a processor 100 receive a clock frequency that is higher than the processor's rated maximum guaranteed clock frequency for a limited period of time.
Operating in Turbo mode comes at the price of increased power consumption. Given these power consumption ramifications, entry into Turbo mode is controlled and is chiefly determined by whether the overall workload demand on the processor as a whole exceeds some configurable threshold. That is, entry into the Turbo Boost mode is a function of package level workload demand.
A problem with the present Turbo Boost technology, which has only a package level perspective, is its limited responsiveness. For example, an operating system (OS) instance running on one core may request entry into the package P0 state. However, the requested P0 state will not be entered unless and until the workload across the processor as a whole, which includes the workload measured across all of the processor's cores 101_1 to 101_N, crosses a pre-established threshold.
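The package-level gating described above can be modeled in a few lines. This sketch assumes that "package workload" is the mean utilization across cores and that the threshold is a fixed fraction; both are illustrative assumptions, as are the `package_turbo_granted` and `core_frequencies` helper names and the clock values:

```python
def package_turbo_granted(core_utilization: list[float], threshold: float,
                          p0_requested: bool) -> bool:
    """Hypothetical package-level gate for entering the P0 (Turbo) state.

    A P0 request is honored only when the workload across the whole package,
    modeled here as mean per-core utilization, exceeds the threshold.
    """
    package_load = sum(core_utilization) / len(core_utilization)
    return p0_requested and package_load > threshold


def core_frequencies(n_cores: int, granted: bool,
                     p1_mhz: int, turbo_mhz: int) -> list[int]:
    """Package-level Turbo: every core receives the same clock setting."""
    return [turbo_mhz if granted else p1_mhz] * n_cores


if __name__ == "__main__":
    # One busy core (e.g., the OS instance requesting P0) and seven lightly loaded cores.
    loads = [1.0] + [0.1] * 7
    granted = package_turbo_granted(loads, threshold=0.5, p0_requested=True)
    print(granted)  # False: package load is about 0.21
    print(core_frequencies(len(loads), granted, p1_mhz=3000, turbo_mhz=3600))
```

With eight cores, one core fully busy and the rest at 10% utilization gives an average load of about 0.21, so the P0 request from the OS on the busy core is denied under a 0.5 threshold. This is the responsiveness problem described above: the request succeeds only once the whole package is busy.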