Over the past several years, microprocessor performance has improved steadily through continued increases in clock rates. Recently, however, the rate of improvement has slowed to a fraction of its historical pace. Microprocessor designers are now obtaining additional performance by increasing the number of processor cores placed on a single semiconductor die. These multi-core processors enable a plurality of operations to be performed in parallel, thereby increasing instruction throughput, i.e., the total number of instructions executed per unit of time.
A noted disadvantage of multi-core processors is that the total power consumed by the processor increases with each added core. The additional power is dissipated as heat that must be removed from the die. Modern processors therefore have an associated power envelope, determined by the physical limits on heat dissipation of the processor die and its packaging. Running a processor above its power envelope may physically damage the processor and/or the cores contained therein.
Certain processors, such as those designed for laptop computers, include active power management that lowers the total power consumed by lowering the operating frequency of the processor. This may occur when, e.g., a laptop is placed in a standby mode. Processors designed for servers or other non-laptop applications typically have not been concerned with operating on battery power; however, their total power consumption (and hence heat generation) is now reaching a point where allowing it to increase further is no longer feasible due to the physical constraints of the processor and/or its packaging.
Generally, in a multi-core system, running all cores at full speed results in a power consumption of $nP_{\max}$ watts, where $n$ is the number of cores in the processor and $P_{\max}$ is the maximum power consumed by a single core. However, the processor's power budget is such that only $\alpha n P_{\max}$ watts is feasible, where $\alpha$ ($0 < \alpha \le 1$) is the fraction of the full-speed power that may be consumed given the physical limitations of the semiconductor die and/or packaging.
Typically, processor cores operate under a fixed allocation of power among the cores. However, a noted disadvantage of such a fixed allocation technique is that the overall system throughput, measured as instructions performed per unit time, is suboptimal, as will be shown herein. Assume that the frequency of each core may be varied, on some multiple of the clock cycle, to any of a spectrum of frequencies $(f_0, f_1, \ldots, f_{\max})$. The power dissipated by a core is proportional to the square of its chosen frequency. As will be appreciated by one skilled in the art, the clock rate selected for a core during a particular time interval determines the core's instruction rate during that interval.
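To make the quadratic power model concrete, the following Python sketch tabulates relative instruction rate and relative power for a few frequency levels. The levels in `FREQ_LEVELS` are hypothetical, normalised so that $f_{\max} = 1$, and the sketch assumes, as the text does, that instruction rate scales linearly with frequency and power with its square.

```python
# Hypothetical discrete frequency levels, normalised so that f_max = 1.0.
FREQ_LEVELS = [0.25, 0.5, 0.75, 1.0]

def relative_power(f: float, f_max: float = 1.0) -> float:
    """Power at frequency f relative to power at f_max, assuming P is proportional to f**2."""
    return (f / f_max) ** 2

def relative_instruction_rate(f: float, f_max: float = 1.0) -> float:
    """Instruction rate relative to the rate at f_max, assuming rate is proportional to f."""
    return f / f_max

for f in FREQ_LEVELS:
    print(f"f={f:.2f}  rate={relative_instruction_rate(f):.2f}  power={relative_power(f):.4f}")
```

Under this model, halving the frequency halves the instruction rate but cuts power to one quarter.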
Without loss of generality, assume that each core is capable of operating at one billion instructions per second (1 BIPS). Let the vector $\mathbf{s} = \{s_i : 1 \le i \le n,\ s_i \ge 0\}$ be the set of instruction service rates for the cores. Furthermore, let the power for these $n$ cores be defined as follows:
$$P(\mathbf{s}) = \sum_{i=1}^{n} c_i s_i^2 \le \alpha n P_{\max} \qquad (1)$$
where $c_i$ is a constant for core $i$. The constant $c_i$ may represent architectural differences for a particular core. For example, one core on a processor may comprise a floating point unit that consumes more power per instruction than, e.g., a simple arithmetic unit. As such, the power cost $c_i$ of that core may differ from that of the other cores of the processor. To simplify modeling, assume that power varies with the square of the frequency and that the frequency determines the maximum instruction rate.
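A minimal sketch of Equation (1) in Python follows. The function names, the four-core example, and the per-core constants `c` (with one hypothetical floating-point-heavy core at 1.5) are illustrative assumptions; rates are in BIPS and $P_{\max}$ is normalised to 1, so a core with $c_i = 1$ running at 1 BIPS draws exactly $P_{\max}$.

```python
from typing import Sequence

def total_power(s: Sequence[float], c: Sequence[float]) -> float:
    """P(s) = sum_i c_i * s_i**2, as in Equation (1)."""
    if len(s) != len(c):
        raise ValueError("s and c need one entry per core")
    return sum(ci * si ** 2 for ci, si in zip(c, s))

def within_envelope(s: Sequence[float], c: Sequence[float],
                    alpha: float, p_max: float = 1.0) -> bool:
    """True if P(s) <= alpha * n * P_max, i.e. the allocation respects the envelope."""
    return total_power(s, c) <= alpha * len(s) * p_max

# Hypothetical 4-core example: the last core has a costlier floating point unit.
c = [1.0, 1.0, 1.0, 1.5]
s = [1.0, 0.5, 0.5, 0.5]   # service rates in BIPS
alpha = 0.5                # budget is half of full-speed power

print(total_power(s, c))                             # 1.875
print(within_envelope(s, c, alpha))                  # True: 1.875 <= 2.0
print(within_envelope([1.0] * 4, [1.0] * 4, alpha))  # False: 4.0 > 2.0
```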
To keep overall operation within the power envelope of $\alpha n P_{\max}$, a processor designer could distribute the processing capability evenly across all cores. For simplicity, assume $c_i = 1$ for every core; with every core assigned the same service rate $s_i$ and the budget fully used, Equation (1) becomes:
$$\alpha n P_{\max} = \sum_{i=1}^{n} c_i s_i^2 = n s_i^2$$
which reduces to:
$$s_i^2 = \alpha P_{\max}$$
Thus, if all cores use a fixed allocation, each core can be allocated a service rate of $s_i = \sqrt{\alpha P_{\max}}$. To simplify further for comparison purposes, let $P_{\max} = 1$, so:
$$s_i = \sqrt{\alpha} \qquad (2)$$
This indicates that, under the fixed allocation scheme, when power is reduced by a fraction $1 - \alpha$, the core service rates are reduced by a fraction $1 - \sqrt{\alpha}$.
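The fixed-allocation result in Equation (2) can be checked numerically. The sketch below assumes $P_{\max} = 1$ and $c_i = 1$, as in the text, and prints the per-core service rate and the resulting rate reduction for a few example values of $\alpha$ chosen for illustration; `fixed_service_rate` is a hypothetical helper name.

```python
import math

def fixed_service_rate(alpha: float, p_max: float = 1.0) -> float:
    """Equal per-core rate s_i = sqrt(alpha * P_max), from Equation (1) with c_i = 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return math.sqrt(alpha * p_max)

for alpha in (1.0, 0.75, 0.5, 0.25):   # example budget fractions
    s = fixed_service_rate(alpha)
    print(f"alpha={alpha:.2f}  power cut={1 - alpha:.0%}  "
          f"s_i={s:.3f} BIPS  rate cut={1 - s:.1%}")
```

Because the rate scales with $\sqrt{\alpha}$, quartering the power budget ($\alpha = 0.25$) only halves each core's service rate.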
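Let the vector $\mathbf{a} = \{a_i : 1 \le i \le n,\ a_i \ge 0\}$ represent the set of instruction arrival rates requested of each core by an applied workload during a given interval. A noted disadvantage of the fixed allocation is that the requested arrival rates may vary considerably and may exceed the fixed service rates during certain time intervals. A core whose $a_i$ exceeds its fixed rate $\sqrt{\alpha}$ cannot serve the excess demand, while a core whose $a_i$ falls below $\sqrt{\alpha}$ leaves part of its power allocation unused. Thus, the system throughput is suboptimal.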
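The throughput loss can be illustrated with a two-core example. The sketch assumes $c_i = 1$, $P_{\max} = 1$, a 1 BIPS per-core ceiling, and hypothetical arrival rates $\mathbf{a} = (1.0, 0.2)$; it counts core $i$ as serving $\min(a_i, s_i)$ instructions per second. The "shifted" allocation is a hand-picked illustration that respects the same budget, not a method described in this document.

```python
import math
from typing import Sequence

def served_throughput(a: Sequence[float], s: Sequence[float]) -> float:
    """BIPS actually delivered: core i serves at most min(a_i, s_i)."""
    return sum(min(ai, si) for ai, si in zip(a, s))

def power(s: Sequence[float]) -> float:
    """Equation (1) with every c_i = 1."""
    return sum(si ** 2 for si in s)

n, alpha, p_max, s_max = 2, 0.5, 1.0, 1.0   # s_max: the 1 BIPS per-core ceiling
budget = alpha * n * p_max
a = [1.0, 0.2]  # hypothetical arrival rates for one interval (BIPS)

# Fixed allocation, Equation (2): every core gets sqrt(alpha * P_max).
fixed = [math.sqrt(alpha * p_max)] * n

# Hand-picked alternative (illustration only): give the light core exactly
# its demand and spend the remaining budget on the busy core.
light = a[1]
busy = min(s_max, math.sqrt(budget - light ** 2))
shifted = [busy, light]

for name, s in (("fixed", fixed), ("shifted", shifted)):
    print(f"{name:8s} power={power(s):.3f}/{budget:.3f}  "
          f"served={served_throughput(a, s):.3f} BIPS")
```

Both allocations stay within the same $\alpha n P_{\max}$ budget, yet the fixed allocation serves roughly 0.91 BIPS while the shifted one serves roughly 1.18 BIPS, because the fixed scheme caps the busy core at $\sqrt{\alpha}$ and leaves headroom unused on the lightly loaded core.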