1. Field of the Invention
The present invention relates to techniques for balancing thermal variations and/or energy variations across a set of processors or processor cores in a computer system.
2. Related Art
Elevated temperatures pose a variety of challenges during the design and operation of modern computer systems, including challenges associated with reliability, timing, performance, cooling costs, and/or leakage power. For example, because of the increasing power densities in computer systems, cooling has become increasingly expensive, both for large-scale computer systems and for multiprocessor systems-on-chip (MPSoCs). Moreover, the associated temperature increases exacerbate reliability issues, because hot spots and thermal cycling can increase the rate of failures during device lifetimes.
In addition to problems associated with high temperatures and temperature cycling, some failure mechanisms are affected by temperature gradients. For example, as feature sizes shrink, spatial temperature variations can cause timing failures due to variable delay, issues in clock-tree design, and other performance challenges. In particular, because local resistances scale linearly with temperature, rising temperatures increase these resistances, thereby increasing circuit delays and ohmic losses. Note that global clock networks on chips are especially vulnerable to such spatial temperature variations because they span the entire chip.
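As a minimal illustration of the linear dependence noted above, the following sketch uses the standard first-order model R(T) = R_ref * (1 + alpha * (T - T_ref)). The coefficient value (roughly that of copper near 20 degrees C), the reference temperature, and the function names are assumptions chosen for illustration only; they do not appear in the description above.

```python
# Illustrative sketch only: first-order (linear) temperature dependence of
# interconnect resistance. ALPHA_PER_C and T_REF_C are assumed values
# (approximately copper near 20 degrees C), not values from the description.
ALPHA_PER_C = 0.0039   # assumed temperature coefficient of resistance, per degree C
T_REF_C = 20.0         # assumed reference temperature, degrees C


def resistance_at(r_ref_ohm, temp_c, alpha=ALPHA_PER_C, t_ref=T_REF_C):
    """Return resistance at temp_c using R = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref_ohm * (1.0 + alpha * (temp_c - t_ref))


def relative_delay_increase(temp_hot_c, temp_cool_c,
                            alpha=ALPHA_PER_C, t_ref=T_REF_C):
    """Fractional increase in a resistance-proportional delay (for example,
    an RC delay with fixed capacitance) between a cool and a hot region."""
    r_hot = resistance_at(1.0, temp_hot_c, alpha, t_ref)
    r_cool = resistance_at(1.0, temp_cool_c, alpha, t_ref)
    return r_hot / r_cool - 1.0
```

Under these assumptions, two otherwise identical wire segments located in regions of the chip at different temperatures exhibit different delays, which is one way a spatial temperature gradient can translate into timing skew.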
Moreover, in computer systems that include multiple processors or multiple processor cores, operating-system-level schedulers are often used to perform load balancing, in which the workload is periodically distributed as evenly as possible across the processors (or processor cores). In general, load balancing increases processor utilization and therefore results in better performance. However, these schedulers often do not take the effects of temperature variations into account when determining workload schedules. Consequently, the resulting schedules often lead to temperature distributions that can exacerbate temperature-induced problems.
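For illustration, a temperature-oblivious load balancer of the kind described above might resemble the following sketch, which greedily assigns each task to the currently least-loaded core. The function name, the per-task cost estimates, and the greedy heuristic itself are hypothetical and are not taken from any particular operating system; the point is only that no temperature information enters the decision.

```python
import heapq


def balance_load(task_costs, num_cores):
    """Hypothetical greedy balancer: assign each task (largest first) to the
    core with the smallest accumulated load. Core temperatures are never
    consulted, so the resulting thermal distribution is uncontrolled."""
    # Min-heap of (accumulated_load, core_id); all cores start idle.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}

    # Visit tasks in order of decreasing estimated cost.
    for task_id, cost in sorted(enumerate(task_costs), key=lambda tc: -tc[1]):
        load, core = heapq.heappop(heap)      # least-loaded core so far
        assignment[core].append(task_id)
        heapq.heappush(heap, (load + cost, core))
    return assignment
```

Such a scheduler equalizes utilization, but two cores with equal load can still run at different temperatures (for example, because of their position on the die or the power profile of their tasks), which this sketch cannot detect.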
Hence, what is needed are techniques for balancing workloads in a computer system without the problems described above.