1. Field of the Invention
The present invention relates to techniques for balancing thermal variations and/or energy variations across a set of processors or processor cores on an integrated circuit.
2. Related Art
Elevated temperatures create a variety of challenges during operation of modern computer systems, including challenges associated with reliability, availability, serviceability, timing, performance, cooling costs, and/or leakage power. For example, because of the increasing power densities in computer systems, cooling has become increasingly expensive, both for large-scale computer systems and for multiprocessor systems-on-chip (MPSoCs).
Moreover, the associated temperature increases exacerbate reliability issues, because hot spots and thermal cycling can increase the rate of failure during computer-system lifetimes. For example, spatial and temporal thermal variations can accelerate known degradation modes, including: solder fatigue, interconnect fretting, differential thermal expansion between bonded materials that leads to delamination failures, thermal mismatches between mating surfaces, differences in the coefficients of thermal expansion (CTEs) between packaging materials, wirebond shear and flexure fatigue, microcrack initiation and propagation in ceramic materials, and repeated stress reversals in brackets (which can lead to dislocations, cracks, and eventual mechanical failures).
Moreover, in computer systems that include multiple processors or multiple processor cores, operating-system-level schedulers are often used to perform load balancing and to distribute workload as evenly as possible across the processors (or processor cores). In general, load balancing balances processor utilization and therefore results in better performance. However, these schedulers often do not take the effects of temperature variations into account when determining workload schedules. Consequently, the resulting schedules often lead to temperature distributions that can exacerbate temperature-induced problems.
Additionally, operating-system-level schedulers in many computer systems that include multi-core processors perform so-called ‘first-available’ scheduling. If processors have 100% utilization, first-available scheduling does not affect the thermal distribution in these chips. However, utilization factors for processes in many computer systems are between 10% and 20%, in which case first-available scheduling can exacerbate spatial and temporal thermal variations. These thermal variations can also be increased in computer systems in which load balancing is based on locality (i.e., which cores share the same memory), because this scheduling technique tends to assign particular jobs or threads to the same group of cores unless these cores are busy or are executing higher-priority jobs or threads.
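The effect of first-available scheduling at low utilization can be illustrated with a minimal sketch. It assumes a simplified model that is not part of the description above: four cores, equal-length jobs arriving at fixed intervals, and busy time used as a rough proxy for heat generated on a core. The names simulate, first_available, and rotating are hypothetical, and the rotating policy is included only as a contrast, not as the technique of the invention.

```python
def simulate(num_cores, arrivals, duration, pick_core):
    """Return the accumulated busy time of each core (assumed heat proxy)."""
    free_at = [0] * num_cores  # time at which each core next becomes idle
    busy = [0] * num_cores     # total busy time per core
    for job_index, t in enumerate(arrivals):
        idle = [c for c in range(num_cores) if free_at[c] <= t]
        if idle:
            core = pick_core(idle, job_index)
            start = t
        else:
            # Every core is busy: queue the job on the core that frees first.
            core = min(range(num_cores), key=lambda c: free_at[c])
            start = free_at[core]
        free_at[core] = start + duration
        busy[core] += duration
    return busy


def first_available(idle, job_index):
    """Pick the lowest-numbered idle core."""
    return idle[0]


def rotating(idle, job_index):
    """Hypothetical contrast policy: rotate through the idle cores."""
    return idle[job_index % len(idle)]


# 20 jobs of 4 ticks each, one every 5 ticks: 20% average utilization
# across 4 cores over 100 ticks.
arrivals = range(0, 100, 5)
print(simulate(4, arrivals, 4, first_available))  # [80, 0, 0, 0]
print(simulate(4, arrivals, 4, rotating))         # [20, 20, 20, 20]
```

Under these assumptions, first-available scheduling places every job on the same core, so one core is busy 80% of the time while the others remain idle, even though the average utilization is only 20%.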
Hence, what is needed are techniques for balancing workloads in a computer system without the problems described above.