1. Field of the Invention
The present invention generally relates to heat management in Integrated Circuit (IC) chips and more particularly to reducing power hotspots in Simultaneous MultiThreaded (SMT) IC chips, such as an SMT processor or microprocessor.
2. Background Description
Semiconductor technology and chip manufacturing advances have resulted in a steady increase of on-chip clock frequencies, the number of transistors on a single chip and the die size itself. Thus, notwithstanding the decrease of chip supply voltage, chip power consumption has increased as well. Further, this power consumption is concentrated into increasingly smaller chip areas that may, from time to time, result in local chip hotspots. Also, chip current leakage (i.e., lost/wasted energy) increases exponentially with increasing temperatures. So, these increased peak and average temperatures can waste chip energy and shorten chip and system lifetimes as well. It has been shown that electrical circuit lifetime may be cut in half when the operating temperature increases by 10-15 degrees Celsius. Consequently, maintaining operating temperatures at optimum levels as on-chip peak temperatures rise requires escalating chip and system level cooling and packaging costs.
Dynamic thermal management (DTM) techniques have been employed in the past, as a hardware solution to limit peak temperatures, especially on state of the art microprocessors. DTM techniques throttle back chip performance, to lower power consumption when the chip reaches a preset temperature threshold. A variety of actuating responses are available to effect such throttling, e.g., global clock gating, clock-throttling, voltage and/or frequency scaling. However, these drastic hardware throttling measures can severely degrade performance for a class of very high performance applications.
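As an illustration only, the threshold-triggered throttling described above might be sketched as follows. The function name, the specific temperature and frequency values, and the hysteresis band are all assumptions made for this sketch; none of them comes from any particular DTM implementation.

```python
def dtm_frequency_step(temp_c, freq_mhz, threshold_c=85.0, hysteresis_c=5.0,
                       min_mhz=800, max_mhz=3200, step_mhz=400):
    """Return the next clock frequency for one DTM control interval.

    All parameter values are hypothetical. The hysteresis band is an
    assumption added so the sketch does not oscillate at the threshold.
    """
    if temp_c >= threshold_c:
        # Preset threshold reached: throttle back to lower power consumption.
        return max(min_mhz, freq_mhz - step_mhz)
    if temp_c <= threshold_c - hysteresis_c:
        # Comfortably below the threshold: restore performance gradually.
        return min(max_mhz, freq_mhz + step_mhz)
    # Inside the hysteresis band: hold the current frequency.
    return freq_mhz
```

Each throttling step in such a scheme directly trades clock frequency for temperature, which is why a workload that stays near the threshold may see its performance severely degraded.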
A scalar processor fetches and issues/executes one instruction at a time. Each such instruction operates on scalar data operands. Each such operand is a single or atomic data value or number. Pipelining is an approach to maximizing processor performance, wherein processor chip logic is bounded by pairs of register stages with multiple pairs forming the pipeline. Logic between each pair of stages may be operating independently on a different operand than logic between other pairs. A series of operands or low level operations forming a higher level operation traversing the pipeline may be in what is known as a thread. A hotspot may occur at a pipeline stage when, for example, logic in that stage is vigorously exercised. A shift and add multiplier, for instance, may involve repeated adding, cycle after cycle, for 32, 64 or more cycles, and can cause a hotspot at the adder.
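The shift and add multiplier mentioned above can be sketched in software to show why the adder is exercised so heavily. This is a minimal model, assuming unsigned operands that fit in `width` bits; the function name and the `adder_uses` counter are introduced here purely for illustration.

```python
def shift_add_multiply(multiplicand, multiplier, width=32):
    """Multiply two unsigned integers with one shift-and-add step per cycle.

    Returns (product, adder_uses), where adder_uses counts the cycles in
    which a nonzero partial product is fed through the adder.
    """
    mask = (1 << (2 * width)) - 1  # the product register is 2*width bits wide
    product = 0
    adder_uses = 0
    for cycle in range(width):
        # Examine one multiplier bit per cycle, least significant bit first.
        if (multiplier >> cycle) & 1:
            # Add the shifted multiplicand into the running product.
            product = (product + (multiplicand << cycle)) & mask
            adder_uses += 1
    return product, adder_uses
```

With a multiplier whose bits are all ones, the adder in this model is used in every one of the `width` cycles, which corresponds to the sustained activity that can produce a hotspot at that unit.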
A superscalar processor can fetch, issue and execute multiple instructions in a given machine cycle, each in a different execution path or thread, in what is also referred to as Simultaneous Multi-Threading (SMT). Each instruction fetch, issue and execute path is usually pipelined for further concurrency. Examples of superscalar processors include the Power/PowerPC processors from IBM Corporation, the Pentium processor family from Intel Corporation, the UltraSPARC processors from Sun Microsystems and the Alpha processor and PA-RISC processors from Hewlett Packard Company (HP). State of the art superscalar microprocessors utilize SMT on multiple cores on a chip (e.g., Chip-level MultiProcessors (CMP)).
These state of the art superscalar microprocessors present new perspectives for thermal management using task scheduling and migration in system-level software such as the Operating System (OS) and the virtualization layer (also known as the Hypervisor). Core-hopping, for example, involves migrating a hot task (as determined by a local temperature sensor) between multiple cores, and has proven an effective state of the art mechanism for reducing peak temperatures. However, core-hopping requires the availability of colder, idle cores as hot task destinations. State of the art systems typically are loaded or over-loaded such that idle destination cores are unavailable.
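The destination requirement of core-hopping can be made concrete with a short sketch. The data layout (a mapping from core identifier to a temperature reading and an idle flag) and the function name are assumptions made for illustration; they do not describe any particular OS or Hypervisor interface.

```python
def pick_hop_destination(cores, hot_core_id):
    """Choose a core-hopping destination for the task on hot_core_id.

    cores maps core_id -> (temp_c, is_idle); this layout is hypothetical.
    Returns the coldest idle core that is cooler than the hot core, or
    None when no such core exists.
    """
    hot_temp = cores[hot_core_id][0]
    candidates = [
        (temp, core_id)
        for core_id, (temp, is_idle) in cores.items()
        if is_idle and core_id != hot_core_id and temp < hot_temp
    ]
    if not candidates:
        # Fully loaded system: there is nowhere to migrate the hot task.
        return None
    # The lowest temperature wins; ties fall back to the lower core_id.
    return min(candidates)[1]
```

On a loaded or over-loaded system every idle flag is false, so a selector of this kind returns `None` and the hot task stays where it is, which is the limitation noted above.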
While distributing the power consumption more evenly across a CMP mitigates thermal dissipation without compromising performance, it also reduces static design choices. Evenly distributing the power consumption requires relatively simple cores (and more cores per die/chip), based on thermal-aware floor planning. However, restricting chips to numerous simple cores located for thermal-awareness fails to deal with and exploit workload variability.
Thus, there is a need for improved dynamic redistribution of CMP chip power to meet power and thermal envelope requirements without reducing chip performance.