1. Field of the Invention
The present invention generally relates to power and power-density management in microprocessors, and more particularly to a method and apparatus for power and thermal management employing software and hardware components.
2. Description of the Related Art
In the last decade, microprocessor power dissipation has gone up by a factor of ten. The frequency of operation of CMOS devices has increased tenfold. While the input voltage and capacitance of devices have decreased, the number of devices on a typical microprocessor die has increased by an order of magnitude. Moreover, device miniaturization has moved cache from the multi-chip level onto the microprocessor die. This has resulted in high CPU core power density: for example, 50% of a 20 mm by 20 mm microprocessor die may contain the CPU core, with the rest being cache. The total power dissipation from such a microprocessor has reached 100 W, and the power density is estimated to be 40 W/cm². Extrapolating the changes in microprocessor organization and device miniaturization, one can project a future power dissipation density of 200 W/cm².
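The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the die size, core area fraction, and total power come from the text, while the share of power dissipated in the core (80%) is an assumption chosen because it reproduces the quoted 40 W/cm² estimate; the text does not state it.

```python
# Power-density arithmetic for the example die described in the text.
DIE_SIDE_MM = 20.0          # from the text: 20 mm by 20 mm die
CORE_AREA_FRACTION = 0.5    # from the text: 50% of the die is CPU core
TOTAL_POWER_W = 100.0       # from the text: total power dissipation
CORE_POWER_FRACTION = 0.8   # ASSUMPTION: share of power dissipated in the core


def power_density_w_per_cm2(power_w, area_cm2):
    """Return power density in W/cm^2 for a given power and area."""
    return power_w / area_cm2


die_area_cm2 = (DIE_SIDE_MM / 10.0) ** 2              # 20 mm = 2 cm per side
core_area_cm2 = die_area_cm2 * CORE_AREA_FRACTION

# Density averaged over the whole die versus density over the core alone.
average_density = power_density_w_per_cm2(TOTAL_POWER_W, die_area_cm2)
core_density = power_density_w_per_cm2(
    TOTAL_POWER_W * CORE_POWER_FRACTION, core_area_cm2
)

print(f"die area:        {die_area_cm2:.1f} cm^2")
print(f"average density: {average_density:.1f} W/cm^2")
print(f"core density:    {core_density:.1f} W/cm^2")
```

Under these assumptions the die-wide average is 25 W/cm², while the core alone sees 40 W/cm², which shows why local power density, rather than total power, drives the thermal problem.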
Already, power dissipation presents a major limitation on the design of high-performance microprocessors. While power dissipation in processors such as the IBM POWER4™ is still comfortably within the limits of the packaging/cooling solution of choice for high-end microprocessor systems, such solutions are undesirable in lower-end systems, such as most personal computers, game consoles, set-top boxes and similar devices, due to:
Total packaging/cooling solution cost, due to increased package cost and the need for fans, heat spreaders, thermal interface materials, etc.;
The possible need for a fan, which induces noise and can lead to a reduced mean time between failures; and
The inability to guarantee a controlled environment for the system, as is usually provided for high-end systems, including possible obstruction of fan vents, etc.
Thus, power removal, i.e., thermal management of the processor, is an increasingly challenging aspect of packaging as the average power density of processors is expected to increase. The problem will be exacerbated by the need to manage local power densities on the die. The development of cost-effective and technically viable thermal management solutions that maintain die temperature at acceptable levels will be key to ensuring future success. This can be accomplished through development and deployment of effective spreader solutions and thermal interface materials. Controlled assembly processes to manage the thermal interfaces are also key to successful design. Finally, understanding and managing the die power, power distribution, and the thermal environment in the chassis are important.
To date, most packaging solutions have been designed for extended periods of operation in the thermal worst case. However, during normal operation, even high-performance systems do not reach these conditions, let alone sustain them for extended periods of time. Thus, thermal solutions found in most systems are unnecessarily conservative and expensive for the most common operating conditions. Such a design point does, nevertheless, prevent catastrophic failure in the event of unexpected workload conditions.
Current microprocessors have only limited ways, if any, of responding to overheating. State-of-the-art systems like the Pentium 4 use dynamic clock throttling to reduce the danger of catastrophic failure due to power dissipation. However, this degrades applications in unplanned-for ways. In many instances, an application could react better to reduce power if it were aware of the situation (e.g., by reducing detail in a game) instead of having its overall performance slowed to the point of unresponsiveness.
FIG. 1 shows application power dissipation estimates for the Intel Pentium 4 processor. In this figure, which is based on “Pentium 4 Processor Thermal Guidelines”, Intel has estimated the power dissipation of a number of popular software applications. The method was based on extracting code sequences from the programs and calculating the power that would be consumed if each program were run on a Pentium 4 processor. Code sequences, or traces, were gathered from roughly 200 applications and benchmarks.
The packaging solution used for the Intel Pentium 4 provides an alternative approach to thermal management: the thermal design point for the packaging and cooling solution is set at 75% of maximum power, which covers the range of power dissipation observed during system simulation for a variety of traces.
To prevent catastrophic failure, the system includes a Thermal Monitor feature that may be used in a variety of ways, depending upon the system design requirements and capabilities. At a minimum, the thermal control circuit supplies an added level of safety against loss of processor availability due to an over-temperature condition.
Intel's thermal management in the Pentium 4 relies on a mechanism referred to as “STOPCLOCK”, wherein the clock is temporarily halted to reduce power dissipation to within the range supported by the packaging/cooling solution. There is additional support for raising software interrupts and for accessing device registers that indicate when a thermal spike is encountered. FIG. 2 illustrates the operation of a simple hardware-only solution to thermal control based on the STOPCLOCK mechanism according to the prior art.
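The behavior of a hardware-only, trip-point throttle of this kind can be illustrated with a small simulation. The sketch below is not Intel's implementation: the first-order thermal model, the function name simulate_stopclock, and every numeric value (thermal resistance, time constant, trip point, throttle duty cycle) are hypothetical and chosen only so that unthrottled operation would exceed the trip point.

```python
def simulate_stopclock(steps, power_w=100.0, ambient_c=40.0,
                       r_th_c_per_w=0.6, tau_steps=20.0,
                       trip_c=85.0, throttle_duty=0.5):
    """Simulate a trip-point clock throttle on a first-order thermal model.

    All parameters are hypothetical. Returns a list of
    (temperature_c, clock_duty) tuples, one per time step.
    """
    temp_c = ambient_c
    history = []
    for _ in range(steps):
        # Hardware-only decision: compare the sensor reading to a fixed
        # trip point, with no knowledge of what the application is doing.
        duty = throttle_duty if temp_c >= trip_c else 1.0

        # Simplification: dissipated power scales with the fraction of
        # time the clock is running.
        effective_power_w = power_w * duty

        # First-order model: temperature relaxes toward the steady state
        # implied by the current power level.
        steady_state_c = ambient_c + effective_power_w * r_th_c_per_w
        temp_c += (steady_state_c - temp_c) / tau_steps

        history.append((temp_c, duty))
    return history


if __name__ == "__main__":
    for step, (temp_c, duty) in enumerate(simulate_stopclock(40)):
        print(f"step {step:2d}  temp {temp_c:6.2f} C  clock duty {duty:.1f}")
```

In this model the temperature climbs to the trip point and then hovers around it while the clock alternates between full speed and the throttled duty cycle. The application sees its throughput change from step to step without any indication of why.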
While this solution represents an adequate response to prevent catastrophic failure, its impact on system performance can be quite undesirable. In particular, increased system temperature can lead to seemingly random reductions in system performance, which in turn lead to erratic system behavior.
Thus, what is needed is an appropriate integration of software into thermal management, such that system behavior is a tool of thermal management and can aid in managing system power dissipation. This is particularly important when component cost is a concern, since it allows cheaper packages and other components (such as cooling systems) to be used while preserving an acceptable user experience (i.e., graceful, situation-adapted degradation instead of brute-force performance reduction).
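As a rough illustration of situation-adapted degradation, the sketch below shows how an application that is told the die temperature might lower its own workload, for instance the level of detail in a game, rather than being slowed indiscriminately. The function name next_detail_level, the three detail levels, and both temperature thresholds are hypothetical and are not taken from the invention or from any real API.

```python
DETAIL_LEVELS = ("low", "medium", "high")  # hypothetical workload levels


def next_detail_level(level, temp_c, warn_c=80.0, clear_c=70.0):
    """Pick the next detail level from the current one and a temperature.

    Steps down one level at or above warn_c, steps back up one level at
    or below clear_c, and otherwise holds. The gap between the two
    thresholds (hysteresis) avoids flipping levels on every reading.
    """
    index = DETAIL_LEVELS.index(level)
    if temp_c >= warn_c and index > 0:
        index -= 1
    elif temp_c <= clear_c and index < len(DETAIL_LEVELS) - 1:
        index += 1
    return DETAIL_LEVELS[index]


if __name__ == "__main__":
    level = "high"
    for reading_c in (65.0, 82.0, 84.0, 75.0, 68.0):
        level = next_detail_level(level, reading_c)
        print(f"{reading_c:5.1f} C -> detail {level}")
```

The application stays responsive throughout. Only the quality of its output changes, and it recovers once the temperature falls back below the lower threshold.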