1. Field of the Invention
The present invention generally relates to a method and apparatus for estimation and prediction of a thermal state of an electronic device, and more particularly to a method and apparatus for real-time estimation and prediction of a thermal state (e.g., global temperature distribution) of a microprocessor unit with a limited set of temperature and current measurements.
2. Description of the Related Art
Present computer systems do not have the capability to extract the spatial distribution of temperature (e.g., thermal energy) while accounting for the thermal dynamics of the system.
Typically, one or two thermal sensors are used to trigger entry of a processor into a protection mode. As a result, the allowable temperature on the chip is conservatively held lower than necessary, in order to avoid breaching the junction temperature specifications. This is inefficient and can be problematic.
Additionally, sustained power dissipation over a zone can give rise to “hotspots” on the silicon that could reduce the integrity of a chip. Conservative temperature specification, however, limits the performance of a processor. Thus, there is a tradeoff between keeping the chip “safe” and optimizing the performance of the chip.
Prior to the present invention, there have been no apparatus and techniques which have addressed extracting the local and global maximum temperature on a chip in real time, or that have facilitated the development of a structured method to manage the temperature on a chip.
Power consumption in a microprocessor (20 mm×20 mm silicon chip) is predicted to grow far beyond 100 Watts during this decade. FIG. 1A shows a typical cooling configuration where the execution of a stream of machine instructions determines the amount of power, Q(x,y,t), dissipated in the X-Y plane (e.g., see FIG. 1B).
The power has a steady (DC) component attributable to leakage current. Each clock cycle releases a packet (“quantum”) of energy distributed in the X-Y plane in the processor circuit layer, thus contributing an unsteady (AC) component.
The cumulative effect of AC and DC power dissipation in a processor is a major limiting factor in realizing its full potential performance. The trend toward increased power dissipation is expected to pose an ever greater challenge to the processor and cooling system design. The transient power produces time varying temperature where Tij represents an average temperature for a selected zone (i,j) at any given time, as shown in FIG. 1C. Coordinates xi and yj correspond to the center of a rectangle (i,j). Strictly, Tij is actually Tij(t) for a continuous time system.
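By way of illustration only, the zone-averaged temperature Tij described above can be sketched in Python as follows. The sampled grid, the function name zone_averages, and the assumption that the chip is divided into equal rectangular zones are hypothetical and are not taken from the figures.

```python
def zone_averages(temps, n_zones_y, n_zones_x):
    """Average a sampled temperature grid over equal rectangular zones.

    temps is a list of rows (y index first). The result is indexed as
    result[j][i], i.e. zone (i, j) in the notation of the text.
    Assumption: the grid dimensions divide evenly by the zone counts.
    """
    rows, cols = len(temps), len(temps[0])
    if rows % n_zones_y or cols % n_zones_x:
        raise ValueError("grid must divide evenly into zones")
    zone_h, zone_w = rows // n_zones_y, cols // n_zones_x
    result = []
    for j in range(n_zones_y):
        row_out = []
        for i in range(n_zones_x):
            # Collect every sample that falls inside rectangle (i, j).
            cells = [temps[y][x]
                     for y in range(j * zone_h, (j + 1) * zone_h)
                     for x in range(i * zone_w, (i + 1) * zone_w)]
            row_out.append(sum(cells) / len(cells))
        result.append(row_out)
    return result


# Hypothetical 4x4 snapshot of T(x, y) at one time instant, in deg C.
snapshot = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [5.0, 5.0, 7.0, 7.0],
    [5.0, 5.0, 7.0, 7.0],
]
t_zones = zone_averages(snapshot, 2, 2)
```

Repeating the calculation for each sampled time instant would yield Tij(t) for every zone.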
A computer system's cooling capability determines the average temperature of a processor system. However, the execution of instructions causes spatially non-uniform and time varying power dissipation, Q(x,y,t), in a processor where “t” denotes time. The corresponding temperature, T(x,y,t), can have local maximums and a global maximum at a given time instant.
A sorting algorithm, for example, may tax the arithmetic unit (AU) of a chip, whereas a solution to a complex fluid dynamic problem may tax the floating point unit (FPU). The resulting transient temperature, T(x,y,t), can fluctuate several degrees relative to the average bulk temperature of the cooling system. The time scale involved can be anywhere from a fraction of a millisecond to a few milliseconds. A processor has several logically separate units, such as the arithmetic unit, a floating point unit, a cache, an instruction decode unit, etc. It is noted that not all units are uniformly activated during a computational operation, and the location where maximum temperature occurs understandably shifts with time.
Theoretically, a large array of temperature sensors distributed over a silicon surface containing active circuit devices could provide a quantitative link to the present temperature of a chip in X-Y dimensions.
However, embedding a multitude of transistor(diode)-based temperature sensors within the digital electronic circuit not only interferes with the digital circuit design, but also impacts cost, performance and reliability of a processor system.
Indeed, one way to measure the temperature of the microprocessor is to use a diode as a temperature sensor. This diode could be external or built into the chip. External temperature diodes are fabricated on semiconductor processes optimized for analog circuits and tend to have better resolution than internal diodes. The current state of the art is ±1 deg C. A built-in diode must compromise with a digital circuit and has much worse specifications.
For example, the Motorola PowerPC® has a temperature sensing diode with ±4 deg. C resolution. It is well known that the forward voltage drop across a diode, Vd, is linearly proportional to the temperature, given by the following equation:
Vd=(N*k*T/q)*ln(If/Is)
where N=non-linear factor, k=Boltzmann's constant, T=absolute temperature, q=electron charge, If=forward current, Is=saturation current. N and Is are process- and device-dependent. Thus, each diode must be calibrated before use. This is problematic and time-consuming.
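As a minimal sketch, the diode equation above and its inversion for temperature may be written in Python as follows. The function names and the numerical values of If, Is and N are hypothetical, and Is is treated as a fixed parameter exactly as in the equation above. The example also shows why calibration matters: an uncorrected 1% error in N produces a 1% error in the inferred absolute temperature.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann's constant, J/K
Q_ELECTRON = 1.602176634e-19  # electron charge, C


def diode_forward_voltage(temp_k, i_forward, i_sat, n=1.0):
    """Vd = (N*k*T/q) * ln(If/Is), with T in kelvin."""
    return (n * K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(i_forward / i_sat)


def temperature_from_voltage(v_d, i_forward, i_sat, n=1.0):
    """Invert the diode equation for T, given assumed values of N and Is."""
    return v_d * Q_ELECTRON / (n * K_BOLTZMANN * math.log(i_forward / i_sat))


# Hypothetical device: actual N is 1.01, but the reader assumes N = 1.0.
I_F, I_S = 1e-4, 1e-14  # amperes (illustrative values only)
v_measured = diode_forward_voltage(350.0, I_F, I_S, n=1.01)
t_calibrated = temperature_from_voltage(v_measured, I_F, I_S, n=1.01)
t_uncalibrated = temperature_from_voltage(v_measured, I_F, I_S, n=1.0)
```

With these illustrative values, the calibrated reading recovers 350 K, while the uncalibrated reading is high by 3.5 K.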
However, there are several ways to bypass the calibration. One way is to make one diode much larger than the other one (32×) and look at the ratio of the two Vd voltages as suggested by U.S. Pat. No. 5,829,879 to Sanchez, incorporated herein by reference. Another way is to vary the forward current, If, and also look at the ratio of the two voltages to determine the nonlinear factor. Both ways have substantial penalties: much larger chip area (case 1) or multiple current sources (case 2).
A temperature sensing diode produces a signal of about 2 mV/deg C and requires stable current source(s), low-noise amplifiers and a high-resolution ADC for proper operation. It would be a major challenge to integrate all of the analog components with noisy high-speed digital circuits to measure temperatures accurately at many different locations.
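The need for a high-resolution ADC can be illustrated with a short calculation. The following Python sketch estimates the number of ADC bits needed to resolve a given temperature step from a sensor with a sensitivity of about 2 mV/deg C. The 1 V full-scale range, the target resolutions, and the function name adc_bits_required are assumptions chosen for illustration, and noise is ignored.

```python
import math


def adc_bits_required(full_scale_v, sensitivity_v_per_degc, resolution_degc):
    """Smallest bit count whose step size does not exceed the required step.

    The required voltage step is sensitivity * resolution; an ideal N-bit
    converter divides the full-scale range into 2**N steps.
    """
    required_step_v = sensitivity_v_per_degc * resolution_degc
    return math.ceil(math.log2(full_scale_v / required_step_v))


# Assumed 1 V full-scale range and 2 mV/deg C sensitivity.
bits_for_1_degc = adc_bits_required(1.0, 2e-3, 1.0)
bits_for_0p1_degc = adc_bits_required(1.0, 2e-3, 0.1)
```

Under these assumptions, resolving 1 deg C takes 9 bits and resolving 0.1 deg C takes 13 bits, before any allowance for noise from adjacent digital circuits.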
Another practical consideration is that often one cannot put the diode sensor directly on the hot-spot because of space constraints. Thus, even with the best sensor, some form of spatial extrapolation is still needed to determine the true hot-spot temperature.
Further, bandwidth-limited sensors can provide, at best, a delayed measure (due to their time constants) of the present temperature at a location, and have no ability to predict the temperature characteristics under a given computational load.
Additional propagation delay in the X-Y plane due to thermal capacitance makes the present temperature at an arbitrary location deviate from that of a nearby sensor.
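The sensor delay noted above can be illustrated by modeling a sensor as a first-order lag with time constant tau. This model, the 1 ms time constant, the 60 to 70 deg C step, and the function name sensor_response are assumptions made for illustration only. They do not represent any particular sensor.

```python
import math


def sensor_response(true_temps, dt, tau, initial):
    """Readings of a first-order sensor driven by sampled true temperatures.

    Uses the exact discrete update for dTs/dt = (T - Ts) / tau with the
    input held constant over each time step dt.
    """
    alpha = 1.0 - math.exp(-dt / tau)
    reading = initial
    readings = []
    for temp in true_temps:
        reading += alpha * (temp - reading)
        readings.append(reading)
    return readings


# Hypothetical step in true temperature from 60 to 70 deg C,
# observed for one time constant (1 ms) in 1000 steps.
TAU = 1e-3
DT = TAU / 1000
readings = sensor_response([70.0] * 1000, DT, TAU, initial=60.0)
lag_error = 70.0 - readings[-1]
```

In this model, one time constant after the step the sensor has covered only about 63% of the change, so its reading still trails the true temperature by several degrees.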
If the present and future temperature of a microprocessor chip can be predicted a few milliseconds ahead using an intelligent methodology, in conjunction with a limited set of sensors (temperature, current, etc.), then new methods to manage processor temperature can be developed. Dynamic thermal management (DTM) techniques (e.g., see D. Brooks and M. Martonosi, “Dynamic Thermal Management for High Performance Microprocessors,” IEEE, 2001, 171-182) can be applied through an improved knowledge of the thermal state. Adaptive cooling systems can be configured to optimize the chip's performance.
Thus, “hot spots” may move around on a chip depending upon the type of application. Hence, the use of global ranges or discrete temperature sensors is not optimal. Further, a 2-3 degree conservative prediction may stifle performance. Additionally, there is a problem in placing the discrete sensors in the right spot. Indeed, many applications prevent sensors from being placed in the area of interest.
Yet another problem, prior to the present invention, has been that placing an alien (e.g., separate) temperature sensing circuit into an optimized digital unit (e.g., a processor, a floating point unit or the like optimized for generations to work at maximum speed) may impair the performance of the processor (e.g., especially specialized chips such as game chips).
Hence, one cannot always place sensors where one likes, and thus estimates of temperature may be required. By the same token, it would be useful to know the temperature inside the processor/floating point unit with good accuracy without having the luxury of placing a sensor in the middle of the processor/floating point unit.
Thus, prior to the invention, no such optimal techniques have been developed, nor have the problems of the conventional techniques and apparatus been recognized. That is, there has been no innovation in temperature management of a chip through real-time executable estimation and prediction of the chip temperature.