1. Technical Field
The present invention relates generally to computer devices and in particular to operating parameters of memory subsystems. Still more particularly, the present invention relates to a method and system that provides direct management of power usage by memory modules within a memory subsystem.
2. Description of Related Art
Increases in processor performance and the proliferation/development of multi-core, multi-threaded processor dies have led to a rapidly increasing need for more memory bandwidth and capacity. While processor development is on a high performance growth path, developments in memory components (e.g., dual inline memory modules (DIMMs)) are advancing at a much slower rate with respect to both the density and the performance of their dynamic random access memories (DRAMs).
To keep up with the increasing demands for data bandwidth and capacity, memory subsystems have had to increase both their frequency of operation and the density of the DIMMs. In addition, due to physical constraints of the overall computer systems, the volume of space that is allocated to the DRAMs on the memory modules has not increased much over time. The combination of increasing need and constant (if not shrinking) space has resulted in packing more DRAMs on each DIMM and more DIMMs into the same physical space. The increased density raises the power dissipation per DIMM, while the space limitations have reduced the airflow and cooling capacity at the chip level.
In a conventional memory subsystem, power is consumed by a number of components. The main sources of power consumption are the DRAMs and the control chips on the DIMMs. To a lesser degree, power is consumed by discrete devices that are used to terminate the electrical signaling between the memory controller and the DRAMs. However, there is little that can be done to reduce the power consumed by these termination devices.
A DRAM consumes different amounts of power depending on the current state of the DRAM's logic. In general, there are three distinct power states that are relevant. The lowest power state is a “power down” state, at which a typical DRAM may consume less than 10 milliwatts of power. The next power state is the “standby” state, at which the typical power consumed is close to 70 milliwatts. Finally, the highest power state is the “operating” state, at which power consumption may range between 100 and 700 milliwatts.
Within a given memory subsystem, the individual DRAMs may be in any one of the three power states at any time, based primarily on the memory access patterns and the bandwidth requirements of the system. While these power consumption values may appear small, a large server system may have tens of thousands of DRAM chips installed. With potentially thousands of DRAMs packed into a relatively small space and each consuming some amount of power, the possibility of the memory chips overheating becomes a major design concern.
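By way of illustration only, the aggregate effect of the per-DRAM figures quoted above can be estimated with a short calculation. The following Python sketch is not part of any described system; the function name and the mix of DRAMs in each power state are hypothetical, and the per-state values are simply the typical figures given above (with the "power down" figure taken at its 10 milliwatt upper bound).

```python
# Per-DRAM power figures in milliwatts, taken from the typical values quoted above.
POWER_DOWN_MW = 10          # upper bound for the "power down" state
STANDBY_MW = 70             # typical value for the "standby" state
OPERATING_MW = (100, 700)   # low and high ends of the "operating" range


def subsystem_power_watts(n_power_down, n_standby, n_operating):
    """Return a (low, high) estimate of total DRAM power in watts.

    The spread comes only from the operating-state range; the other two
    states are treated as fixed per-DRAM values (a simplifying assumption).
    """
    base_mw = n_power_down * POWER_DOWN_MW + n_standby * STANDBY_MW
    low_mw = base_mw + n_operating * OPERATING_MW[0]
    high_mw = base_mw + n_operating * OPERATING_MW[1]
    return low_mw / 1000.0, high_mw / 1000.0


# Hypothetical mix for a 10,000-DRAM system: 2,000 powered down,
# 5,000 in standby, and 3,000 operating.
low_w, high_w = subsystem_power_watts(2000, 5000, 3000)
print(f"Estimated DRAM power: {low_w:.0f} W to {high_w:.0f} W")
```

Under this assumed mix, the 3,000 operating DRAMs account for most of the total at the high end of the range, which is consistent with the concern that a cluster of continually operating DRAMs dominates local heat dissipation.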
Power management is provided by most conventional systems with system-level temperature control via feedback cooling systems and/or system-level voltage/current control. Cooling systems are designed to attempt to offset/reduce the overheating of the memory subsystem as a whole. Designing cooling systems to provide sufficient cooling capacity for these high density memory systems is extremely difficult as the cooling systems have to keep up with the increasing density of these memory chips. Occasionally, during operation, several DRAMs in close vicinity to each other may be continually in the operating state (as operations targeting the DRAMs are issued), leading to the creation of a hot spot (i.e., localized overheating) within the memory subsystem.
FIG. 1 illustrates a block diagram representation of one prior art power management method having both system-level temperature control and voltage/current controls. The temperature control utilizes a feedback control loop and system fans. The computer system includes a processor 111 connected to memory devices 106 via a memory controller 101. Memory devices generally refer to DIMMs on which DRAMs or SRAMs, etc., are built. Located in the vicinity of the memory devices is a temperature (voltage) sensor 108 that records and transmits the current temperature (voltage) surrounding the memory devices 106. Also located near the memory devices 106 is a fan 112, which receives variable current from a fan speed controller 114 and turns on at a corresponding speed to cool the memory devices when the temperature goes above a preset level.
Two control loops are established, with the first being the temperature sensor 108 coupled to the fan speed controller 114, which is in turn coupled to the fan 112. The second control loop includes the processor 111, which includes a memory access command throttling function 113 that responds to the power usage of the memory subsystem reaching or surpassing a preset power usage threshold value by throttling the number of memory access commands that are sent to the memory controller 101.
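For illustration, the two prior art control loops of FIG. 1 may be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a description of any actual implementation: the function names, the linear fan ramp, the temperature limits, and the fixed throttle factor are all hypothetical, since the description above specifies only that the fan responds above a preset temperature and that commands are throttled once a preset power usage threshold is reached or surpassed.

```python
def fan_duty(temp_c, preset_c=45.0, max_c=85.0):
    """Loop 1 (sensor 108 -> fan speed controller 114 -> fan 112).

    Returns a fan drive level between 0.0 and 1.0. The fan stays off at or
    below the preset temperature; above it, a linear ramp to full speed at
    max_c is assumed (the ramp shape and both limits are hypothetical).
    """
    if temp_c <= preset_c:
        return 0.0
    return min(1.0, (temp_c - preset_c) / (max_c - preset_c))


def commands_allowed(requested, power_w, threshold_w, throttle_factor=0.5):
    """Loop 2 (throttling function 113 in processor 111).

    When subsystem power reaches or surpasses the preset threshold, only a
    fraction of the requested memory access commands is forwarded to the
    memory controller. The fixed throttle_factor is an assumption.
    """
    if power_w >= threshold_w:
        return int(requested * throttle_factor)
    return requested


# Example readings for the memory subsystem as a whole.
print(fan_duty(65.0))                         # midway between preset and max
print(commands_allowed(1000, 120.0, 100.0))   # power above threshold
```

Both functions take a single temperature or power reading for the memory subsystem as a whole; neither receives any indication of which DIMM or DRAM is responsible for the reading.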
While the above-described power management/control techniques provide general cooling and maintenance of maximum power usage for the memory subsystem, there is currently no way of predicting where hot spots will occur within the different memory modules. Thus, the above-described power management mechanisms provide general power management directed at the entire memory subsystem, and are not always able to adequately limit power usage or provide sufficient cooling directed at these potential hot spots.
Since these hot spots could eventually lead to burn-out or failure of the DRAMs and/or DIMMs, the present invention recognizes that it would be desirable to provide directed power management that could more effectively prevent the occurrence of localized hot spots caused by operation of specific DRAMs or specific DIMMs. The invention further addresses the limitations of reliance on system-level power management and response techniques by providing techniques for managing power at the DIMM and DRAM levels. Also, the invention enables directed responses to localized dissipation of heat that target the particular DIMM or DRAM at which the problem is occurring.