Digital processors, such as microprocessors, use a memory subsystem to store data and processor instructions. Some processors communicate directly with memory, and others use a dedicated controller chip, often part of a “chipset,” to communicate with memory.
Conventional memory subsystems are often implemented using memory modules. Referring to FIG. 1, which illustrates an example conventional memory subsystem, a microprocessor 20 communicates with a memory controller/hub (MCH) 30 that couples the microprocessor 20 to various peripherals. One of these peripherals is system memory, shown as dual in-line memory modules (DIMMs) 40, 42, and 44 inserted in card slots 50, 52, and 54. In this example, each of the DIMMs 40, 42, 44 includes a number of memory devices 35, which may be DRAM memory devices. When connected, the DIMMs are addressed by MCH 30 whenever MCH 30 asserts appropriate signals on an Address/Control Bus 60. Data transfers between MCH 30 and one of DIMMs 40, 42, and 44 occur on a Data Bus 70.
Thermal throttling refers generally to methods used to reduce the workload experienced by processor-based electronic system components in response to overheating. For example, some processors are equipped with a pin that signals when the processor die temperature has exceeded a threshold level. When the threshold is exceeded, the processor is “throttled” or operated at a slower speed for a period of time in order to reduce the amount of heat generated by the processor.
Memory modules are another type of component in processor-based electronic systems that may be thermally throttled.
For example, FIG. 2 is a flowchart illustrating a conventional method 200 of thermal throttling that may be applied to the memory subsystem of FIG. 1. In process 210, the MCH 30 counts the number of read requests R that are directed at any of the DRAMs 35 on the DIMMs 40, 42, 44 during a first time period Δt1, which may be referred to as a global sample window (GSW). In process 220, the number of read requests R that occurred during the GSW is compared to a first preset read threshold n1. If R is less than or equal to n1, process 210 is repeated. If R is greater than n1, a thermal throttling mode is entered at process 230 for a second time period Δt2, where Δt2 is greater than or equal to the first time period Δt1. The second time period Δt2 may be referred to as the Read Throttle Period (RTP).
At process 240, the MCH 30 tracks the number of read requests occurring during a third time period Δt3, which may be referred to as the Read Monitor Period (RMP). The length of the second time period Δt2 (RTP) is n times the length of the third time period Δt3 (RMP), where n is an integer. In process 250, the number of read requests R counted during the current RMP is compared to a second preset read threshold n2. If R is greater than n2, process 260 prevents additional read requests from being issued to the memory interface for the rest of the current RMP. Regardless of the outcome of process 250, process 270 checks whether the number of elapsed RMPs has reached n, that is, whether the RTP has expired. If the RTP has not expired, the method returns to process 240 and the number of reads is checked for another RMP. If the RTP has expired, the throttling mode also expires and the method returns to process 210.
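The read path of method 200 can be sketched as a small state machine. The following is a minimal Python sketch, not the MCH's actual logic, assuming time is discretized into clock ticks and that the comparison in process 250 is evaluated continuously, so that once the RMP count exceeds n2 the remaining reads in that RMP are held. The class name `ReadThrottle`, the method `tick`, and the tick-based parameters are hypothetical; held reads are simply not issued, and any retry queue is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ReadThrottle:
    """Tick-based sketch of the read path of method 200 (FIG. 2)."""
    gsw: int    # Δt1, global sample window length in ticks
    n1: int     # first preset read threshold (per GSW)
    rmp: int    # Δt3, read monitor period length in ticks
    n2: int     # second preset read threshold (per RMP)
    n: int      # RTP (Δt2) = n * RMP
    throttling: bool = False
    elapsed: int = 0    # ticks elapsed in the current GSW or RMP
    count: int = 0      # reads issued in the current GSW or RMP
    rmps_done: int = 0  # completed RMPs in the current RTP

    def __post_init__(self) -> None:
        # The source requires Δt2 >= Δt1.
        if self.n * self.rmp < self.gsw:
            raise ValueError("RTP (n * rmp) must be >= GSW")

    def tick(self, wanted: int) -> int:
        """Advance one tick with `wanted` pending reads; return the number issued."""
        if self.throttling:
            # Processes 250/260: reads issue until the RMP count exceeds n2,
            # then are held for the rest of the RMP.
            issued = min(wanted, max(0, self.n2 + 1 - self.count))
        else:
            issued = wanted  # process 210: count reads but do not restrict them
        self.count += issued
        self.elapsed += 1
        if not self.throttling and self.elapsed == self.gsw:
            # Process 220: enter throttling mode (process 230) if R > n1.
            self.throttling = self.count > self.n1
            self.elapsed = self.count = self.rmps_done = 0
        elif self.throttling and self.elapsed == self.rmp:
            # Process 270: one more RMP has elapsed; the RTP ends after n of them.
            self.rmps_done += 1
            self.elapsed = self.count = 0
            if self.rmps_done == self.n:
                self.throttling = False  # return to process 210
        return issued
```

For example, with `ReadThrottle(gsw=4, n1=3, rmp=4, n2=2, n=2)` and one pending read on every tick, sixteen calls to `tick(1)` issue `[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1]`. The first GSW sees 4 reads, which exceeds n1, so an RTP of two RMPs follows. Each RMP holds its last read once the count passes n2, and a fresh GSW then begins.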
In the above example, write requests that are directed at DRAMs 35 on the DIMMs 40, 42, 44 are handled in the same manner, but by a separate mechanism. Thus, the thermal throttling mode can be triggered by either the number of read requests or the number of write requests exceeding its threshold.
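To illustrate how the read and write paths trigger independently, the following minimal sketch evaluates only the GSW comparison (processes 210 and 220) for each request type. The function names `gsw_triggers` and `throttling_entered` and the separate thresholds `n1_read` and `n1_write` are hypothetical. The source states only that writes are handled by a separate mechanism.

```python
from collections import Counter


def gsw_triggers(requests: list[str], n1_read: int, n1_write: int) -> dict[str, bool]:
    """Evaluate the GSW comparison separately for reads and writes.

    `requests` holds the type ("read" or "write") of each memory request
    seen during one global sample window. The two thresholds are
    hypothetical, independent parameters.
    """
    counts = Counter(requests)  # missing types count as 0
    return {
        "read": counts["read"] > n1_read,     # read mechanism, process 220
        "write": counts["write"] > n1_write,  # separate write mechanism
    }


def throttling_entered(triggers: dict[str, bool]) -> bool:
    # Either mechanism exceeding its threshold is enough to enter throttling.
    return any(triggers.values())
```

For instance, a GSW containing five reads and two writes, with both thresholds set to 4, trips only the read mechanism, and that alone is enough to enter the throttling mode.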
In the example described above, all reads and writes are treated identically, and no distinction is made based on which individual DIMM 40, 42, 44 contains the DRAM 35 targeted by the memory transaction. This approach works well for desktop systems because it accounts for the total dissipated power (TDP) during read and write cycles for the entire memory subsystem. However, contemporary server systems can dissipate more heat than desktop systems, and their primary thermal concern is the thermal density of individual memory modules. Also, server traffic is generally more random than desktop traffic and is spread across multiple memory modules.
Thus, if one assumes that reads are well distributed across all the memory modules in a server system (a fairly safe assumption), the resulting threshold will be set too high. In such situations, the memory system may be vulnerable to damage by a power virus, which is malicious code designed to concentrate memory accesses on one DIMM or even on one DRAM. Such power viruses have the potential to destroy the particular memory module under attack. Even in the absence of a power virus, a “hot spot” can occur under some reasonable workloads, or when memory modules of different sizes are used.
Conversely, if one assumes that all reads or writes will be targeted to one memory module, the threshold will be set too low and the attainable performance of the system will be constrained due to overly frequent and unnecessary throttling of the memory interface.
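The tradeoff described in the two preceding paragraphs can be made concrete with hypothetical numbers. The sketch below assumes a safe budget of 1000 reads per DIMM per GSW (an illustrative figure, not from the source) and the three DIMMs of FIG. 1. It computes how heavily the busiest DIMM can be loaded before a single global read counter triggers throttling. The function name `busiest_dimm_load_at_trigger` is hypothetical.

```python
def busiest_dimm_load_at_trigger(n1: int, num_dimms: int, concentrated: bool) -> float:
    """Largest per-DIMM read count a global counter allows in one GSW.

    The global counter sees only the total, so it allows up to n1 reads
    before throttling. If all reads target one DIMM, that DIMM receives
    all of them; if they are evenly spread, each DIMM receives n1 / num_dimms.
    """
    return float(n1) if concentrated else n1 / num_dimms


BUDGET = 1000  # hypothetical safe reads per DIMM per GSW
DIMMS = 3      # e.g., DIMMs 40, 42, 44 of FIG. 1

n1_spread = BUDGET * DIMMS  # threshold set assuming well-distributed reads
n1_worst = BUDGET           # threshold set assuming all reads hit one DIMM

# Power virus vs. the "spread" threshold: one DIMM sees 3x its budget.
virus_ratio = busiest_dimm_load_at_trigger(n1_spread, DIMMS, concentrated=True) / BUDGET
# Evenly spread traffic vs. the "worst-case" threshold: throttling starts
# while each DIMM is at only one third of its budget.
benign_ratio = busiest_dimm_load_at_trigger(n1_worst, DIMMS, concentrated=False) / BUDGET
```

Under these assumptions, neither global threshold is both safe and efficient. The first lets a targeted DIMM reach three times its budget, while the second throttles balanced traffic at one third of the load the modules could sustain.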
Furthermore, while the above example accounts for the power dissipated by read and write requests, it does not account for the power dissipated by activate commands on the DRAM interface.