System-level thermal management is a primary requirement for all server and storage node chassis. Semiconductor components in a system are typically built to operate within specified thermal conditions. Systems may integrate thermal management as part of their core functionality to protect all components in the system from overheating.
Servers in standard form factor designs (e.g., 2U rack servers) are usually equipped with fans to circulate cooler air over the server components. In addition, some components that dissipate more power, such as Central Processing Units (CPUs) and Input/Output (I/O) cards, also use active (with a localized fan) or passive heatsinks to keep their operating thermal environments within the respective component specifications.
In modern server designs, system-level thermal management is primarily driven by the workloads on CPUs. When CPU utilization goes up in a server, so do the fan speeds in the system (up to configured Revolutions Per Minute (RPM) levels that satisfy data center noise limits). Such systems may include a feedback loop between CPU power dissipation levels and the fans providing thermal management in a server, to balance the competing objectives of keeping the system components cool and keeping noise levels at a minimum.
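The CPU-driven feedback loop described above can be illustrated with a minimal sketch. The function name `fan_rpm_for_cpu`, the linear mapping, and the RPM bounds are hypothetical values chosen for illustration only; an actual server may use a different control law and different limits.

```python
def fan_rpm_for_cpu(cpu_utilization: float,
                    min_rpm: int = 3000,
                    max_rpm: int = 9000) -> int:
    """Map CPU utilization (0.0 to 1.0) to a fan speed in RPM.

    Assumption: a simple linear mapping. max_rpm stands in for the
    configured RPM ceiling that keeps fan noise within data center
    limits; both bounds are illustrative, not taken from any product.
    """
    if min_rpm > max_rpm:
        raise ValueError("min_rpm must not exceed max_rpm")
    # Clamp the input so out-of-range readings cannot exceed the ceiling.
    clamped = min(max(cpu_utilization, 0.0), 1.0)
    return round(min_rpm + clamped * (max_rpm - min_rpm))


if __name__ == "__main__":
    for load in (0.0, 0.5, 1.0):
        print(f"CPU load {load:.0%}: fan speed {fan_rpm_for_cpu(load)} RPM")
```

In this sketch the only input is CPU utilization, so the fan speed rises and falls with CPU load alone and never exceeds the configured ceiling.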
Solid State Drive (SSD) devices in standard drive form factors such as M.2, 2.5-inch, 3.5-inch, and Peripheral Component Interconnect Express (PCIe) adapter card form factors used in servers rely on system-level thermal management. But SSDs are not part of the feedback loop that manages fan speed settings. With conventional SSDs, where SSD workload is correlated to CPU utilization, CPU-driven thermal management might be sufficient. But newer classes of SSDs may include In-Storage Compute (ISC)-capable SSDs and Network Attached Storage (NAS) SSDs, to name two examples. In such newer classes of SSDs, SSD power utilization may not be in lock-step correlation with CPU utilization in the servers.
NAND flash media is sensitive to its operating thermal environment. If the ambient temperature exceeds operating specifications, the NAND flash media may experience increased bit error rates, and thus potential data loss and/or decreased performance. So when SSDs operate at higher utilization levels while the associated server CPUs operate at lower utilization levels that do not warrant increased fan speed, the combination could lead to data integrity issues or lower performance levels in a system.
A need remains for a way to maintain proper operating temperatures for SSDs and other storage devices.