Storage devices (e.g., hard disk drives, solid state drives) are highly susceptible to failure. Factors such as prolonged exposure to high temperatures, mechanical failures and perturbations, and aging can all contribute to failures in the field. Large data centers housing thousands of storage devices often experience high ambient temperatures because so many devices are concentrated in one space. Higher ambient temperatures in a computing environment place additional stress on disk operations, especially during peak load times.
Many hard disk drive and solid state drive vendors provide the capability to monitor the drive's current temperature, as well as other important data such as the occurrence of failures. Such data is provided through a telemetry mechanism known as SMART (Self-Monitoring, Analysis and Reporting Technology). SMART data gathered in the drive can be accessed by operating system device drivers, which issue SMART commands to the drive.
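As a rough illustration of how host software might consume such telemetry, the following Python sketch extracts a temperature reading from a SMART attribute table. It assumes the text layout printed by the smartmontools command `smartctl -A` and assumes that attribute ID 194 carries the drive temperature, which is common but vendor-dependent. The sample table, its values, and the function name are hypothetical and serve only to show the idea.

```python
from typing import Optional

# Illustrative sample in the attribute-table layout printed by `smartctl -A`.
# The values are made up for this sketch.
SAMPLE_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21734
194 Temperature_Celsius     0x0022   061   048   000    Old_age   Always       -       39 (Min/Max 18/52)
"""

# Assumption: ID 194 is the temperature attribute (vendor-dependent).
TEMPERATURE_ATTRIBUTE_ID = 194


def read_temperature_celsius(attribute_table: str) -> Optional[int]:
    """Return the raw temperature from a SMART attribute table, or None."""
    for line in attribute_table.splitlines():
        fields = line.split()
        # Data rows start with a numeric attribute ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) == TEMPERATURE_ATTRIBUTE_ID:
            # The raw value starts at the 10th column; its leading integer
            # is taken here as degrees Celsius.
            raw = fields[9]
            return int(raw) if raw.isdigit() else None
    return None


if __name__ == "__main__":
    print(read_temperature_celsius(SAMPLE_ATTRIBUTES))
```

In practice the table would come from the drive itself, for example by running `smartctl -A` against a device with sufficient privileges, rather than from a hard-coded sample.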
Systems also include a baseboard management controller (BMC), typically implemented as a chip on the motherboard, that monitors the temperature in the enclosure resulting from central processing unit (CPU) operations. The BMC may control the operation of fans to reduce the temperature, especially for the CPU. The BMC operates independently of the disk drives and of the information they gather, including temperature and other operational attributes.