Embodiments of the inventive subject matter generally relate to the field of computer systems and, more particularly, to temperature management in computer systems.
Oftentimes, individual components of a computer system (e.g., processors, memory, boards, cards, etc.) have thermal sensors affixed to them or located nearby. The thermal sensors report to a central controller. When a thermal sensor communicates to the central controller that the temperature of a component has increased beyond a threshold, the central controller can take action to decrease the temperature of the component. For example, the central controller can increase fan speed, shut down the component, move workload from the overheating component to another component, etc. Unfortunately, thermal sensors can malfunction. For example, a thermal sensor can indicate that the temperature of a processor has exceeded the threshold, although the processor is currently operating within a safe range. When a thermal sensor malfunctions in this way, the central controller may unnecessarily increase fan speed, shut down the processor, etc., resulting in undesired operating characteristics such as fan noise, decreased performance, etc. Some systems include redundant thermal sensors on the components (e.g., two thermal sensors on each component), which increases cost.
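By way of illustration only, the threshold-based behavior of such a central controller can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular system: the names `SensorReading`, `choose_action`, and `THRESHOLD_C`, the 85 degree value, and the action strings are all hypothetical and do not appear in the description above. The sketch shows why a malfunctioning sensor leads to an unnecessary response: the controller acts on the reported value alone.

```python
from dataclasses import dataclass

# Hypothetical threshold in degrees Celsius; the description gives no value.
THRESHOLD_C = 85.0


@dataclass
class SensorReading:
    component: str        # hypothetical component label, e.g. "processor0"
    temperature_c: float  # temperature as reported by the thermal sensor


def choose_action(reading: SensorReading, threshold_c: float = THRESHOLD_C) -> str:
    """Return the action a simple threshold-based controller would take.

    The controller trusts the reported value, so it cannot tell a real
    overheating event apart from a malfunctioning sensor.
    """
    if reading.temperature_c > threshold_c:
        # Could equally be a shutdown or a workload move; one action is
        # chosen here to keep the sketch short.
        return "increase_fan_speed"
    return "no_action"


# A reading below the threshold triggers no response.
print(choose_action(SensorReading("processor0", 60.0)))
# A faulty sensor reporting 95.0 for the same component triggers a response
# even if the component is actually operating within a safe range.
print(choose_action(SensorReading("processor0", 95.0)))
```

Because the decision depends only on the reported temperature, any erroneous reading above the threshold produces the same response as a genuine overheating event.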