Field of the Invention
The present invention relates in general to the field of information handling system thermal management, and more particularly to server information handling system thermal management enhanced by estimated energy states.
Description of the Related Art
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Server information handling systems process client requests received through network interfaces. A typical server information handling system is built with a number of different processing components that cooperate to execute instructions to process information stored in memory. Server information handling systems are often deployed in large numbers within data centers that provide power and cooling infrastructure. For example, a data center will often have multiple racks that each support multiple server information handling systems. Each rack is placed proximate a source of cooling air, such as an air conditioning vent, so that cooling fans in the server information handling systems and rack force cooling airflows over the processing components to remove excess thermal energy. Without cooling airflow, server information handling systems concentrated in a rack will typically overheat, leading to the forced shutdowns needed to prevent damage to the processing components. In addition, the rack generally includes backed-up power supplies that meter power to each server information handling system in the rack. Typically, the power supplies in the rack cannot meet the power requirements of all of the server information handling systems in the rack operating at full power simultaneously. Instead, power is allocated between server information handling systems based on system utilization, system priorities and available power.
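The allocation policy is not specified in detail here. As a minimal sketch, assuming each server's utilization is expressed as a requested wattage and its priority as a positive weight, a rack power manager could split the available budget by priority-weighted water-filling. The `ServerDemand` type and `allocate_rack_power` function are hypothetical names for illustration only, not part of any BMC or rack firmware interface.

```python
from dataclasses import dataclass


@dataclass
class ServerDemand:
    """Hypothetical per-server power request seen by a rack power manager."""
    name: str
    requested_watts: float  # assumed: power the server would draw at its current utilization
    priority: float         # assumed: positive weight, higher is favored when power is short


def allocate_rack_power(servers: list[ServerDemand], available_watts: float) -> dict[str, float]:
    """Split a rack power budget by priority without exceeding any request.

    Water-filling: each round, every unsatisfied server receives a share of the
    remaining budget proportional to its priority, capped at what it still
    needs; power left over by capped servers is redistributed next round.
    """
    allocation = {s.name: 0.0 for s in servers}
    remaining = available_watts
    active = [s for s in servers if s.requested_watts > 0]
    while remaining > 1e-9 and active:
        total_priority = sum(s.priority for s in active)
        if total_priority <= 0:
            break  # no positive weights left to divide by
        next_active = []
        spent = 0.0
        for s in active:
            share = remaining * s.priority / total_priority
            grant = min(share, s.requested_watts - allocation[s.name])
            allocation[s.name] += grant
            spent += grant
            if allocation[s.name] < s.requested_watts - 1e-9:
                next_active.append(s)  # still wants more power
        remaining -= spent
        if len(next_active) == len(active):
            break  # nobody hit their cap, so the whole budget was handed out
        active = next_active
    return allocation
```

For example, with 600 W available and requests of 300 W (priority 2), 300 W (priority 1), and 100 W (priority 1), the sketch grants 300 W, 200 W, and 100 W: the small request is met in full and its unused share flows to the one server still below its request.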
During normal operations, server information handling systems typically monitor thermal conditions with temperature sensors and adjust cooling fan speeds to maintain desired thermal constraints. If cooling fans cannot provide adequate cooling airflow to maintain those constraints, additional steps are generally taken to reduce the creation of thermal energy, such as throttling CPUs. Thermal constraints are generally maintained by a baseboard management controller (BMC) or other processor that executes firmware and provides out-of-band management control of the server information handling system, such as remote starts and shutdowns commanded through a management network. In addition to ensuring that processing components are not damaged by excess thermal energy, the BMC typically sets thermal constraints to avoid the unnecessary power consumption and acoustic noise associated with cooling fan operation. One difficulty with setting and maintaining thermal constraints is that exact thermal conditions within a server information handling system are sometimes difficult to discern, since the server information handling system usually includes a large number of power-consuming components. BMC thermal controls therefore generally incorporate a safety margin that errs toward keeping thermal conditions below acceptable limits, such as by running cooling fans slightly faster than necessary or throttling CPUs slightly earlier than necessary. The size of this safety margin grows with the uncertainty between actual and sensed thermal conditions, and a larger margin generally results in greater acoustic noise and power consumption at the server information handling system.
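The trade-off between the safety margin and fan speed can be illustrated with one step of a hypothetical proportional controller. This is a sketch under stated assumptions, not a description of any particular BMC firmware: the controller regulates to the limit minus the margin, the gain, fan bounds, and temperatures are made-up illustrative values, and `thermal_step` and `ThermalDecision` are invented names.

```python
from dataclasses import dataclass


@dataclass
class ThermalDecision:
    fan_percent: float  # commanded fan duty cycle, 0-100
    throttle_cpu: bool  # True when fans alone cannot hold the target


def thermal_step(sensed_c: float, limit_c: float, safety_margin_c: float,
                 min_fan: float = 20.0, max_fan: float = 100.0,
                 gain_per_c: float = 8.0) -> ThermalDecision:
    """One iteration of a hypothetical proportional fan controller.

    The controller regulates to (limit - margin) rather than to the limit,
    so a larger margin covering sensing uncertainty makes the fans ramp at a
    lower temperature. If the fans are saturated and the sensed temperature
    is still above the target, it also requests CPU throttling.
    """
    target_c = limit_c - safety_margin_c
    error_c = sensed_c - target_c
    # Proportional ramp above the target, clamped to the allowed fan range.
    fan = min(min_fan + gain_per_c * max(error_c, 0.0), max_fan)
    throttle = fan >= max_fan and sensed_c > target_c
    return ThermalDecision(fan_percent=fan, throttle_cpu=throttle)
```

With an 85 °C limit and a sensed 82 °C, a 5 °C margin yields a 36% fan command while a 10 °C margin yields 76%, showing how a wider margin for sensing uncertainty translates directly into more fan power and noise.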
PCI cards are one example of a power-consuming component that typically does not include a thermal sensor. At manufacture, the thermal characteristics of installed PCI cards may be tested and stored in a look-up table that the BMC references to provide a predetermined cooling fan response. However, if an end user replaces an installed PCI card or adds PCI cards, the thermal characteristics become unknown, so the user must either manually configure cooling fan settings or accept sub-optimal thermal management. Manual configuration of thermal parameters tends to be complex, so end users are likely to simply accept sub-optimal performance, degrading the end user experience. A server information handling system generally includes a multitude of other types of power-consuming components that neither report sensed temperatures nor independently qualify for thermal characterization estimates; yet, together these many components have an impact on thermal performance. Generally, these components are managed with an open-loop fan speed set by system characterizations from development testing. In addition, even components that include temperature sensors sometimes fail to provide accurate sensed conditions, such as in the event of a failure or during boot, when thermal control typically runs on a “blind” hard-coded fan speed that trades off thermal risk and acoustics.
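The table-driven behavior described above, including the fallbacks for uncharacterized cards and missing sensor data, might be sketched as follows. The card keys, fan percentages, and the function `select_fan_percent` are hypothetical placeholders rather than measured characterization data or a real BMC interface; the sketch simply assumes the controller takes the most demanding of all fan requirements.

```python
# Hypothetical factory characterization: minimum fan duty (percent) for each
# PCI card model tested at manufacture. Keys and values are illustrative only.
PCI_FAN_TABLE: dict[str, float] = {
    "example-nic": 30.0,
    "example-raid": 45.0,
}

UNKNOWN_CARD_FAN = 60.0  # assumed conservative open-loop floor for uncharacterized cards
BLIND_FAN = 80.0         # assumed hard-coded speed when sensor data is missing or invalid


def select_fan_percent(installed_cards: list[str], sensor_valid: bool,
                       closed_loop_fan: float) -> float:
    """Return the fan duty cycle as the most demanding of all requirements.

    - Characterized cards contribute their look-up table floor.
    - Uncharacterized cards (e.g. replaced or user-added) fall back to a
      conservative open-loop floor.
    - Without valid sensor data (during boot or after a failure), the
      closed-loop term is replaced by a blind hard-coded speed.
    """
    sensed_term = closed_loop_fan if sensor_valid else BLIND_FAN
    card_floors = [PCI_FAN_TABLE.get(card, UNKNOWN_CARD_FAN) for card in installed_cards]
    return max([sensed_term, *card_floors])
```

Taking the maximum means a single uncharacterized card, or a boot with no valid sensor data, pushes the whole system to a conservative speed, which is the acoustic and power cost the passage describes.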