The present application relates generally to thermal management in a power supply. More specifically, the present application is directed to thermal management in a multi-phase power system of a computer system.
In a large-scale computer system, such as a server, a number of electronics cards can be installed in close proximity to each other. For example, a computer system chassis can include racks of cards for processors, memory, communication, input/output interfaces, power management, and the like. When powered, the computer system generates a substantial amount of heat. Cooling features such as heat sinks and cooling fans are typically used to dissipate heat and prevent potentially degraded performance associated with an overheating condition. Locally within a computer system, hot spots can exist where higher-current paths are located in close physical proximity and/or the effectiveness of the cooling flow is reduced. A cooling flow produced by cooling fans can be sufficient under a number of operating scenarios; however, if one of the cooling fans fails, the reduced cooling flow may lead to reduced heat dissipation. Cooling flow effectiveness can also be reduced when the cooling flow path draws air over a series of hot spots. A sustained high computational load or operation near peak conditions may result in increased current draw that can lead to increased heat production. Changes in ambient environmental temperature can also alter cooling flow effectiveness.
A server typically has a number of voltage levels that can have varying electric current requirements depending upon loads. Point-of-load (POL) cards can be used to supply current to individual voltage levels, such as feeding multiple voltage levels to one or more processors. Multiple power stage chips (or power phase converters) may be arranged in series relative to airflow. For instance, a POL card for a processor chip may have three power phase converters near an air inlet to provide a first voltage level to a first load voltage rail, followed, from an airflow perspective, by eleven power phase converters in series to provide a second voltage level to a second load voltage rail, where all fourteen power phase converters are thermally linked to a common heat sink. Similarly, in another POL card, memory chip or memory control chip power levels may also be delivered via a number of power phases that are thermally linked in series with respect to the cooling airflow.
Power system hardware typically balances electrical current per power phase converter by targeting the same voltage level per rail. This generates similar heat loads in each power phase converter, resulting in cooler power phase converters near the air inlet and warmer power phase converters downstream, where the cooling airflow has already been preheated by the upstream converters. The warmest power phase converter typically dictates the power supportable by the POL card. Once the thermal limit of one power phase converter is exceeded, that converter turns off and its current is shed onto the smaller number of power phase converters that remain enabled, which may result in a cascading shutdown as higher current and heat are realized in the remaining enabled power phase converters. Thus, a thermal failure of one power phase converter can effectively change an electrically dual-redundant phase design into a non-redundant thermal design, if the failure of one power phase converter results in a shutdown of the POL card and server.
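The cascading shutdown described above can be illustrated with a minimal numerical sketch. It is not taken from the present application. It assumes a simplified model in which each enabled power phase converter carries an equal share of the rail current, dissipates heat as current squared times a fixed resistance, and warms the air passing to the next converter by a fixed amount per watt. All names and parameter values (`r_on`, `r_th`, `air_rise_per_watt`, the 125 degree C limit, and so on) are hypothetical and chosen only for illustration.

```python
def phase_temperatures(enabled, total_current, inlet_temp=35.0,
                       r_on=0.005, r_th=20.0, air_rise_per_watt=1.0):
    """Return per-phase temperatures in deg C; disabled phases report None.

    Assumed (hypothetical) model parameters:
      r_on              effective loss resistance per phase, ohms
      r_th              phase-to-air thermal resistance, deg C per watt
      air_rise_per_watt air temperature rise passed downstream, deg C per watt
    """
    n_on = sum(enabled)
    if n_on == 0:
        return [None] * len(enabled)
    current = total_current / n_on      # balanced current per enabled phase
    power = current ** 2 * r_on         # same heat load in every enabled phase
    temps = []
    air = inlet_temp                    # air temperature at the current position
    for on in enabled:                  # index 0 is nearest the air inlet
        if on:
            temps.append(air + r_th * power)
            air += air_rise_per_watt * power   # preheats air for later phases
        else:
            temps.append(None)
    return temps


def simulate_cascade(n_phases, total_current, limit=125.0, **model):
    """Disable the hottest phase while it exceeds the limit; return final state."""
    enabled = [True] * n_phases
    shutdown_order = []
    while any(enabled):
        temps = phase_temperatures(enabled, total_current, **model)
        hottest = max((i for i, on in enumerate(enabled) if on),
                      key=lambda i: temps[i])
        if temps[hottest] <= limit:
            break                       # every enabled phase is within its limit
        enabled[hottest] = False        # its current is shed onto the others
        shutdown_order.append(hottest)
    return enabled, shutdown_order


if __name__ == "__main__":
    for amps in (220.0, 275.0):
        enabled, order = simulate_cascade(11, amps)
        print(amps, "A:", sum(enabled), "phases still enabled, shutdown order", order)
```

Under these illustrative values, eleven phases sharing 220 A all remain within the limit, while at 275 A the phase furthest downstream trips first and each remaining phase then trips in turn, because every shutdown raises the per-phase current and heat of the phases that are still enabled.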