Modern data centers house thousands of servers, each having two or more heat-generating microprocessors. Microprocessors can easily produce more than 40 thermal watts per square centimeter, and future microprocessors are expected to produce even higher heat fluxes as semiconductor technology continues to progress. Collectively, the amount of heat generated by all servers in a data center is substantial. Unfortunately, removing this heat from the data center using conventional air conditioning systems is costly and inefficient. Installing air conditioning in a data center requires significant upfront capital expenditures on large computer room air conditioning (CRAC) units, air handling equipment, and related ducting, as well as ongoing operating expenditures to service and maintain the CRAC units. Moreover, CRAC units suffer from poor thermodynamic efficiency, which translates to high monthly utility costs for data center operators. To reduce the cost of operating data centers, and thereby reduce the cost of cloud computing services reliant on data centers, there is a strong need to cool servers within data centers more efficiently.
According to the U.S. Department of Energy, nearly three percent of all electricity used in the United States is devoted to powering data centers and computer facilities. Approximately half of this electricity goes toward power conditioning and cooling. Increasing the efficiency of cooling systems for data centers and computer facilities would lead to dramatic savings in energy nationwide. More efficient cooling systems are also needed in transportation systems due to increasing adoption of hybrid and electric vehicles that rely on complex electrical components, including batteries, inverters, and electric motors, which produce significant amounts of heat that must be effectively dissipated. Cooling systems capable of more efficiently cooling these electrical components would translate to increased range and utility for these vehicles.
Presently, the majority of computers (e.g. servers and personal computers) in residential and commercial settings are cooled using forced air cooling systems in which room air is forced, by one or more fans, over finned heat sinks mounted on microprocessors, power supplies, or other electronic devices. The heat sinks add mass and cost to the computers and place mechanical stress on electronic components to which they are mounted. If a computer is subject to vibration, such as vibration caused by a fan mounted in the computer, a heat sink mounted on top of a microprocessor can oscillate in response to the vibration and can fatigue the electrical connections that attach the microprocessor to the motherboard of the computer.
Another downside of air cooling systems is that cooling fans commonly operate at high speeds and can be quite noisy. When many computers are collocated, such as in a data center or computer room, the collective noise produced by the computer fans can require service personnel to wear hearing protection. As air passes over electronic devices in the computers, the air, which is at a lower temperature than the hot surfaces of the electronic devices, absorbs heat from the electronic devices, thereby cooling the devices. These air cooling systems are inherently limited in terms of performance and efficiency due to the low specific heat of air, which is much lower than the specific heat of water and other coolants. For example, dry air at 20° C. and 1 bar has a specific heat of about 1,007 J/(kg-K), whereas water at 20° C. has a specific heat of about 4,181 J/(kg-K). Due to air's low specific heat and low density, high flow rates are required to ensure adequate cooling of even relatively small heat loads.
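The combined effect of air's low specific heat and low density can be illustrated by comparing heat capacity per unit volume. The following is a minimal sketch, not part of the original disclosure: the specific heats are the values quoted above, while the densities (about 1.19 kg/m^3 for dry air at 20° C. and 1 bar, and about 998 kg/m^3 for water at 20° C.) and all function and variable names are assumptions introduced for illustration.

```python
def volumetric_heat_capacity(specific_heat_j_per_kg_k, density_kg_per_m3):
    """Return heat capacity per unit volume in J/(m^3-K)."""
    return specific_heat_j_per_kg_k * density_kg_per_m3


# Specific heats quoted in the text, in J/(kg-K).
AIR_SPECIFIC_HEAT = 1007.0
WATER_SPECIFIC_HEAT = 4181.0

# Assumed densities in kg/m^3 (not stated in the text).
AIR_DENSITY = 1.19     # dry air at about 20 C and 1 bar
WATER_DENSITY = 998.0  # liquid water at about 20 C

air_vhc = volumetric_heat_capacity(AIR_SPECIFIC_HEAT, AIR_DENSITY)
water_vhc = volumetric_heat_capacity(WATER_SPECIFIC_HEAT, WATER_DENSITY)
water_to_air_ratio = water_vhc / air_vhc

print(f"air:   {air_vhc:,.0f} J/(m^3-K)")
print(f"water: {water_vhc:,.0f} J/(m^3-K)")
print(f"water carries about {water_to_air_ratio:,.0f} times more heat per unit volume")
```

Under these assumed densities, a given volume of water absorbs on the order of a few thousand times more heat per degree of temperature rise than the same volume of air, which is why air cooling depends on very high volumetric flow rates.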
Electronic components within a typical server chassis can produce a thermal load of about 500 watts. The amount of airflow required to cool the components can be calculated with the following equation:
$$\dot{\mathrm{flow}}_{\mathrm{air}} = \frac{Q}{c_p \times r \times \Delta T}$$
where $\dot{\mathrm{flow}}_{\mathrm{air}}$ is air flow rate, $Q$ is heat transferred, $c_p$ is the specific heat of air, $r$ is density of the air, and $\Delta T$ is the change in temperature between the air entering the server chassis and air exiting the server chassis. Where the thermal load of the server is 500 W and the maximum allowable ΔT is about 30 degrees, the server chassis will require about 53 cubic feet per minute (cfm) of air flow. For an installation of 20 servers, which is common in computer rooms of small businesses and academic institutions, over 1,000 cfm of air flow is required to cool the servers. Achieving adequate cooling capacity in this scenario requires two air conditioning units sized for a typical U.S. home as well as an appropriately sized air handler and ducting to deliver cool air to the room.
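The airflow calculation above can be checked numerically. The following is a minimal sketch under stated assumptions: the text gives neither the air density nor the temperature scale of the 30 degree rise, so the sketch assumes a density of about 1.19 kg/m^3 and treats the rise as 30 degrees Fahrenheit (about 16.7 K), which together give a result close to the stated 53 cfm. The function name and constants are hypothetical.

```python
# Conversion factor: cubic feet per cubic meter times seconds per minute.
M3_PER_S_TO_CFM = 35.3147 * 60.0


def required_airflow_cfm(heat_load_w, delta_t_k,
                         specific_heat=1007.0, density=1.19):
    """Air flow (cfm) needed to carry heat_load_w with a temperature rise delta_t_k.

    Implements flow = Q / (c_p * r * delta_T), then converts m^3/s to cfm.
    The default density of 1.19 kg/m^3 is an assumption, not a value from the text.
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    flow_m3_per_s = heat_load_w / (specific_heat * density * delta_t_k)
    return flow_m3_per_s * M3_PER_S_TO_CFM


# Assumption: the 30 degree rise is in Fahrenheit; convert the difference to kelvin.
delta_t_k = 30.0 * 5.0 / 9.0

per_server_cfm = required_airflow_cfm(500.0, delta_t_k)
twenty_servers_cfm = 20 * per_server_cfm

print(f"per server: {per_server_cfm:.1f} cfm")
print(f"20 servers: {twenty_servers_cfm:.0f} cfm")
```

If the 30 degree rise were instead read as 30 K, the same formula would give a noticeably lower flow rate, so the Fahrenheit reading is the one consistent with the figures in the text.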
Modern data centers, which can have tens of thousands of servers, must be equipped with many CRAC units designed to cool and circulate large amounts of air. The CRAC units are large and expensive, must be professionally installed, and often require substantial modifications to the facility, including installation of structural supports, custom air ducting, custom plumbing, and electrical wiring. After installation, CRAC units require frequent preventative maintenance in an attempt to avoid unplanned downtime. And simply delivering large amounts of cool air to the data center will not ensure adequate cooling of the servers. Special care must be taken to deliver cool air to the servers without the cool air first mixing with warm air exhausting from the servers. This can require installation of special airflow management products, such as raised floors, air curtains, and specially designed server enclosures, to assist with air containment. These products can significantly increase the build-out cost of a data center per square foot. Inevitably, these products do not succeed at fully isolating cold air from warm air; they simply reduce mixing of hot and cold air and thereby provide marginal efficiency improvements. Therefore, to ensure that sensitive components within the servers do not overheat, most data centers are forced to increase flow rates of cool air well above theoretical values as well as decrease the set point temperature of the room. The result is higher power consumption by the CRAC units and air handlers, leading to higher cooling costs for the data center.
Many electronic devices operate less efficiently as their temperature increases. As one example, a typical microprocessor operates less efficiently as its junction temperature increases. FIG. 64 shows a plot of power consumption in watts versus junction temperature. The bottom curve shows static power consumption of a microprocessor and the top curves show total power consumption for switching speeds of 1.6 GHz and 2.4 GHz, respectively. Total power consumption includes both static power consumption and dynamic power consumption, which varies with switching frequency. As shown in FIG. 64, as the temperature of the microprocessor increases, it consumes more power to provide the same performance. In air cooling systems, it is common for fully utilized microprocessors to operate at or near their maximum rated temperature, resulting in poor operating efficiency. In the example shown in FIG. 64, the microprocessor uses over 35% more power when operating at 95 degrees C. than when operating at 45 degrees C. To conserve energy, it is therefore desirable to provide a cooling system that will allow the microprocessor to operate consistently at lower temperatures. Providing a consistently lower operating temperature for the microprocessor can also extend its useful life and can avoid unnecessary throttling (dynamic frequency scaling) or downtime of the computer due to an unsafe junction temperature.
Operating speeds of next generation microprocessors will continue to increase, as will heat fluxes (defined as heat load per unit area) produced by those next generation microprocessors. Conventional air cooling systems will soon be incapable of effectively and efficiently cooling these next generation microprocessors. Therefore, it is desirable to provide a new cooling system that is significantly more effective and efficient than existing air cooling systems and is capable of managing high heat fluxes that will be produced by next generation microprocessors.
Pumped liquid cooling systems can provide improved thermal performance over conventional air cooling systems. Pumped liquid cooling systems typically include the following items connected by tubing: a heat sink attached to the microprocessor, a liquid-to-air heat exchanger, and a pump that circulates liquid coolant through the system. As the liquid coolant passes through channels in the heat sink, heat from the microprocessor is transferred through the thermally conductive heat sink to the coolant, thereby increasing the temperature of the coolant and transferring heat away from the microprocessor. The heat sink is typically designed to maximize heat transfer by maximizing the surface area of the channels through which the liquid passes. In some examples, the heat sink can be a micro-channel heat sink that utilizes fine fin channels through which the liquid coolant flows. The heated liquid coolant exiting the heat sink is then circulated through a liquid-to-air heat exchanger where the heat is expelled to the surrounding air to reduce the temperature of the liquid coolant before it circulates back to the pump for another cycle.
Use of closed liquid cooling systems is beginning to migrate from high performance computers to personal computers. Unfortunately, existing liquid cooling systems have performance constraints that will prevent them from effectively cooling next generation microprocessors. This is because liquid cooling systems rely solely on transferring sensible heat by increasing the temperature of a liquid coolant as it passes through a heat sink. The amount of heat that can be transferred is a function of, among other factors, the thermal conductivity of the fluid and the flow rate of the fluid. Dielectric fluids do not have sufficient thermal conductivities to be used in liquid cooling systems. Instead, water or a water-glycol mixture is commonly used due to its significantly higher thermal conductivity. Unfortunately, if a leak develops in a liquid cooling system that uses water or a water-glycol mixture, the water will destroy the server and potentially an entire rack of servers. With the price of a single server being thousands of dollars or even tens of thousands of dollars, many data center operators are simply unwilling to accept the risk of loss presented by water-based liquid cooling systems.
While more effective than air cooling, transferring heat by sensible heating requires significant flow rates of liquid coolant, and achieving high flow rates often necessitates high fluid pressures. Consequently, a liquid cooling system designed to cool a modern microprocessor can require a large pump, or a series of small pumps positioned throughout the liquid cooling system, to ensure an adequate liquid coolant pressure and flow rate. Operating large pumps, or a series of small pumps, uses a significant amount of energy and diminishes the efficiency of the cooling system. Moreover, using a series of small pumps increases the probability of the cooling system experiencing a mechanical failure, which translates to unwanted facility downtime.
Although liquid cooling systems have proven adequate at cooling modern microprocessors, they will be unable to adequately cool next generation microprocessors while maintaining practical physical dimensions and specifications. For instance, to cool a next generation microprocessor, liquid cooling systems will require very high flow rates (e.g. of water), which will require large, heavy duty cooling lines (e.g. greater than ¾″ outer diameter), such as reinforced rubber cooling lines or sweated copper tubing, that will be difficult to route in any practical manner into and out of a server housing. If installed in a server, these large plumbing lines will block access to electrical components within the server, thereby frustrating maintenance of the server. These large plumbing lines will also prevent drawers on a server rack from opening and closing as intended, thereby preventing the server from being easily accessed and further frustrating maintenance of the server. As mentioned above, water poses a catastrophic risk to servers, and increasing the pressure and flow rates of water into and out of servers only increases this risk. Consequently, increasing the capabilities of existing liquid cooling systems to meet the cooling requirements of next generation microprocessors is simply not a practical or viable option. Without further innovation in the area of cooling systems, the implementation of next-generation microprocessors will be hampered.
As noted above, liquid cooling systems commonly rely on flowing liquid water through channels in finned heat sinks. The heat sinks are often indirectly coupled to a heat source via a metal base plate that is mounted on the heat source using thermal interface material, such as solder thermal interface material (STIM) or polymer thermal interface material (PTIM), and/or a direct bond adhesive. While this approach can be more effective than air cooling, the intervening materials between the water and the heat source induce significant thermal resistance, which reduces heat transfer rates and the overall efficiency of the cooling system. The intervening materials also add cost and time to manufacturing and installation processes, constitute additional points of failure, and create potential disposal issues. Finally, the intervening materials render the system unable to adapt to local hot spots on a heat source. The net effect of these performance limitations is that the liquid cooling system must be designed to accommodate the maximum anticipated heat load of one or more localized hot spots on the surface of the heat source (e.g. to adequately cool one hot core of a multicore processor), resulting in additional cost and complexity of the entire liquid cooling system.
Unlike water, dielectric coolants can be placed in direct contact with electronic devices and not harm them. Unfortunately, dielectric coolants can have a lower specific heat than water, so they are not well suited for use in single-phase pumped liquid cooling systems. For instance, some dielectric coolants, such as certain hydrofluoroethers, have a specific heat of about 1,300 J/(kg-K), whereas water has a specific heat of about 4,181 J/(kg-K). This means that cooling a microprocessor by sensibly warming a flow of dielectric coolant will require a mass flow rate more than three times that of water used to cool an identical microprocessor by sensibly warming the flow of water over the same temperature rise. This higher flow rate requires more pump power, which translates to lower cooling system efficiency.
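The flow rate penalty of a dielectric coolant in single-phase operation follows from the sensible heat relation Q = m × c_p × ΔT, where m is mass flow rate. The following is a minimal sketch using the specific heats quoted above; the 100 W heat load, the 10 K coolant temperature rise, and all names are hypothetical values chosen only for illustration. Because the heat load and temperature rise cancel, the ratio of mass flow rates depends only on the ratio of specific heats.

```python
def sensible_mass_flow_kg_per_s(heat_load_w, specific_heat_j_per_kg_k, delta_t_k):
    """Mass flow rate needed to absorb heat_load_w by sensible heating alone."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_w / (specific_heat_j_per_kg_k * delta_t_k)


# Specific heats quoted in the text, in J/(kg-K).
WATER_SPECIFIC_HEAT = 4181.0
DIELECTRIC_SPECIFIC_HEAT = 1300.0  # certain hydrofluoroethers, per the text

# Hypothetical operating point (not from the text).
HEAT_LOAD_W = 100.0
DELTA_T_K = 10.0

water_flow = sensible_mass_flow_kg_per_s(HEAT_LOAD_W, WATER_SPECIFIC_HEAT, DELTA_T_K)
dielectric_flow = sensible_mass_flow_kg_per_s(
    HEAT_LOAD_W, DIELECTRIC_SPECIFIC_HEAT, DELTA_T_K
)
flow_ratio = dielectric_flow / water_flow

print(f"water:      {water_flow * 1000:.2f} g/s")
print(f"dielectric: {dielectric_flow * 1000:.2f} g/s")
print(f"dielectric needs about {flow_ratio:.1f} times the mass flow of water")
```

On a mass basis the ratio works out to about 3.2 for the quoted specific heats. A comparison of volumetric flow rates would also need the densities of the two fluids, which the text does not give.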
As an alternative to pumped liquid systems, dielectric coolants can be used in immersion cooling systems. Immersion cooling is an aggressive form of liquid cooling where an entire electronic device (e.g. a server) is submerged in a vat of dielectric coolant (e.g. HFE-7000 or mineral oil). Unfortunately, immersion cooling vats are large, costly, and heavy, especially when filled with dielectric coolant, which can have a density significantly higher than water. Existing vats hold upwards of 250 gallons of coolant and can weigh more than 8,000 pounds when filled with coolant. Typically, a room must be specially engineered to accommodate the immersion cooling vat, and containment systems need to be specially designed and installed in the room as a precaution against vat failure. When using 250 gallons of coolant, the cost of the coolant becomes a significant capital expenditure. Certain coolants, such as mineral oil, can act as solvents and over time can remove certain identifying information from motherboards and from other server components. For instance, product labels (e.g. stickers containing serial numbers and bar codes) and other markings (e.g. screen printed values and model numbers on capacitors and other devices) are prone to dissolve and wash off due to a continuous flow of coolant over all surfaces of the server. As the labels and dyes wash off the servers, the coolant in the vat can become contaminated and may need to be replaced, resulting in an additional expense and downtime. Another downside of immersion cooling is that servers cannot be serviced immediately after being withdrawn from the vat. Typically, the server must be removed from the vat and permitted to drip dry for a period of time (e.g. 24 hours) before a professional can service the server. 
During this drying period, the server is exposed to contaminants in the air, and the presence of mineral oil on the server may attract and trap contaminants on sensitive circuitry of the server, which is undesirable.
Another cooling approach, known as spray cooling or spray evaporative cooling, relies on atomized sprays. In this approach, atomized liquid coolant is sprayed, through air or vapor, directly onto an electronic device. As a result, small droplets impinge a heated surface of the device and coalesce to form a thin liquid film on the heated surface. Heat is then transferred from the heated surface to the liquid film either by sensible heating of the bulk liquid or by latent heating, as a fraction of the liquid film transitions to vapor. Spray cooling is a very efficient way to remove high heat fluxes from small surfaces. Unfortunately, the margin for error in spray cooling is very narrow, and the onset of dry out and critical heat flux is a constant concern that can have catastrophic consequences. Critical heat flux is a condition where evaporation of coolant from the heated surface forms a vapor layer that prevents atomized liquid from reaching and cooling the surface, often resulting in run-away device temperatures and rapid failure. Great care must be taken to ensure uniform coverage of the spray on the heated surface and adequate drainage of fluid from the heated surface. Although achievable in static laboratory settings, mainstream adoption of spray cooling has been hampered by several factors. First, spray cooling requires a significant working volume to enable atomized sprays to form, which results in non-compact cooling components, making it impractical for packaging in most commercial products. Second, atomizing liquid coolant requires a significant amount of pressure upstream of the atomizer to generate an appropriate pressure drop at the atomizer-air interface to enable atomized sprays to form. Maintaining this amount of pressure within the system consumes a significant amount of pump or compressor energy. Third, high flow rates of atomized sprays are required to prevent dry out or critical heat flux from occurring. 
In the end, it has proven difficult to design a practical, reliable, and compact spray cooling system, despite a large amount of time and effort that has been expended to do so.
In view of the foregoing discussion, efficient, scalable, high-performing methods and apparatuses are needed for cooling electronic devices that produce high heat fluxes, such as processors and power electronics.