The present invention relates to a method, computing device and computer program for controlling the rate at which cooling fluid is drawn into a computer housing. In particular, embodiments have application in computing devices housed in enclosures such as data centre halls.
The equipment in data centres, such as computer servers and storage devices, generates a large amount of heat while it operates. In most data centres this heat is removed by moving air through the data centre, as shown in FIG. 1, which provides an overview of airflow in a data centre. Cool air is introduced into the data centre hall by cooling equipment 1 and is forced to flow towards the IT equipment 3 by a set of fans 2 external to the IT equipment 3. The cool air is then introduced into one end of the IT equipment 3 and drawn through the IT equipment 3 by smaller fans 4 inside the equipment. The air passes over the hot components inside the IT equipment 3 and leaves the IT equipment 3 as hot air. The hot air circulates back to the central cooling equipment 1, thereby removing the heat from the IT equipment 3. This process must be carefully managed to ensure that the right amount of cold air arrives at the inlets of all the IT equipment 3: with too little air the equipment overheats, while too much air wastes energy on cooling and moving the air. Alternative systems exist in which the cooling fluid is a liquid that is piped through the IT equipment, with flow controlled by a series of valves. Similar considerations apply to such systems.
Data centre halls may be large, enclosed spaces, and so the overall airflow pattern is the result of the interactions of all the air-moving equipment: the fans 4 internal to the IT equipment 3 and the external cooling fans 2. It is typical in data centre management to allow the IT equipment 3 to operate its fans 4 with a large degree of autonomy. IT equipment fans 4 respond only to changes in the conditions in their local piece of equipment, increasing airflow rate if local heating is detected and decreasing airflow rate once the optimal temperature is reached. The central cooling fans 2 operate according to other policies: in some installations they run at a static speed, while in others their speed changes according to the sensed temperature of the return (hot) air.
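By way of illustration only, the autonomous behaviour of the IT equipment fans 4 described above may be sketched as follows. This is a minimal sketch and not a description of any particular product: the class name `LocalFanController`, the target temperature, the step size and the speed limits are all hypothetical values chosen for the example. The point shown is that the controller uses only its own local temperature reading and has no knowledge of neighbouring equipment or of the central cooling fans 2.

```python
from dataclasses import dataclass


@dataclass
class LocalFanController:
    """Hypothetical per-device fan controller; every value is illustrative."""

    target_temp_c: float = 45.0  # assumed optimal component temperature
    step: float = 0.1            # assumed speed change per control tick
    min_speed: float = 0.2       # assumed minimum speed (fraction of maximum)
    max_speed: float = 1.0       # full speed
    speed: float = 0.2           # current speed (fraction of maximum)

    def update(self, local_temp_c: float) -> float:
        """Adjust fan speed using only this device's own temperature."""
        if local_temp_c > self.target_temp_c:
            # Local heating detected: draw in more air.
            self.speed = min(self.max_speed, self.speed + self.step)
        elif local_temp_c < self.target_temp_c:
            # Cooler than the target: reduce airflow again.
            self.speed = max(self.min_speed, self.speed - self.step)
        return self.speed
```

In use, `update` would be called periodically with the latest local sensor reading. Because each device runs its own instance, the overall airflow pattern emerges from many such independent decisions.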
Cooling fluid is a limited resource shared among computing devices. There is a limited supply for a collection of servers, and those furthest away from the source of cooling rely on those nearer to allow some cooling fluid to pass. On the other hand, it is inefficient to oversupply cooling (and in that way guarantee that all the servers can have access to cooling) as in most cases the excess will pass through the data centre unused.
Effective airflow patterns can be established, usually by the manual intervention of data centre managers. The managers may control the gross characteristics of the flow of air, for example by opening or closing vents and perforations, by creating barriers and by setting the speed of the fans on the central cooling equipment. The fans in the IT equipment are not usually under the control of the managers. The airflow pattern in the data centre is carefully monitored and adjustments are made until a good airflow regime is created. The established airflow pattern remains static if the heat generated by the IT equipment is constant, for example in data centres in which the amount of work done by servers changes slowly and the amount of heat generated is insensitive to load.
FIG. 2 illustrates in very simple terms an effective airflow pattern. The arrows represent the flow of cold air. The IT equipment 3 is, for example, a rack of servers. The cold air flows into the aisle at the side of the inlets to the IT equipment 3. The equal number of arrows pointing into each server in the rack indicates an equal share of the cooling fluid from central cooling being drawn into each server. Based on the assumption of an even workload across the servers, this is a desirable airflow pattern.
There is a trend for data centres to become more dynamic in terms of load distribution. The loads on the IT equipment vary more over time, and the IT equipment is now designed to use energy efficiently, which means that the amount of heat generated is much more variable with load.
As a result, the airflow patterns are much more dynamic and the manual control policies described above become much less effective. FIG. 3 illustrates how a load imbalance causes an imbalance in the supply of cooling fluid to IT equipment 3. Servers with extra load will heat up and draw in more air (represented by more arrows pointing into the server in the middle of the rack in FIG. 3), potentially starving neighbouring servers of cooling air (represented by fewer arrows pointing into the servers at the top of the rack in FIG. 3). This results in reduced cooling and in increased energy use as the fan speeds increase. The central cooling equipment must perform more work to ensure that a minimum level of cooling is supplied to all equipment, at the cost of overcooling the best-cooled equipment. The IT equipment fans will also increase their energy usage as the poorly cooled equipment works harder to create a sufficiently large airflow.
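The starvation effect illustrated in FIG. 3 can be made concrete with a toy calculation. The sketch below rests on a deliberately simplified assumption that is not taken from any measured system: a fixed supply of cold air is divided among servers in proportion to how strongly each one draws air whenever the combined draw exceeds the supply. The function name `share_airflow` and the numbers used are hypothetical.

```python
from typing import List


def share_airflow(supply: float, demands: List[float]) -> List[float]:
    """Toy model: split a fixed cold-air supply among competing servers.

    Assumption (illustrative only): when total demand exceeds the supply,
    each server receives a share proportional to its own demand.
    """
    total = sum(demands)
    if total <= supply:
        # Enough air for everyone: each server gets what it draws.
        return list(demands)
    return [supply * d / total for d in demands]


# Even workload: three servers each draw 1.0 unit from a supply of 3.0.
balanced = share_airflow(3.0, [1.0, 1.0, 1.0])

# The middle server heats up and doubles its draw; the supply is unchanged.
imbalanced = share_airflow(3.0, [1.0, 2.0, 1.0])
```

Under this assumption the balanced case gives each server its full 1.0 unit, whereas in the imbalanced case the outer servers receive only 0.75 units each, although their own demand has not changed.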
Existing systems aim to solve such problems by providing a centralised control mechanism that optimises the cooling of the complete system, removing the independence of the IT equipment cooling from the actions of the centralised cooling equipment. Such a control system may comprise sensors distributed throughout the data centre to measure the values of quantities that affect performance or reliability (such as temperatures and pressures). These values are reported to a centralised system through a communications network, usually one dedicated to the control system. The central system processes the reported values, decides on actions to achieve the required airflow characteristics, and sends commands to set the equipment to the required values.
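The report, decide and command cycle of such a centralised system may be sketched, purely for illustration, as follows. The function name `decide_fan_commands`, the device identifiers, the target inlet temperature, the base speed and the proportional gain are all hypothetical; a real central system may use far more elaborate models. The sketch assumes that every device can report an inlet temperature and accept a fan-speed command.

```python
from typing import Dict


def decide_fan_commands(
    readings_c: Dict[str, float],
    target_c: float = 27.0,   # assumed target inlet temperature
    base_speed: float = 0.5,  # assumed speed commanded at the target
    gain: float = 0.05,       # assumed speed change per degree of error
) -> Dict[str, float]:
    """Map reported inlet temperatures to per-device fan-speed commands."""
    commands = {}
    for device_id, inlet_temp_c in readings_c.items():
        error = inlet_temp_c - target_c
        speed = base_speed + gain * error
        # Clamp to the valid range of 0.0 (off) to 1.0 (full speed).
        commands[device_id] = min(1.0, max(0.0, speed))
    return commands


# One control cycle: values reported over the network, commands returned.
reported = {"server-a": 27.0, "server-b": 37.0, "server-c": 10.0}
commands = decide_fan_commands(reported)
```

Even this minimal cycle presupposes that each device is known to the central system by an identifier and can both report values and accept commands.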
Many data centres are collections of heterogeneous equipment of various ages and from a variety of suppliers. Such equipment may not be configurable to integrate with a central control system, as it may not support the chosen protocols or communications system.
Each new piece of equipment installed in the data centre must be integrated into the control system and permitted onto the communications system. For example, its capabilities may need to be entered into a database, and possibly its actual location may also need to be measured and recorded. Any equipment moves may also need to be carefully recorded.
The control system itself may be complex, requiring the execution of computationally expensive simulations and algorithms in order to manage and predict the interactions between large numbers of very different pieces of equipment.