Data centers are cyber-physical systems. Energy management depends upon management of both computational (cyber) resources and cooling (physical) resources. Although these two types of resources are connected through the generation of thermal energy, they are normally controlled independently. For example, workloads are distributed among servers to meet performance objectives under the assumption that the cooling system will remove thermal energy as required. The cooling system responds to the thermal load generated by the servers through thermostatic control.
Data center power consumption has increased drastically in the past few years. According to a 2007 report by the Environmental Protection Agency (EPA), data center peak load power consumption was 7 GW in 2006 and, at the current rate, is expected to rise to 12 GW by 2011, leading to a cost of $7.4 billion per year. Similarly, rack power consumption has increased to as much as 30 kW.
At current power usage levels, powering and cooling servers, racks, and the entire data center efficiently has become a challenging problem. Monthly management cost for a 15 MW facility can be as high as $5.6M. Income is determined by service level agreements (SLAs), which set the price paid by users based on the quality of service (QoS) they receive. A data center's operating margin depends on the provided quality of service. Higher QoS levels typically lead to higher rates that can be charged to customers.
Several factors make it impractical to design and implement a single centralized controller to manage all resources in a data center, including both the computational (cyber) resources and the cooling (physical) resources. For example, there may be hundreds of variables to be measured and controlled to manage the resources. Also, the dynamics of the controlled processes span multiple time scales: electricity costs can fluctuate on a time scale of hours, temperatures evolve on the order of minutes, and server power states can be changed as often as every few milliseconds. Actuators differ not only in time scale, but also in the spatial area they influence. For example, computer room air conditioner (CRAC) reference temperatures can affect the inlet air of multiple servers, while central processing unit (CPU) power states affect only a single server. The inability to manage the computational (cyber) resources and the cooling (physical) resources centrally leads to inefficient use of those resources and, as a result, to increased costs for power and cooling.
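As a rough illustration of the scale mismatch described above, the following Python sketch tabulates the three example processes by time scale and spatial scope. The class and function names are hypothetical. The numeric values (one hour, one minute, one millisecond; 1000, 40, and 1 servers) are assumed round figures chosen only to match the orders of magnitude mentioned in the text, not measurements from any facility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessScale:
    """One process or actuator with illustrative (assumed) scales."""
    name: str
    time_scale_ms: int      # characteristic time scale, in milliseconds
    servers_affected: int   # spatial scope, in number of servers


# Assumed round values: hours, minutes, milliseconds (see lead-in).
PROCESSES = [
    ProcessScale("electricity_cost", 3_600_000, 1000),
    ProcessScale("crac_reference_temperature", 60_000, 40),
    ProcessScale("cpu_power_state", 1, 1),
]


def time_scale_spread(processes):
    """Ratio between the slowest and the fastest time scale."""
    scales = [p.time_scale_ms for p in processes]
    return max(scales) // min(scales)


def updates_per_slowest_period(processes):
    """How many times each process can change during one period
    of the slowest process."""
    slowest = max(p.time_scale_ms for p in processes)
    return {p.name: slowest // p.time_scale_ms for p in processes}


if __name__ == "__main__":
    print("time-scale spread:", time_scale_spread(PROCESSES))
    for name, count in updates_per_slowest_period(PROCESSES).items():
        print(f"{name}: {count} update(s) per slowest period")
```

Under these assumed values the fastest and slowest processes differ by a factor of 3,600,000 in time scale and by a factor of 1000 in spatial scope. A single controller running at one fixed rate would therefore be either far too slow for the CPU power states or far more frequent than the slower processes require.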