Power has become one of the most difficult and expensive items to manage in data centers. Up to 40% of data center power supplies do not work optimally. Such power supplies consume excessive power, which leads to overheating, malfunctioning devices and, ultimately, occasional or even regular power shutdowns. After a power failure, the networks in the data center are out of control, and customers often become aware of the problem before the data center operator does. In 50% of cases, the data center operator learns of a power shutdown in the data center from a customer. Moreover, the data center operator typically has difficulty remotely controlling the switches, air conditioning units or other electronic devices in the data center. As a consequence, recovery from a disaster affecting several computers in the data center is slow, because technicians must intervene on site.
As opposed to a dumb power distribution unit (PDU), which has no instrumentation and is not manageable, the present invention concerns a smart power distribution unit or data center management unit (DCMU) that can be metered, is equipped with one or more displays, and can be switched, i.e. its individual outlets can be switched on or off remotely. Smart PDUs typically provide means for remote access such as RS-232 serial data ports, external buses such as USB (Universal Serial Bus), or a computer network controller accessible through a network protocol such as Telnet, SSH (Secure Shell), SNMP (Simple Network Management Protocol) or ICQ (“I seek you”), or through a web portal. This enables the data center administrator to access the smart PDU from a remote terminal or interface in order to turn power outlets on or off, schedule power shutdowns, control the load, etc.
US Patent Application 2009/0228726, entitled “Environmentally Cognizant Power Management”, for instance discloses such a smart PDU (305 in FIG. 3) with real-time electrical metering of the power consumed at server level and with sensors for environmental parameters such as temperature, humidity and airflow. These measurements are used to dynamically control the applications and tasks running on the different computers so as to optimize power usage in the data center.
Although power usage is optimized by balancing the applications across the computers in the data center, US 2009/0228726 does not teach measures for preventing abnormal power outages, nor does it disclose how to safely reboot the data center after a power crash.
US Patent Application 2009/0070611, entitled “Managing Computer Power Consumption in a Data Center”, describes a state-of-the-art method for preventing power shutdown disasters in data centers (disaster prevention). In this known method, the aggregate power consumption of a plurality of computers in the data center is monitored, and as soon as it exceeds a predetermined threshold, certain computers are throttled down in order of priority. As indicated in paragraph [0013] of US 2009/0070611, the computers are prioritized statically or dynamically depending on the applications or tasks running on them.
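The threshold-and-throttle step described in US 2009/0070611 can be illustrated with a minimal Python sketch. It is not the method as implemented in that application: the `Computer` record, the `throttle_by_priority` function, the convention that a higher `priority` value means a more important computer, and the assumption that throttling retains a fixed fraction of a computer's power are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    power_w: float  # currently measured power draw in watts
    priority: int   # hypothetical convention: higher value = more important


def throttle_by_priority(computers, threshold_w, retained_fraction=0.5):
    """Throttle computers, lowest priority first, until the aggregate
    power consumption no longer exceeds threshold_w.

    retained_fraction is an assumed fraction of its power that a computer
    still draws after being throttled. Returns the names of the throttled
    computers in the order in which they were throttled.
    """
    throttled = []
    total_w = sum(c.power_w for c in computers)
    for c in sorted(computers, key=lambda c: c.priority):
        if total_w <= threshold_w:
            break  # aggregate is back under the threshold
        saved_w = c.power_w * (1 - retained_fraction)
        c.power_w -= saved_w
        total_w -= saved_w
        throttled.append(c.name)
    return throttled
```

For example, with `db` (400 W, priority 3), `web` (300 W, priority 2) and `batch` (500 W, priority 1) against a 900 W threshold, the sketch throttles `batch` and then `web`, bringing the aggregate down to 800 W. `db` keeps its full draw because its priority reflects only the application it runs, not how likely it is to cause a power shutdown.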
Apart from the fact that the method of US 2009/0070611 is executed by a separate power consumption managing computer (152 in FIG. 1) and is consequently not integrated in the PDU (120 in FIG. 1), this known method is disadvantageous in that it does not prioritize the computers that are critical, i.e. likely to cause a power shutdown. In other words, the preventive measures taken in US 2009/0070611 may be inadequate.
US Patent Application 2008/0172553, entitled “Data Center Boot Order Control”, describes a method for restoring the power supply following a power outage in a data center (disaster recovery). The method involves assigning priorities to the computers based on financial considerations, e.g. billing opportunities, SLA (Service Level Agreement) commitments, or financial penalties resulting from server downtime. These priorities are used to determine the order in which the computers in the data center are rebooted.
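A financially driven boot order of the kind described in US 2008/0172553 can be sketched as a sort on a financial score. The `Server` fields and the scoring rule (SLA penalty per hour plus billing per hour, highest first) are hypothetical assumptions chosen for illustration; the application is not quoted here as prescribing this formula.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    sla_penalty_per_hour: float  # hypothetical: penalty owed per hour of downtime
    billing_per_hour: float      # hypothetical: revenue billed per hour of uptime


def boot_order(servers):
    """Return server names in reboot order, financially most valuable first.

    The score (penalty + billing) is an assumed stand-in for the financial
    priority. Python's sort is stable even with reverse=True, so servers
    with equal scores keep their input order.
    """
    ranked = sorted(
        servers,
        key=lambda s: s.sla_penalty_per_hour + s.billing_per_hour,
        reverse=True,
    )
    return [s.name for s in ranked]
```

For example, `mail` (50 + 20), `shop` (200 + 300) and `dev` (0 + 5) are rebooted in the order `shop`, `mail`, `dev`. Nothing in such a score reflects whether `shop` was the server that overloaded the supply; if it was, booting it first recreates the conditions of the outage.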
As in US 2009/0070611, the method described in US 2008/0172553 is executed by a power restoration manager server (106 in FIG. 1A) that is not necessarily integrated in the PDU. More worrying, however, is the fact that prioritizing on the basis of financial considerations does not isolate the critical or defective computers that caused the abnormal power outage, so that repeated power shutdowns become unavoidable.
US Patent Application 2005/0280969, entitled “Current Protection Apparatus and Method”, describes a power distribution unit (230 in US 2005/0280969) with a circuit breaker per outlet (CB1 . . . CB8 in US 2005/0280969) and a processor (236 in US 2005/0280969) that samples the current, compares the current samples to a threshold, and commands the circuit breakers to interrupt the current when the threshold is exceeded, i.e. in a so-called overcurrent condition. The processor operates under control of software that is stored in a software memory (238 in US 2005/0280969) and defines the overcurrent conditions at the level of individual outlets.
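The per-outlet overcurrent comparison of US 2005/0280969 amounts to checking each outlet's current samples against that outlet's threshold. The sketch below assumes the samples and thresholds are available as dictionaries keyed by outlet label; the function name and data layout are hypothetical, and the actual processor (236) acts on live samples rather than on stored lists.

```python
def outlets_to_trip(samples_a, limits_a):
    """Return, sorted, the outlets whose circuit breaker should be opened.

    samples_a: outlet label -> list of current samples in amperes
    limits_a:  outlet label -> overcurrent threshold in amperes for that outlet
    An outlet trips as soon as any single sample exceeds its threshold.
    """
    return sorted(
        outlet
        for outlet, samples in samples_a.items()
        if any(sample > limits_a[outlet] for sample in samples)
    )
```

For example, with samples of 2.0, 2.1, 9.5 and 2.0 A on CB1, samples of 3.0, 3.2 and 3.1 A on CB2, and an 8.0 A threshold on both, only CB1 is tripped. The single 9.5 A sample is enough to cut power to an outlet that otherwise draws about 2 A, which shows how a threshold-only rule reacts to every transient overcurrent event.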
Although the power distribution unit known from US 2005/0280969 provides some form of disaster prevention through overcurrent detection and circuit breaking, the proposed mechanism overreacts to every overcurrent event, resulting in many unnecessary power interruptions.
In summary, existing smart PDUs do not adequately prevent disaster situations in which plural computers, racks or the entire data center are affected by a power shutdown. Nor do prior art PDUs have the intelligence to restart the computers quickly and safely after a disastrous power outage in the data center. On the contrary, existing solutions driven by financial profit and/or the criticality of the applications run by the different computers will most likely prioritize continued powering of precisely those computers that consume excessive power and are likely to cause, or have caused, the power outage.
It is an objective of the present invention to disclose a smart PDU or data center management unit (DCMU) that overcomes the above-identified shortcomings of existing PDUs. In particular, it is an objective of the present invention to disclose a DCMU with improved disaster prevention and disaster recovery capabilities, i.e. with the ability to isolate the computer(s) that have caused the power shutdown or are likely to cause a power failure in the near future, and with the ability to reboot quickly and safely after a power crash, thereby minimizing the risk of repeated power outages.