The invention relates to power monitoring systems, and more particularly relates to power monitoring systems for use in data centers. In its most immediate sense, the invention relates to power monitoring systems for use in data centers utilizing power distribution units having outlet-level power monitoring and switching capabilities.
Large-scale computer operations are commonly carried out in data centers. A data center is a facility wherein computing tasks are parceled out for execution by a multiplicity—sometimes thousands—of servers (together with related equipment such as modems and routers). Conventionally, such data processing equipment is mounted in racks and supplied with power by power distribution units (“PDUs”).
A PDU is supplied with line power and distributes it to a plurality of electronic devices (conventionally, a plurality of data processing devices mounted in a rack or a computer cabinet to which the PDU is also mounted). A PDU has a plurality (often ten or twenty) of female line-voltage receptacles for supplying power to e.g. servers, modems, routers, etc. that are mounted in the rack or cabinet to which the PDU is attached. Modern PDUs such as those made by Raritan Inc. can monitor the power drawn from each receptacle (i.e. have “outlet level” power monitoring), thereby making it possible to e.g. determine the power being drawn by a particular modem or computer server. Modern PDUs can also turn individual receptacles on and off (i.e. have “outlet level” switching capability) in response to instructions, thereby powering and depowering the equipment that is plugged into those receptacles.
Modern data centers use large quantities of electrical power to power the servers. Electrical power is costly, and operators of data centers seek to reduce any waste of it. Operators therefore try to shut down power to servers that are not needed to cope with the available workload. This has been done by using networked PDUs with outlet-level power monitoring capability and individual outlet switching (such as those manufactured by Raritan Inc.) to supply power to the servers, and using computer programs such as Raritan Inc.'s POWER IQ product to appropriately control the servers and PDUs.
Significantly, because a server's power supply continues to operate even after the server has been shut down, the only way to entirely eliminate the energy consumption of a server is to cut off its supply of electrical power. In order to do this in an acceptable manner, two events must occur in sequence. The first event is the server's execution of a “graceful shutdown” and the second event is the cutoff of power to the server. These will now be discussed in order.
During a graceful shutdown, a server terminates its operations in an orderly manner (i.e. such that currently running computer processes save all volatile data and close themselves only after this has taken place) and prevents new computer processes from starting. A graceful shutdown is initiated by delivering a shutdown command to the server that is to be shut down. (As used herein, the term “shutdown command” refers to a command to shut a server down normally. It is alternatively possible to command a server to execute a forced or abortive shutdown, but this is disfavored as will be discussed below.) Once a server has shut down gracefully, it has ceased to operate, but it still consumes power because its power supply remains energized. That is why the second event—switching off power to the server—is necessary.
When a server has been shut down gracefully after receipt of a shutdown command, switching off the power to the server does not corrupt ongoing computer processes, and does not interfere with the ability to restart the server by switching the power on and bringing the server back on-line with proper functionality. However, a graceful shutdown is not assured; various errors can prevent it from taking place. For example, a computer process may “hang” after the server has received a shutdown command. When this happens, the server is in a state in which switching off the power to the server (i.e. shutting it down abortively) will almost certainly have undesirable consequences (e.g. data corruption). In the worst case, turning off the power to a hung server can cause the server to become unbootable and therefore incapable of being brought back on-line without substantial repair. Alternatively, the server may reject the shutdown command as unauthorized (a so-called “authentication error”), the computer network may malfunction and fail to deliver the shutdown command to the server, or another error condition may exist. Where these or other error conditions are present, line power to the server should not be switched off because this will cause data corruption that might be avoidable by identification and repair of the condition causing the malfunction.
It would be advantageous to provide a power monitoring method and apparatus for use in data centers utilizing power distribution units having outlet-level power monitoring and switching capabilities that would identify whether a server has actually executed a graceful shutdown after receiving a shutdown command, that would switch off the power to the server if such a graceful shutdown has taken place, and that would keep the server connected to line power in the presence of predetermined error conditions.
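By way of illustration only, the desired behavior described above might be sketched as follows. This is a minimal sketch and not a definitive implementation. All names in it (`shut_down_and_cut_power`, `Outcome`, `AuthenticationError`, `DeliveryError`, and the callables passed in) are hypothetical and do not correspond to any actual PDU or POWER IQ interface. The sketch also assumes, purely for purposes of the example, that a graceful shutdown can be recognized when the power drawn from the server's outlet falls to or below an idle threshold; the text above does not specify how the shutdown is to be identified.

```python
import time
from enum import Enum, auto
from typing import Callable


class Outcome(Enum):
    POWERED_OFF = auto()             # shutdown confirmed, outlet switched off
    AUTHENTICATION_ERROR = auto()    # server rejected the shutdown command
    NETWORK_ERROR = auto()           # shutdown command could not be delivered
    SHUTDOWN_NOT_CONFIRMED = auto()  # e.g. a hung process; power left on


class AuthenticationError(Exception):
    """Hypothetical: raised when the server rejects the command as unauthorized."""


class DeliveryError(Exception):
    """Hypothetical: raised when the network fails to deliver the command."""


def shut_down_and_cut_power(
    send_shutdown: Callable[[], None],
    read_outlet_watts: Callable[[], float],
    switch_outlet_off: Callable[[], None],
    idle_watts_threshold: float,
    max_polls: int,
    poll_interval_s: float = 0.0,
) -> Outcome:
    """Send a shutdown command, then cut power only if shutdown is confirmed."""
    # First event: request a graceful shutdown.
    try:
        send_shutdown()
    except AuthenticationError:
        return Outcome.AUTHENTICATION_ERROR  # keep the server on line power
    except DeliveryError:
        return Outcome.NETWORK_ERROR         # keep the server on line power

    # Assumption: a server that has shut down draws no more than the threshold.
    for _ in range(max_polls):
        if read_outlet_watts() <= idle_watts_threshold:
            # Second event: switch off the outlet feeding the server.
            switch_outlet_off()
            return Outcome.POWERED_OFF
        time.sleep(poll_interval_s)

    # Shutdown never confirmed (for example, a process hung): leave power on.
    return Outcome.SHUTDOWN_NOT_CONFIRMED
```

In this sketch the outlet is switched off on exactly one path, namely after the shutdown command has been accepted and the monitored power has fallen to the assumed idle level. Every error path returns without touching the outlet, so that the server remains connected to line power while the underlying condition is identified and repaired.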