Computerized systems have contributed significantly to the advancement of modern society and are utilized in a wide range of applications to achieve advantageous results. Electronic data storage is one such application that has benefited from computerized systems. Currently available computerized file systems, such as file servers, facilitate the storage and sharing of data across networks of varying sizes. When access to data stored in a file system is granted broadly—such as over the Internet—multiple servers may be combined in datacenters to address accessibility and data storage concerns.
For many file storage systems deployed over a network, the state of the network connection is of critical importance. Numerous network connection protocols (at several levels of implementation) are employed, with varying levels of reliability, to communicate data (generally in the form of packets) between and among networked devices. These devices may include file servers, client devices, and any network edge devices (e.g., routers, switches) along the path of communication. The complexities of network communication, i.e., the inherent difficulties of processing vast amounts of data at high speeds over logically and/or physically large distances, are such that interruptions to network communication are not uncommon. Excessive network traffic, software bugs, mechanical design flaws, and simple user error may all contribute to undesirable performance, such as packet delays, unresponsive devices, and total failures of network communication.
Unfortunately, as the complexity, capacity, and sophistication of datacenter servers increase with evolving industry standards, the gravity of interruptions to network communication escalates correspondingly. Efficient and timely maintenance of servers in a datacenter has therefore become a paramount concern, and automated management schemes have been developed to address it. Repairing the state of a network connection for a datacenter may require reinitializing a networked server by power cycling the server (i.e., turning the server off and turning it back on).
In some instances, when a server is hung, rebooting may be one way to recover the server. Additionally, when provisioning servers, it may be necessary to reboot a server in order to trigger it to reattempt its pre-boot execution environment (“PXE”) boot, enabling an updated operating system to be installed on the server. Because the rebooting procedure underpins these basic operations, it is vital that the procedure be performed reliably. Otherwise, the servers may require manual intervention, and drastic, costly repairs could be attempted unnecessarily.
Wake on LAN (“WOL,” sometimes “WoL”) is an Ethernet computer networking standard that allows a computer to be turned on, or woken up from a low power state (e.g., hibernate or sleep), remotely by a network message. Wake on LAN support is typically implemented in the motherboard of a computer and may not be restricted to LAN traffic; rather, it may work on all network traffic, including Internet traffic.
The general process of waking a computer remotely over a network connection begins with a target computer being shut down (e.g., sleeping, hibernating, or “soft off”), with power reserved for the network card. The network card listens for a specific data packet, referred to as the “magic packet”: a broadcast frame containing anywhere within its payload a specific sequence of data and the MAC (media access control) address of the target computer. The magic packet is broadcast on the broadcast address for that particular subnet (or an entire LAN).
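The magic packet format described above can be sketched as follows. This is a minimal illustration, not part of any particular product: the payload consists of six bytes of 0xFF followed by the target's six-byte MAC address repeated sixteen times, and the packet is conventionally sent as a UDP broadcast (ports 9 or 7 are common choices).

```python
import socket

def make_magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_magic_packet(mac: str,
                      broadcast_addr: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the given subnet."""
    packet = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Enable broadcast so the frame reaches the sleeping NIC.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast_addr, port))
```

A caller would typically invoke, for example, `send_magic_packet("AA:BB:CC:DD:EE:FF", "192.168.1.255")` to wake a machine on its local subnet.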
When a listening computer receives this packet, the packet is checked to verify that it contains the correct information (i.e., that the target MAC address contained in the packet matches the MAC address of the machine); a valid match causes the computer to switch on and boot. In order for Wake on LAN to work, parts of the network interface need to stay on, even during standby, which increases the standby power used by the computer. If Wake on LAN is not needed, turning it off may reduce power consumption while the computer is off but still plugged in. However, while Wake on LAN allows a computer to be turned on or woken up from a low power state, Wake on LAN is administered via a properly running network. As such, Wake on LAN is incapable of causing a computer to reboot itself, and/or to reinitialize and repair a network connection of the computer.
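The receiving-side check described above amounts to scanning the frame payload for the magic-packet pattern matching the machine's own MAC address. The sketch below illustrates the logic only; in practice this match is performed by the network card's hardware or firmware, not by host software.

```python
def contains_magic_packet(payload: bytes, own_mac: str) -> bool:
    """Return True if the magic-packet pattern for the given MAC
    (6 x 0xFF followed by the MAC repeated 16 times) appears
    anywhere within the frame payload."""
    mac_bytes = bytes.fromhex(own_mac.replace(":", "").replace("-", ""))
    pattern = b"\xff" * 6 + mac_bytes * 16
    # The standard allows the pattern to sit anywhere in the payload.
    return pattern in payload
```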
There exist a few available methods to address the problem of remote reinitialization. One method is the use of servers with IPMI functionality. IPMI (Intelligent Platform Management Interface) is a loosely-defined standard that allows for out-of-band management (i.e., management over a dedicated channel) of a server and includes the ability to control the power to the server. IPMI supports a variety of connection types, including an independent network connection, a shared network connection, or a serial connection.
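By way of illustration, out-of-band power control over an IPMI network connection is commonly exercised with the open-source `ipmitool` utility. The host address and credentials below are placeholders, not values from this document, and the commands assume a reachable baseboard management controller (BMC) with LAN access enabled.

```shell
# Query the current chassis power state via the BMC's LAN interface
ipmitool -I lanplus -H 192.0.2.10 -U admin -P secret chassis power status

# Hard power cycle the server (turn it off, then back on)
ipmitool -I lanplus -H 192.0.2.10 -U admin -P secret chassis power cycle
```

Because these commands travel over the BMC's management channel rather than the server's operating system, they can reach a server whose OS is hung, subject to the connection-type limitations discussed below.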
However, all three connection types present impediments to efficient power management. Independent network connections and serial connections incur additional costs, since additional components are required, such as connectors, a serial concentrator, or a network switch. Sharing a network connection eliminates these costs but has other issues. First, it may be difficult to identify the correct media access control (“MAC”) address of the IPMI network device from the network device (e.g., router) of the server. Second, an IPMI network device requires its own IP address, which can be an issue in scenarios where IP addresses are limited. Third, if a server has a bug causing it to flood the network, it may not be possible to get IPMI packets through to an IPMI controller.
Additionally, a typical IPMI management device has a very complicated set of features, making its code complex and thus vulnerable to bugs—some of which may not manifest until the device has been widely deployed. This is aggravated by the fact that upgrading the code may require a hard power cycle of the box, which cannot be performed by an IPMI device.
A second method of automating server management is through advanced power strips with management functionality (“manager power strips”). With a manager power strip, one or more servers are plugged into a power strip that has a network connection. Software implemented on the power strip can be utilized to turn the power on or off. Unlike IPMI devices, this solution requires few network connections, generally requires very little software running in the power strip (thereby decreasing the likelihood of software bugs), and leverages an already existing connection to the server, i.e., the power cable. However, there are several limitations with this approach as well. Manager power strips cost more than traditional, unmanaged power strips and support fewer options than IPMI devices. Also, newer servers may be designed to be powered with direct current (instead of standard alternating current) and/or to run at non-standard voltages, which would not be supported by available manager power strips or which may significantly increase costs. Additionally, newer servers share power supplies for efficiency, thereby expanding the impact of managing power at the power-strip level and further increasing the difficulty of specifically targeted power management.