Organizations such as on-line retailers, Internet service providers, search providers, financial institutions, universities, and other computing-intensive organizations often conduct computer operations from large scale computing facilities. Such computing facilities house and accommodate a large amount of server, network, and computer equipment to process, store, and exchange data as needed to carry out an organization's operations. Typically, a computer room of a computing facility includes many computing racks, which may include server racks. Each computing rack, in turn, may include many computer systems, servers, associated computer equipment, etc.
Because the computer room of a computing facility may contain a large number of servers, a large amount of electrical power may be required to operate the facility. In addition, the electrical power is distributed to a large number of locations spread throughout the computer room (e.g., many racks spaced from one another, and many servers in each rack). Usually, a facility receives a power feed at a relatively high voltage. This power feed is stepped down to a lower voltage (e.g., 208V). A network of cabling, bus bars, power connectors, and power distribution units is used to deliver the power at the lower voltage to numerous specific components in the facility.
The amount of computing capacity needed for any given data center may change rapidly as business needs dictate. Most often, there is a need for increased computing capacity at a location. Initially providing computing capacity in a data center, or expanding the existing capacity of a data center (in the form of additional servers, for example), is resource-intensive and may take many months to implement. Substantial time and resources are typically required to design and build a data center (or expansion thereof), lay cables, and install racks, enclosures, and cooling systems that remove waste heat from the data center. Additional time and resources are typically needed to conduct inspections and obtain certifications and approvals, such as for electrical and HVAC systems.
Some data centers have no redundancy at the PDU level. Such data centers may have a large affected zone when a UPS or PDU failure occurs in the power system. In addition, some data centers have “single threaded” distribution via the electrical supply to the floor, in which maintenance can only be performed when the components are shut off. The down-time associated with maintenance and reconfiguration of primary power systems in a data center may result in a significant loss in computing resources. In some critical systems, such as hospital equipment and security systems, down-time may result in significant disruption and, in some cases, adversely affect health and safety.
Some systems include dual power systems that provide redundant power support for computing equipment. In some systems, an automatic transfer switch (“ATS”) provides switching from a primary power system to a secondary (e.g., back-up) power system. In a typical system, the automatic transfer switch automatically switches a computing rack to the secondary power system upon detecting a fault in the primary power system. To maintain the computing equipment in continuous operation, the automatic transfer switch may need to make the transfer to the secondary power system rapidly (for example, within about 16 milliseconds).
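The transfer behavior described above may be illustrated by the following minimal sketch. It is a toy model only, not a description of any particular automatic transfer switch. The class name, the method names, and the detection and switching latencies (4 ms and 8 ms) are hypothetical values chosen for illustration; only the budget of about 16 milliseconds comes from the description above.

```python
from dataclasses import dataclass

# Budget taken from the text: transfer "within about 16 milliseconds".
TRANSFER_BUDGET_MS = 16.0


@dataclass
class AutomaticTransferSwitch:
    """Toy model of an ATS feeding one computing rack (hypothetical)."""

    active_source: str = "primary"
    detect_ms: float = 4.0  # assumed time to detect a primary fault
    switch_ms: float = 8.0  # assumed time to move the load to secondary

    def on_sample(self, primary_ok: bool) -> float:
        """Process one health sample of the primary power system.

        Returns the power interruption in milliseconds caused by this
        sample, or 0.0 if no transfer took place.
        """
        if self.active_source == "primary" and not primary_ok:
            self.active_source = "secondary"
            return self.detect_ms + self.switch_ms
        return 0.0


def within_budget(interruption_ms: float,
                  budget_ms: float = TRANSFER_BUDGET_MS) -> bool:
    """True if the interruption is short enough for the assumed budget."""
    return interruption_ms <= budget_ms


if __name__ == "__main__":
    ats = AutomaticTransferSwitch()
    ats.on_sample(primary_ok=True)          # healthy: stays on primary
    gap = ats.on_sample(primary_ok=False)   # fault: transfer to secondary
    print(ats.active_source, gap, within_budget(gap))
```

In this sketch the switch never transfers back to the primary power system; return-to-primary behavior is outside what the description above states and is therefore omitted.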
Some data centers include back-up components and systems to provide back-up power to servers in the event of a failure of components or systems in a primary power system. In some data centers, a primary power system may have its own back-up system that is fully redundant at all levels of the power system. Such a level of redundancy for the systems and components supported by the primary and fully-redundant back-up system may be referred to as “2N” redundancy. For example, in a data center having multiple server rooms, one or more server racks may receive power support from a primary power system and a fully-redundant back-up power system. The back-up system for each server room may have a switchboard, uninterruptible power supply (UPS), and floor power distribution unit (PDU) that mirror a corresponding switchboard, uninterruptible power supply, and floor power distribution unit in the primary power system for that server room. Providing full redundancy of the primary power systems may, however, be very costly both in terms of capital costs (in that it may require a large number of expensive switchboards, UPSs, and PDUs, for example) and in terms of costs of operation and maintenance. In addition, with respect to the primary computer systems, special procedures may be required to switch components from the primary system to a back-up system to ensure uninterrupted power supply for the servers, further increasing maintenance costs. As a result, some data centers may include a back-up system that is less than fully redundant for a primary power system. Such a level of redundancy for the systems and components supported by the primary and less than fully-redundant back-up system may be referred to as “N+1” redundancy. While N+1 redundancy may not provide fully-redundant reserve power support for computing equipment, such redundancy may involve lower capital and operating costs.
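The cost difference between the two redundancy levels may be illustrated by a simple component count. The following sketch assumes, for illustration only, that each power system consists of exactly one switchboard, one UPS, and one floor PDU, that 2N redundancy provides one mirrored back-up system per primary power system, and that N+1 redundancy provides a single back-up system shared by N primary power systems. The last assumption reflects common usage of the term and is not stated in the description above; the function names are hypothetical.

```python
# Components assumed to make up one power system (per the example in the text).
COMPONENTS_PER_SYSTEM = ("switchboard", "UPS", "floor PDU")


def backup_systems_needed(n_primary: int, scheme: str) -> int:
    """Number of back-up power systems under the assumed model."""
    if scheme == "2N":
        return n_primary  # one mirrored back-up per primary system
    if scheme == "N+1":
        return 1          # one shared back-up for all primaries (assumption)
    raise ValueError(f"unknown redundancy scheme: {scheme!r}")


def component_count(n_primary: int, scheme: str) -> int:
    """Total switchboards, UPSs, and PDUs across primary and back-up systems."""
    systems = n_primary + backup_systems_needed(n_primary, scheme)
    return systems * len(COMPONENTS_PER_SYSTEM)


if __name__ == "__main__":
    for scheme in ("2N", "N+1"):
        print(scheme, component_count(4, scheme))
```

Under these assumptions, four primary power systems require 24 components with 2N redundancy and 15 components with N+1 redundancy, at the cost that the single shared back-up system can support only one failed primary power system at a time.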
Some servers are coupled to one or more back-up components and systems via a dedicated power pathway, where the coupling between the one or more servers and the one or more back-up components and systems is limited to that particular pathway. In addition, some back-up components and systems provide back-up power support to multiple servers. In both instances, servers may be vulnerable to loss of back-up power support from various causes, including faults in the dedicated power pathway between the servers and the back-up components and systems and faults in the back-up components and systems themselves. Furthermore, where a set of back-up components and systems provides back-up power support to multiple servers, those multiple servers may lose back-up power support due to faults related to the back-up components and systems, one or more pathways associated with such components and systems, etc. Moreover, back-up power support for one or more servers may be lost if one or more back-up components and systems are taken off-line for maintenance. Mitigating such risks may be costly in terms of capital costs and in terms of costs of operation and maintenance (for example, performing additional maintenance on back-up components and systems to mitigate the risk of back-up power support loss due to faults in such components and systems).
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.