Many computers operate under some expectation of fault tolerance. Machines typically depend on some infrastructure, such as electrical power and network connectivity. No infrastructure is 100% reliable, and the expectation of fault tolerance may dictate that operation of the machines continue smoothly (or, at least, that the machines halt gracefully) if some portion of the infrastructure fails.
One example of infrastructure that is subject to failure is the supply of electrical power. Computers and other machines depend on electrical power in order to operate. Many such machines are not able to handle an abrupt loss of power. For example, a computer may be in the middle of committing atomic operations (e.g., disk writes, state changes, etc.), which cannot easily be unwound if power is lost during the commit process. Even if the machine were able to deal with an abrupt loss of power, there may be quality of service issues (e.g., users' expectations that the machines will be running more often than not) that weigh against taking a machine out of service simply because a source of electrical power has been lost. Therefore, machines are often set up to use plural sources of power, so that operation can continue in the event that one source fails.
Two mechanisms that may be used to provide plural sources of power are dual-cording and Uninterruptible Power Supplies (UPSs). With dual-cording, a machine receives power through two separate power cords, each connected to its own power converter within the machine. In normal operation, the machine draws half of its power from each cord/converter. If the power supplying one cord (or the corresponding converter in the machine) fails, the power draw is switched to the remaining cord, so the machine continues to operate while drawing full power through one of its cords. A UPS is another type of mechanism that helps to provide fault tolerance in the event of a power loss. A UPS connects a machine to an underlying power source (e.g., the utility power grid), while also providing a battery backup. Thus, if the grid power fails, the UPS continues to supply power, temporarily, from its battery.
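The load-sharing and failover behavior of dual-cording can be illustrated with a minimal sketch. The function name per_cord_draw and its parameters are hypothetical and chosen only for illustration; the sketch assumes an ideal even split between the cords and an instantaneous switch to the surviving cord, which a real machine may only approximate.

```python
def per_cord_draw(load_kw, cord_a_ok=True, cord_b_ok=True):
    """Return (draw_a_kw, draw_b_kw) for a dual-corded machine.

    Assumption: the load is split evenly while both cords are live,
    and moves entirely to the surviving cord when one cord fails.
    """
    if load_kw < 0:
        raise ValueError("load_kw must be non-negative")
    if cord_a_ok and cord_b_ok:
        half = load_kw / 2
        return (half, half)      # normal operation: half from each cord
    if cord_a_ok:
        return (load_kw, 0.0)    # cord B lost: cord A carries the full load
    if cord_b_ok:
        return (0.0, load_kw)    # cord A lost: cord B carries the full load
    return (0.0, 0.0)            # both cords lost: the machine is unpowered


print(per_cord_draw(1.0))                   # both cords live
print(per_cord_draw(1.0, cord_a_ok=False))  # cord A has failed
```

The point of the sketch is that either cord must be able to carry the machine's full load on its own, even though each carries only half of it in normal operation.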
One arrangement involving UPSs and dual-cording is to use two UPSs. In such an arrangement, one of a machine's cords is connected to one UPS, and the other cord is connected to the other UPS. In normal operation, half the power load flows through each UPS, but if a UPS fails, then the remaining UPS picks up the full load. However, this design involves massive over-sizing of UPS capacity, since it involves maintaining UPSs that, collectively, can deliver at least twice as much power as would be used during normal operation. A group of servers in a data center may draw thousands of kilowatts of power. It may not be practical to double-size the UPS capacity for an entire group of servers.
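The over-sizing in the two-UPS arrangement can be made concrete with a short calculation. The function name dual_ups_sizing and the 2,000 kW figure are hypothetical, used only for illustration; the sketch computes minimum capacities and ignores safety margins, conversion losses, and battery runtime.

```python
def dual_ups_sizing(normal_load_kw):
    """Minimum UPS capacities for a two-UPS, dual-corded arrangement.

    Assumption: each UPS must be able to carry the entire load alone,
    because it picks up the full load if the other UPS fails.
    """
    if normal_load_kw <= 0:
        raise ValueError("normal_load_kw must be positive")
    per_ups_capacity_kw = normal_load_kw            # each UPS sized for full load
    total_capacity_kw = 2 * per_ups_capacity_kw     # collectively twice the load
    # In normal operation each UPS carries only half of the load.
    normal_utilization = (normal_load_kw / 2) / per_ups_capacity_kw
    return {
        "per_ups_capacity_kw": per_ups_capacity_kw,
        "total_capacity_kw": total_capacity_kw,
        "normal_utilization": normal_utilization,
    }


# Hypothetical server group drawing 2,000 kW in normal operation.
sizing = dual_ups_sizing(2000)
print(sizing)
```

Under these assumptions, a group drawing 2,000 kW would need at least 4,000 kW of installed UPS capacity, with each UPS running at no more than half of its capacity whenever both are healthy.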