Cloud computing allows dynamically scalable virtualized resources to be provided as a service. It can ensure that an appropriate level of resources is available to power software applications when and where those resources are needed in response to demand. As a result, cloud computing allows entities to respond quickly, efficiently, and in an automated fashion to rapidly changing business environments.
Virtual machine (“VM”) redundancy is the foundation of resilient cloud applications. While active-active approaches combined with load balancing and auto-scaling are usually resource efficient, the stateful nature of many cloud applications often necessitates 1+1 (or 1+n) active-standby approaches that can leave 50% (or more) of VMs idle or nearly idle. Active-standby redundancy is one of the oldest and most widely used design patterns for both fault tolerance and disaster recovery in modern computer systems. It is parameterized as 1+n redundancy, in which one of n cold, warm, or hot spares takes over upon failure of the single active primary. A range of values of n is common: 1+1 for disaster recovery; 1+2 to maintain redundancy during lengthy repairs and upgrades; (1+1)+(1+1) (effectively 1+3) multi-site designs, in which a 1+1 standby site backs up a 1+1 primary site; and general 1+n chained replica systems.
Unfortunately, inefficient resource utilization is the Achilles' heel of standby redundancy. Keeping the standbys idle (except for synchronizing state) during normal operation wastes 50% (for 1+1 systems) or more of a system's peak capacity. Active-active systems, in which all replicas are utilized during normal operation, can eliminate this waste, but such designs are practical mostly when replicas are stateless, carry soft state that can be recreated in a spare (e.g., stateless web servers backed by a database), or contain state that can be shared (e.g., key-value stores). For most other stateful systems, standby redundancy remains the viable option despite its limitations.
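The "50% or more" figure can be illustrated with a short calculation. The following is a minimal sketch, not part of the described system: it assumes every replica is provisioned with the same capacity and that standbys perform no useful work (state synchronization overhead is ignored). The function name `idle_fraction` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def idle_fraction(n_standbys: int) -> Fraction:
    """Fraction of provisioned capacity left idle in a 1+n active-standby design.

    Assumes one active primary plus n_standbys identically sized standbys,
    with the standbys doing no useful work during normal operation.
    """
    if n_standbys < 0:
        raise ValueError("number of standbys must be non-negative")
    return Fraction(n_standbys, 1 + n_standbys)


# Configurations named in the text; (1+1)+(1+1) is treated as 1+3.
for label, n in [("1+1", 1), ("1+2", 2), ("(1+1)+(1+1)", 3)]:
    print(f"{label}: {float(idle_fraction(n)):.0%} of provisioned capacity idle")
```

Under these assumptions the idle share is n/(1+n): one half for 1+1, two thirds for 1+2, and three quarters for the multi-site (1+1)+(1+1) case, so the waste grows with each additional standby.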