For an organisation which hosts network-connected applications (including, but not limited to, companies hosting websites on the Internet), there are two key problems:

1. Components, servers, networks and storage devices can fail, in which case applications will need to be recovered, perhaps manually, from a secondary data store (such as a backup at a disaster recovery site). We will refer to this as the redundancy problem.

2. Load generated by applications can vary significantly over time, for example a website can experience a spike in traffic, so applications may need to be moved between servers in order to maintain an acceptable level of utilisation. We will refer to this as the load-balancing problem.
In the case of the redundancy problem, current solutions include:

Adding redundancy at the physical hardware level, for example by use of dual-redundant power supplies. Disadvantages of this approach include that it is extremely difficult (i.e. expensive) to completely eliminate single points of failure within a single server, and that even if this can be achieved, the system will still have a single point of failure in the operating system or other application software (e.g. the web server or kernel might crash).

Virtualising the server and replicating every change in memory and system state to a second physical host over a high-speed LAN so that the second host can take over if the first fails, for example with VMware Fault Tolerance. Disadvantages of this approach include that virtualisation imposes a performance overhead on applications, that it requires almost the resources of two servers to run (the live one and the replica), and that the replica can only be located geographically close to the live server. Furthermore, this approach only works with a shared storage backend, which can be prohibitively expensive. Nor can it be applied between datacentres, or on commodity setups without high-speed connectivity between servers.
In the case of the load-balancing problem, current solutions include:

Manually moving applications between servers when a spike of load occurs. Disadvantages of this approach include that individual servers are vulnerable to spikes in the load of any of their hosted applications, which can cause all of the hosted applications on a server to crash, and that it requires manual intervention, which can delay recovery significantly.

Isolating applications which are generating large amounts of load on the system using operating-system-level constraints, for example the CloudLinux kernel extensions. Disadvantages of this approach include that if an application experiences a spike in load, that application is effectively taken offline (or made to run very slowly) until it is manually moved to another server.

The use of load balancer appliances (hardware or software) in conjunction with stateless or semi-stateless application servers and a shared storage backend (SAN), in order to distribute the load of the applications across multiple servers. We will refer to this solution as a “classical cluster”. Disadvantages of this approach include that the SAN itself acts as a single point of failure, failures of which may be catastrophic, and that such a cluster cannot operate across geographically diverse regions. A further disadvantage of a classical cluster is the need to implement complex solutions to the “split-brain” problem, in which servers become disconnected from each other but not from the shared storage medium, which can cause data corruption. This requires administrators to set up quorum, fencing or STONITH (“shoot the other node in the head”) to physically power off a server if it becomes unresponsive.
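By way of illustration only, the quorum rule mentioned above can be sketched as a strict-majority test. The function name `has_quorum` and its parameters are hypothetical and are not taken from any particular cluster manager. A real deployment would combine such a test with fencing, timeouts and tie-breaking, none of which is shown here.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this node and the peers it can reach form a strict majority.

    reachable_peers: number of OTHER nodes this node can currently contact.
    cluster_size: total number of nodes configured in the cluster.
    Both names are illustrative assumptions, not part of any real API.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0 <= reachable_peers < cluster_size:
        raise ValueError("reachable_peers must be between 0 and cluster_size - 1")
    votes = reachable_peers + 1  # count this node's own vote
    # A strict majority means at most one partition can ever hold quorum.
    return votes > cluster_size // 2


if __name__ == "__main__":
    # Five-node cluster partitioned 3/2: only the larger side may keep writing.
    print(has_quorum(reachable_peers=2, cluster_size=5))
    print(has_quorum(reachable_peers=1, cluster_size=5))
    # Four-node cluster partitioned 2/2: neither side holds a strict majority.
    print(has_quorum(reachable_peers=1, cluster_size=4))
```

Under this rule, a node that loses quorum would stop writing to the shared storage medium. An even split leaves no partition with quorum, which is one reason such clusters are often configured with an odd number of voting members.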