Virtualized infrastructures are widely used to provide large-scale services, which involve executing large-scale applications such as multi-tier applications. These large-scale applications require multiple virtual machines, some of which depend on each other to function properly. A group of application virtual machines tied together by such dependencies can be referred to as a logical fault domain. As used herein, a logical fault domain is a set of application virtual machines that share a single point of failure.
The virtual machines in each logical fault domain must be protected together against failures, such as a network fault or a server failure, to ensure that the application runs properly. If the virtual machines of a logical fault domain, which are tied together, are not protected together, then the entire service provided by the large-scale application may become unavailable. For example, assume that a large-scale application has a simple 3-tier architecture consisting of a web virtual machine, an application virtual machine, and a database virtual machine, and that all three of these virtual machines run on the same host computer. In this example, all three virtual machines are in a single fault domain. If a fault strikes the host computer and any of the three virtual machines is not restarted for any reason, then the entire service provided by the large-scale application may not function properly, since all three virtual machines are needed for the service to operate.
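The all-or-nothing property of the 3-tier example can be sketched as follows. This is a minimal illustration, not a described implementation: the `LogicalFaultDomain` class, its `service_available` method, and the VM names are hypothetical. The model treats a logical fault domain as a set of VMs and considers the service available only when every VM in that set is running again after a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalFaultDomain:
    """Hypothetical model: a set of application VMs sharing a single point of failure."""
    name: str
    vms: frozenset

    def service_available(self, running: set) -> bool:
        # The service works only if every VM in the domain is running again;
        # restarting a subset of the domain is not enough.
        return self.vms <= running


# The 3-tier example: web, application, and database VMs on one host.
three_tier = LogicalFaultDomain("3-tier-app", frozenset({"web-vm", "app-vm", "db-vm"}))

# After the host fails, suppose only two of the three VMs are restarted.
partial_restart = {"web-vm", "app-vm"}
full_restart = {"web-vm", "app-vm", "db-vm"}

print(three_tier.service_available(partial_restart))  # False
print(three_tier.service_available(full_restart))     # True
```

Using a subset check keeps the rule in one place: a recovery plan for a domain succeeds only if it restores the whole set, which is why the domain's VMs need to be protected together.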
Currently, high availability (HA) solutions or fault domain managers (FDMs) are used to handle failures. However, for an HA solution or FDM to handle failures effectively, an administrator or user must manually enter settings that define the logical fault domains and the priorities of the various virtual machines running large-scale applications. This manual process is error-prone and inefficient.