Failure recovery presents multiple challenges in VM management. High availability and disaster recovery of VMs and/or services can be enabled by replication and/or restart. A replica may include central processing unit (CPU) state, memory state, and storage. Full replication of VMs in the cloud requires that the CPU state, memory state, and storage of every VM be duplicated, which incurs high overhead costs. As such, a need exists to select a particular set of VMs to replicate in failure recovery scenarios in order to reduce these overhead costs.
Additionally, not all failed VMs can be restarted when resources are insufficient. Even when resources are sufficient, restarting all failed VMs may take a long time, as computational resources are likely to become a bottleneck. As such, a need exists to select a particular set of VMs to restart earlier than other VMs in a system.