Organizations are increasingly deploying applications on virtual machines (VMs) to improve Information Technology (IT) efficiency and application availability. A key benefit of adopting virtual machines is that they can be hosted on a smaller number of physical servers (VM servers). Different types of VM backup and recovery options exist.
Backup solutions exist in VM environments. Such an environment can include a host server that has one or more VMs and a VM manager, or hypervisor, that creates and manages the VMs. Management tasks include creating snapshots of VMs. A hypervisor is computer software, firmware or hardware that creates and runs virtual machines. The term hypervisor shall be interchangeable with virtual machine monitor (VMM). A computer on which a hypervisor runs one or more virtual machines is called a host machine, and each virtual machine is called a guest machine. The hypervisor presents the operating systems of the VMs (guest operating systems) with a virtual operating platform and manages the execution of the guest operating systems.
A hypervisor can take snapshots of each VM to use as restore points. Thus, a user can revert to, or restore, a previous state of a VM by using the snapshot. A snapshot of a VM can include the power state (on, off, suspended), data (virtual disks, memory) and hardware configuration (virtual network interface cards) of the VM at the time the snapshot was generated. Thus, snapshots can be leveraged by backup systems to back up a VM by storing the data, configuration and power state captured by the snapshot onto a disk. In this manner, if a host, disk or memory associated with the VM fails, a backup of the VM exists, and important files and applications can be salvaged.
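The contents of a snapshot described above can be pictured as a small record. The following is a minimal sketch only, not any hypervisor's actual snapshot format: the names VMSnapshot, PowerState and to_backup_record, and the file names used in the example, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional, Tuple


class PowerState(Enum):
    """Power states a snapshot can record (hypothetical naming)."""
    ON = "on"
    OFF = "off"
    SUSPENDED = "suspended"


@dataclass(frozen=True)
class VMSnapshot:
    """Illustrative model of what a VM snapshot can capture."""
    vm_name: str
    taken_at: datetime
    power_state: PowerState
    virtual_disks: Tuple[str, ...]   # data: virtual disk images
    memory_image: Optional[str]      # data: memory contents, if captured
    virtual_nics: Tuple[str, ...]    # hardware configuration


def to_backup_record(snapshot: VMSnapshot) -> dict:
    """Flatten a snapshot into a plain dict that a backup system could store."""
    return {
        "vm_name": snapshot.vm_name,
        "taken_at": snapshot.taken_at.isoformat(),
        "power_state": snapshot.power_state.value,
        "data": {
            "virtual_disks": list(snapshot.virtual_disks),
            "memory_image": snapshot.memory_image,
        },
        "hardware": {"virtual_nics": list(snapshot.virtual_nics)},
    }


# Example with made-up identifiers.
example = VMSnapshot(
    vm_name="vm-01",
    taken_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    power_state=PowerState.SUSPENDED,
    virtual_disks=("disk0.img",),
    memory_image="vm-01.mem",
    virtual_nics=("nic0",),
)
record = to_backup_record(example)
```

The record groups the three categories named above (power state, data, hardware configuration) so that a restore could, in principle, bring the VM back to the state it had when the snapshot was generated.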
A VM environment often includes a centralized management server (CMS) that can serve as an access and management point for multiple VM hosts. Beneficially, a single CMS can manage many VM hosts, and the many VMs on each host, thereby increasing the management efficiency and controllability of many VMs. In such an environment, a backup server communicates with the CMS to coordinate a backup of a target VM. ‘Backup server’ and ‘backup application server’ shall be used interchangeably herein.
In some situations, the communication between the CMS and backup server is lost. For example, communication loss can occur from a disconnection or poor connection, a timeout, an interruption, a CMS failure, or a power failure. In such a case, the backups will fail because communication to the VMs, through the CMS, is lost.
Without backups, a customer's data protection can be jeopardized. Backup service plans offered to customers can have a defined recovery point objective (RPO) and recovery time objective (RTO).
RPO is a metric that indicates an amount of data that may be at risk of being lost. This can be determined by the amount of time between data protection events (such as snapshots and/or backups) and reflects the amount of data that potentially could be lost due to a failure.
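As a rough illustration of how the time between data protection events bounds the data at risk, the sketch below computes the longest gap between consecutive events and compares it with a target RPO. The function names worst_case_exposure and meets_rpo, and the example timestamps, are assumptions made for illustration; expressing the RPO as a time interval is one common convention, not the only one.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_exposure(events: Sequence[datetime]) -> timedelta:
    """Longest interval between consecutive protection events.

    Data written just after one event is unprotected until the next one,
    so the longest gap bounds how much recent data a failure could cost.
    """
    if len(events) < 2:
        raise ValueError("need at least two protection events")
    ordered = sorted(events)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(events: Sequence[datetime], rpo: timedelta) -> bool:
    """True if no gap between protection events exceeds the target RPO."""
    return worst_case_exposure(events) <= rpo


# Hypothetical backup times: 00:00, 04:00 and 10:00 on the same day.
events = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 4, 0),
    datetime(2024, 1, 1, 10, 0),
]
print(worst_case_exposure(events))            # 6:00:00
print(meets_rpo(events, timedelta(hours=4)))  # False
```

In this example the six-hour gap between the second and third events exceeds a four-hour target, so up to six hours of data could be lost if a failure occurred just before the third event.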
RTO is a metric that relates to downtime. The metric includes an amount of time to recover from a data loss event, and how long it takes to return to service. For example, the RTO can refer to the amount of time a user's VM is unavailable or inaccessible or inoperative.
In the case that communication between a backup server and a CMS is lost, a failed backup can create service delays. A user may have to wait until the communication is resumed to perform a backup, and then wait again for the backup to complete. Furthermore, the backup window may be missed, in the case of periodic scheduled backups. Thus, this can impact the RPO and RTO metrics of a VM service.
Furthermore, if the communication between the backup server and the CMS is not resumed, then all subsequent backups may fail, which can have a grave impact on the RPO. Thus, it is beneficial to provide a solution even if communication between a backup server and a CMS is lost.
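The effect described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions: backups run on a fixed schedule, every backup scheduled while the CMS is unreachable fails, and nothing is retried. The function names successful_backups and exposure_at, and all times shown, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def successful_backups(
    start: datetime,
    interval: timedelta,
    count: int,
    outage_start: datetime,
    outage_end: Optional[datetime],
) -> List[datetime]:
    """Scheduled backup times that fall outside the CMS outage.

    outage_end=None models communication that is never resumed.
    """
    def cms_reachable(t: datetime) -> bool:
        if t < outage_start:
            return True
        return outage_end is not None and t >= outage_end

    scheduled = [start + i * interval for i in range(count)]
    return [t for t in scheduled if cms_reachable(t)]


def exposure_at(now: datetime, successes: List[datetime]) -> timedelta:
    """Time elapsed since the most recent successful backup."""
    earlier = [t for t in successes if t <= now]
    if not earlier:
        raise ValueError("no successful backup before the given time")
    return now - max(earlier)


start = datetime(2024, 1, 1, 0, 0)
six_hours = timedelta(hours=6)
outage_start = datetime(2024, 1, 1, 10, 0)

# Outage that ends at 02:00 the next day: three scheduled backups are missed.
resumed = successful_backups(start, six_hours, 8, outage_start,
                             datetime(2024, 1, 2, 2, 0))
print(exposure_at(datetime(2024, 1, 2, 5, 0), resumed))      # 23:00:00

# Communication never resumes: every later backup fails.
never = successful_backups(start, six_hours, 8, outage_start, None)
print(exposure_at(datetime(2024, 1, 2, 18, 0), never))       # 1 day, 12:00:00
```

With a six-hour schedule, a temporary outage stretches the unprotected interval to 23 hours in this example, and when communication is never resumed the interval keeps growing with each missed backup.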