The speed of recovery after a disaster has been a concern throughout the era of personal computers and distributed client-server systems. Backup administrators and restore operators need to ensure they are meeting Recovery Time Objectives (RTOs) and Service Level Agreements (SLAs) for all mission-critical applications and servers.
Traditional methods of recovering image-level backups require the complete restoration of an image-level backup into a production environment. Traditional recovery techniques also do not allow users to access and use the data sets being restored while a restoration is in progress.
For virtual machines, backups and restorations are typically performed at the image level, so the amount of data that needs to be restored can be overwhelming. For example, restoring a file server virtual machine (VM) with a 1 terabyte (TB) disk can take up to 8 hours on a 1 gigabit per second (Gb/s) network.
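The arithmetic behind such an estimate can be sketched as follows. This is an illustrative calculation only: the function name `transfer_hours` and the `efficiency` parameter are assumptions introduced here, and the 0.28 efficiency value is chosen merely to show how overheads (decompression, storage throughput, protocol overhead) could stretch an ideal line-rate transfer toward the 8-hour figure. It is not a measured value.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link.

    efficiency is the assumed fraction of the nominal link rate that the
    restore actually achieves end to end (1.0 = ideal line rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (size_bytes * 8) / (link_bits_per_s * efficiency)
    return seconds / 3600


TB = 10**12    # decimal terabyte, in bytes
GBIT = 10**9   # 1 gigabit per second, in bits per second

# Ideal case: the whole link is available and nothing else is a bottleneck.
ideal = transfer_hours(1 * TB, 1 * GBIT)

# Hypothetical degraded case: only 28% of the nominal rate is achieved.
degraded = transfer_hours(1 * TB, 1 * GBIT, efficiency=0.28)

print(f"ideal: {ideal:.2f} h, degraded: {degraded:.2f} h")
```

Even in the ideal case the copy takes more than two hours, so a full-image restore cannot be made fast simply by removing overheads.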
To conserve storage space, backup files themselves are typically highly compressed and/or deduplicated. For example, some commercially available backup tools, such as VEEAM™ Backup from Veeam Software International Ltd., provide mechanisms for deduplication and compression of image-level backup files. Deduplication may be applied when backing up multiple virtual machines (VMs) that have similar data blocks within them. For example, if VMs were created from the same template, or if VMs with a large amount of free space on their logical disks are backed up, deduplicating the backups of those VMs can reduce the storage space the backups require.
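The effect of deduplication on VMs built from a common template can be illustrated with a minimal sketch. It assumes fixed-size blocks identified by a SHA-256 digest; the names `dedup_store` and `restore`, the 4096-byte block size, and the toy disk images are all hypothetical, and the sketch is not a description of how any particular backup product implements deduplication.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def dedup_store(images: dict[str, bytes], block_size: int = BLOCK_SIZE):
    """Split each image into blocks and keep one copy of each unique block.

    Returns (store, recipes): store maps digest -> block bytes, and
    recipes maps image name -> ordered list of digests.
    """
    store: dict[str, bytes] = {}
    recipes: dict[str, list[str]] = {}
    for name, data in images.items():
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # store only the first copy
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(name: str, store: dict[str, bytes],
            recipes: dict[str, list[str]]) -> bytes:
    """Rebuild an image by concatenating its blocks in recipe order."""
    return b"".join(store[d] for d in recipes[name])


# Toy images: four shared "template" blocks, three blocks of zeroed free
# space, and one block of data unique to each VM.
template = b"".join(bytes([i]) * BLOCK_SIZE for i in range(1, 5))
free_space = bytes(3 * BLOCK_SIZE)
images = {
    "vm_a": template + free_space + b"A" * BLOCK_SIZE,
    "vm_b": template + free_space + b"B" * BLOCK_SIZE,
}

store, recipes = dedup_store(images)
raw_bytes = sum(len(d) for d in images.values())
stored_bytes = sum(len(b) for b in store.values())
print(f"raw: {raw_bytes} bytes, deduplicated: {stored_bytes} bytes")
```

Both the shared template blocks and the repeated free-space blocks collapse to single copies, which is why the two cases named above benefit most. Note that restoring an image requires reassembling it from the block store, so the space saving does not reduce the amount of data that must be written to production storage.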
Another means of decreasing backup size is compression. While compression decreases the size of the backup files created, it increases the duration of backup creation, verification, restoration, and recovery procedures.
In order to enhance security, backup files are also often encrypted.
Thus, in an initial restoration step, backup files may need to be extracted (i.e., decompressed) and/or decrypted completely before their contents can be read. The extracted VM data are then copied to a target production environment. Using traditional techniques, the restoration and recovery process can take hours depending on the size of the VM to be restored, because large amounts of data need to be extracted and moved from the backup storage to the production storage. The time required to copy the extracted VM image data to production storage is the primary factor affecting the overall duration of the traditional recovery process.
Finally, the VM is registered with a virtual environment and started. If the VM or the applications inside it do not start because the image-level backup is unrecoverable, the recovery process needs to be repeated using different backup files until a viable, working backup file is found and the restored VM is running as expected. This concludes the traditional recovery process.
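The traditional sequence described above (extract, copy, register and start, then retry with another backup on failure) can be outlined as a simple loop. This is a schematic sketch only: `traditional_recover` and the three callables passed to it are hypothetical placeholders, not the API of any real backup tool, and the stub implementations exist solely to make the example runnable.

```python
from typing import Callable, Iterable, Optional


def traditional_recover(
    backups: Iterable[str],
    extract: Callable[[str], bytes],
    copy_to_production: Callable[[bytes], str],
    register_and_start: Callable[[str], bool],
) -> Optional[str]:
    """Try each backup in turn; return the first one that yields a running VM."""
    for backup in backups:
        image = extract(backup)              # full decompress and/or decrypt
        vm_path = copy_to_production(image)  # full copy to production storage
        if register_and_start(vm_path):      # register with the hypervisor, boot
            return backup
        # The VM did not start: the full cost was paid and is paid again
        # for the next candidate backup.
    return None


# Stubs standing in for real operations (assumptions for illustration).
full_copies = []


def extract(backup: str) -> bytes:
    return backup.encode()


def copy_to_production(image: bytes) -> str:
    full_copies.append(image)
    return image.decode()


def register_and_start(vm_path: str) -> bool:
    return "corrupt" not in vm_path


result = traditional_recover(
    ["friday_corrupt.bak", "thursday.bak"],
    extract, copy_to_production, register_and_start,
)
print(result, "after", len(full_copies), "full copies")
```

In this outline, a failure is detected only after the complete extraction and copy have finished, so each unrecoverable backup adds a full restore cycle to the total recovery time.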
To verify the functionality of data restored from image-level backups, some traditional recovery techniques stage restored data on isolated test networks and servers. This requires additional time, because recovered data objects must first be staged in a test environment before they are made available in a production environment.
Thus, traditional image-level recovery processes are resource intensive and inefficient and, as a result, may take hours to complete, primarily because very large amounts of data must be copied from an image-level backup file to a production environment. This can prevent users from using production data and applications during the restore process. It can also jeopardize meeting RTOs and SLAs, resulting in extended and costly downtime for production systems.
Therefore, there is a need for an efficient method of quickly recovering VMs from image-level backups to a production environment. There is also a need for methods and systems that allow users to access data sets while a restoration is running.