Backup and recovery software products are crucial for enterprise-level network clients. Customers rely on backup systems to efficiently back up and recover data after user error, data loss, system outages, hardware failure, or other catastrophic events, so that business applications remain in service or return to service quickly after a failure or outage. Data protection and comprehensive backup and disaster recovery (DR) procedures become even more important as enterprise-level networks grow and support mission-critical applications and data for customers.
The advent of virtualization technology has led to the increased use of virtual machines as data storage targets. Virtual machine (VM) disaster recovery systems using hypervisor platforms, such as vSphere from VMware or Hyper-V from Microsoft, among others, have been developed to provide recovery from multiple disaster scenarios, including total site loss. The immense amount of data involved in large-scale (e.g., municipal or enterprise) backup applications means that backup disk space is a critical concern for system administrators.
The backup of virtual machines in a hypervisor is typically done in one of two ways. In a first method, each VM is handled as a physical machine. This means installing and running a backup agent in each VM, which is resource intensive and becomes cumbersome from a management perspective as the number of virtual machines increases. A second method is to back up a VM at the storage level by making a copy of the storage containers that contain the VM. Identifying the exact storage containers that contain the VM and bringing them to a consistent state are aspects that must be managed, and they add administrative overhead to the process.
Backup strategies typically involve a combination of full and incremental or differential backups. A full backup backs up all files from a data source in a specified backup set or job, while an incremental backup backs up only the files that are new or have changed since the last backup. During an incremental backup procedure, an application may walk the file system to find which files have changed. However, walking the file system is slow and resource intensive. Another conventional method of incremental backup uses a changed block tracking (CBT) feature provided by a virtual machine monitor or manager to keep track of the data blocks changed since the last backup. The CBT changes are captured in a separate file that links to its immediate parent.
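The two incremental approaches described above can be contrasted in a minimal Python sketch. It is illustrative only and does not represent any hypervisor's actual CBT interface: the names `files_changed_since` and `ChangedBlockTracker`, the use of file modification times to detect change, and the 4096-byte block size are all assumptions made for the example.

```python
import os


def files_changed_since(root, last_backup_time):
    """File-walk approach: visit every file under root and keep those whose
    modification time is later than last_backup_time (a POSIX timestamp).
    The cost grows with the total number of files, not the number changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_backup_time:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it in this sketch.
                continue
    return sorted(changed)


class ChangedBlockTracker:
    """Hypothetical stand-in for a CBT feature: block indices are recorded
    as writes happen, so no scan is needed at backup time."""

    def __init__(self, block_size=4096):  # block size is an assumed value
        self.block_size = block_size
        self._changed = set()

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self._changed.update(range(first, last + 1))

    def take_changed_blocks(self):
        """Return the sorted changed block indices and reset the tracker,
        as would happen once an incremental backup has consumed them."""
        blocks = sorted(self._changed)
        self._changed.clear()
        return blocks
```

In this sketch the file walk must stat every file on each run, whereas the tracker only pays a small cost per write and hands back the changed set directly.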
To prevent version skew and potential data corruption, most high availability systems perform backups on a snapshot of the system, which is a read-only copy of the data set at a particular point in time, and allow applications to continue writing to their data. In the case of conventional backup methods, the number of payload blocks to be backed up equals the number of user snapshots multiplied by the number of changed blocks. If either or both of these factors are relatively large, the amount of space needed to accommodate the backup can be significant.
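The multiplication stated above can be worked through in a short Python sketch. The function names, the 4096-byte block size, and the sample figures are assumptions chosen for illustration, and the changed-block count is treated as the same for each snapshot, which the text does not specify.

```python
def payload_blocks(num_snapshots, changed_blocks):
    """Payload blocks for a conventional backup, per the relation above:
    number of user snapshots multiplied by number of changed blocks."""
    return num_snapshots * changed_blocks


def payload_bytes(num_snapshots, changed_blocks, block_size=4096):
    """Backup space in bytes, assuming a fixed block size (an assumed value)."""
    return payload_blocks(num_snapshots, changed_blocks) * block_size


# Hypothetical figures: 10 snapshots, 1,000,000 changed blocks each.
blocks = payload_blocks(10, 1_000_000)
size = payload_bytes(10, 1_000_000)
print(blocks, size)
```

With these assumed figures the result is 10,000,000 payload blocks, or 40,960,000,000 bytes at 4096 bytes per block, which shows how quickly the space requirement scales when either factor grows.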
What is needed, therefore, is a backup method that consolidates virtual disk blocks to optimize space in VM-based data storage systems.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.