Backup and recovery software products are crucial for enterprise-level network clients. Customers rely on backup systems to efficiently back up and recover data in the event of user error, data loss, system outages, hardware failure, or other catastrophic events, so that business applications can remain in service or quickly return to service after a failure condition or an outage. The advent of virtualization technology has led to the increased use of virtual machines as data storage targets. Virtual machine (VM) disaster recovery systems using hypervisor platforms, such as vSphere from VMware or Hyper-V from Microsoft, among others, have been developed to provide recovery from multiple disaster scenarios, including total site loss. The immense amount of data involved in large-scale (e.g., municipal or enterprise) backup applications and the number of different potential problems that exist mean that backup performance and reliable operation are critical concerns for system administrators.
Virtualized storage systems, such as Hyper-V servers, are being rapidly and increasingly deployed in customer environments. In order to achieve high availability, Hyper-V virtual machines are often configured in a clustered environment with the data stored on CSV (cluster shared volume) based systems. The size of these deployments is growing by the day, which introduces significant challenges in protecting them. As the environments scale upwards, there is a need to increase the number and size of CSV volumes. In large environments, it has been observed that backup operations often fail during snapshot creation. In Microsoft VSS (Volume Shadow Copy Service) frameworks, such systems commonly generate a timeout error. In this implementation scenario, the backup application typically uses a standard VSS workflow for the snapshot operation. If the VSS framework is unable to take a snapshot of a scaled-out environment with an overly large number of CSV disks, it reports a backup failure with an error code (e.g., 0x80780021) that indicates that the Windows backup timed out before the shared protection point was created.
Thus, in a scaled-out customer environment where thousands of virtual machines are configured for high availability across a large number of CSVs, it is important to ensure that all VMs that can be backed up are protected, by eliminating the timeout issues that are often seen in such large environments. What is needed, therefore, is a backup method that uses the existing VSS framework but implements different policies to ensure that the backup operation does not fail with timeout errors.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, NetWorker, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.