Entities often generate and use data that is important in some way to their operations. This data can include, for example, business data, financial data, and personnel data. If this data were lost or compromised, the entity could suffer significant financial and other adverse consequences. Accordingly, many entities have chosen to back up some or all of their data so that, in the event of a natural disaster, unauthorized access, or other events, the entity can recover any data that was compromised or lost and then restore that data to one or more locations, machines, and/or environments. However, the process of backing up data can be problematic in some environments.
Consider, as an illustrative example, a hypothetical cluster that includes multiple hosts, each of which may run one or more virtual machines (VMs), along with multiple datastores. In this example, the hosts may have shared access to each of the datastores. During normal operations, the storage data path of each VM's host provides access to the datastores, and the VMs may be backed up using a separate backup proxy. To perform the backup, the backup proxy can access VM data either directly, through its own storage data path, or through the storage data path of the VM's host.
At one time or another, however, the cluster will typically experience a configuration change. For example, a VM host and/or datastore may go offline or otherwise become unavailable to other nodes in the cluster. When such changes occur, host storage data paths can be affected, and normal operations may no longer be possible. For example, if a VM has data stored on a particular datastore, but that datastore is offline for the VM's host, the VM must be backed up using the storage data path of another host, if possible.
Backing up the VM through the storage data path of another host can itself be problematic. For example, one or more datastores in the cluster may be unavailable to the host where the backup proxy resides. As a result, the VM's host may respond to the unavailability of a datastore by employing a network block device (NBD) transport mode, in which data is read from the VM and transmitted across a network to a backup server. In many cases, however, the network may not have adequate bandwidth to support fast and efficient transport of the data to the backup server. Moreover, backing up the VM over the network may compromise other operations for which the network was primarily intended. These problems are magnified in relatively large networks, where predictable data paths are important to ensure consistent operations and response times.
In some circumstances, the backup proxy may attempt to back up the VM in a suboptimal, or at least less optimal, way after a configuration change has occurred. For example, if a datastore is not available to a VM host through its storage data path, a less efficient alternative may be used to back up the VM. In this situation, the backup proxy may gain access to the VM data through the network data path to the VM host. To do so, however, the backup proxy may need to use a network transport mode.
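The transport choice described above can be reduced to a simple decision. The following is a minimal, hypothetical sketch, not the method of any particular backup product: the `Host` class, the `select_transport` function, and the `"storage"` and `"nbd"` labels are assumptions introduced here for illustration. It assumes the proxy knows which datastores its host can currently reach over the storage data path.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a host in the cluster."""
    name: str
    # Datastores currently reachable over this host's storage data path.
    datastores: set[str] = field(default_factory=set)


def select_transport(vm_datastore: str, proxy_host: Host) -> str:
    """Return the transport a backup proxy on proxy_host would use for a VM.

    "storage" means the proxy reads the VM's disks over its host's storage
    data path; "nbd" means it falls back to network block device transport,
    pulling the data across the network from the VM's host.
    """
    if vm_datastore in proxy_host.datastores:
        return "storage"
    return "nbd"
```

For example, with `host = Host("host-a", {"ds1", "ds2"})`, `select_transport("ds2", host)` returns `"storage"`; after a configuration change removes `ds2` from `host.datastores`, the same call returns `"nbd"`, reflecting the fallback to network transport.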
As the foregoing examples demonstrate, the method used to perform the backup (VM host storage data path or VM host network data path) may be determined essentially at random. Consequently, backups of the VMs in a cluster may not interact consistently with a given backup proxy, datastore, host, or cluster. This inconsistent behavior can adversely affect the backup process, as well as the resulting backup images.
In light of the foregoing, it would be useful, when backing up a VM, to minimize or avoid the use of cross-host networking, such as a network transport mode, in the event that the VM host and/or a datastore accessible to the VM host is unavailable for some reason. Likewise, it would be desirable to associate VM hosts only with those datastores to which they have access so that, even if one or more datastores of the cluster become unavailable, the backup proxy still has storage data path access to a datastore that it can use without resorting to a network transport mode.
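One way to picture the association described above is as a planning step that runs before a backup. The sketch below is hypothetical and assumes a simplified model: each VM's disks live on a single datastore, and each proxy's host reports the set of datastores it can reach over its storage data path. The `plan_backups` function and its least-loaded assignment rule are illustrative choices introduced here, not the approach of any specific product and not the only way to realize the idea.

```python
from collections import defaultdict


def plan_backups(
    vms: dict[str, str], proxies: dict[str, set[str]]
) -> tuple[dict[str, str], list[str]]:
    """Assign each VM to a proxy with storage data path access to its datastore.

    vms: VM name -> datastore holding the VM's disks (assumed single datastore).
    proxies: proxy name -> datastores its host can reach over the storage path.
    Returns (assignments, needs_network), where needs_network lists VMs that
    no proxy can reach without a network transport mode.
    """
    # Index proxies by the datastores they can actually reach.
    by_datastore: dict[str, list[str]] = defaultdict(list)
    for proxy, datastores in sorted(proxies.items()):
        for ds in datastores:
            by_datastore[ds].append(proxy)

    assignments: dict[str, str] = {}
    needs_network: list[str] = []
    load = {p: 0 for p in proxies}
    for vm, ds in sorted(vms.items()):
        candidates = by_datastore.get(ds)  # .get() does not add missing keys
        if not candidates:
            # No proxy has storage data path access; flag instead of silently using NBD.
            needs_network.append(vm)
            continue
        # Pick the least-loaded candidate (ties broken by name) to spread work.
        chosen = min(candidates, key=lambda p: (load[p], p))
        assignments[vm] = chosen
        load[chosen] += 1
    return assignments, needs_network
```

With `proxies = {"proxy-a": {"ds1", "ds2"}, "proxy-b": {"ds2"}}` and `vms = {"vm1": "ds1", "vm2": "ds2", "vm3": "ds3"}`, the plan assigns `vm1` to `proxy-a` and `vm2` to `proxy-b`, and flags `vm3` because no proxy can reach `ds3`. Making that case explicit, rather than letting each backup fall back to network transport on its own, is what keeps the chosen data paths predictable.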