As modern network connectivity improves, distributed computing is becoming more common. A distributed computing system can include multiple computing nodes (also referred to simply as “nodes”) that communicate through a network to access data stored on a shared storage object. The shared storage object can be implemented using a storage area network (SAN).
As network connectivity improves, computers are also becoming more powerful. This increased computing power can be harnessed by implementing virtual machines. Virtual machines are software implementations of physical computers that execute computer instructions in a manner that replicates the results of physical computer hardware. Many of today's computer systems, such as computing nodes in a distributed computing system, are able to act as host computer systems to multiple virtual machines. The virtual machines implemented by computing nodes can also access data stored on a shared storage object.
Many distributed computing systems offer an advantage in the form of fault tolerance. In other words, if one of the nodes becomes unavailable due to failure, maintenance, or increased consumption of computing resources, one or more of the other nodes in the distributed computing system can take over the pending tasks of the unavailable node.
To provide improved fault tolerance, distributed computing systems need to be able to protect against data loss due to failures or errors. One way to safeguard against such data loss is by implementing a backup application. The backup application, which can be executed on at least one of the nodes of a distributed computing system, can periodically back up the data stored on the shared storage object. In the event of a failure that results in data loss or corruption, the data on the shared storage object can be recovered from a backup or data archive created by the backup application.
One drawback to such an implementation of a backup application is that performing a backup of the entire shared storage object can be time- and resource-intensive (in both computing and network resources), particularly in the case of a very high-capacity shared storage object shared by a large number of nodes. Where multiple virtual machines execute on a particular node, the presence of those virtual machines presents the possibility of compounded parallel backup demand. It can therefore be desirable to reduce the demands of a backup application by limiting backup operations to only the portions of the shared storage object accessed by particular virtual machines implemented by the nodes. Further, with conventional backup methods, a backup of one virtual machine can impact the performance of other virtual machines, because a snapshot of the full shared storage object is held to support the virtual machine that is being backed up. But backing up only the portions of the shared storage object accessed by the virtual machines traditionally requires installing an instance of a backup application within each individual virtual machine. The need for a backup application within each virtual machine has traditionally resulted from the inability of the file system of the host node to correlate disk access to a particular virtual machine.
Installation and operation of a backup application on each virtual machine greatly complicates the administration of the distributed computing system, since a system administrator must install, coordinate, and maintain each individual instance of the backup application. Moreover, in a scenario where a node supports multiple virtual machines, the computing input/output resources of the node can be taxed if each virtual machine supported by the node simultaneously executes an instance of the backup application. What is desirable, therefore, is an approach that enables at least partial backups of a shared storage object but does not require the installation of a backup application within each virtual machine supported by a distributed computing system.