The term virtual machine may be used to refer to a software implementation of a physical machine, such as a computer, that executes programs like a physical machine. FIG. 1 is a block diagram of an example system 100 that implements a virtual machine in a conventional manner. As shown in FIG. 1, system 100 includes a system hardware layer 102 that represents the actual physical resources of a computer, which may include, for example, one or more central processing units (CPUs), system memory, a storage device such as a disk, a graphics adapter, a network adapter, input/output (I/O) devices, or the like. A host operating system 104 is executed upon system hardware layer 102. A virtualization layer 106 runs on top of host operating system 104 and supports one or more virtual machines 1081-108N. For example, virtualization layer 106 emulates certain hardware elements such that each of virtual machines 1081-108N can operate as if it has access to its own dedicated set of physical resources. One or more guest operating systems 1101-110N are executed on corresponding virtual machines 1081-108N and support the execution of application programs thereon. In an alternate implementation, virtualization layer 106 may run directly on top of system hardware layer 102.
Virtual machines have become increasingly popular platforms for deploying both desktops and servers. FIG. 2 is a block diagram of an example system 200 in which a plurality of servers are deployed on corresponding virtual machines 2021-202N resident on a single host computer 200. As shown in FIG. 2, server operating systems 2041-204N execute on corresponding virtual machines 2021-202N and server application(s) 2061-206N execute within the context of corresponding server operating systems 2041-204N. By consolidating multiple virtual machines onto host computer 200 in this manner, the utilization of host computer 200 can be increased, thereby increasing the return on investment in host computer 200. This consolidation may also allow fewer host computers to be used in an enterprise network or other computing environment, thereby reducing consumption of power, cooling, and/or floor space. Managing fewer host computers can also lead to improved efficiencies as there are fewer opportunities for error and breakdown.
By networking multiple host computers together, certain other advantages can be realized. FIG. 3 is a block diagram of a conventional computing environment 300 in which a plurality of host computers 3021-302N are networked together via a communication infrastructure 306. Data storage 308, which may comprise a storage area network (SAN) or other non-volatile data storage system, is also connected to communication infrastructure 306 and provides external, shared data storage for host computers 3021-302N. In an alternate implementation, internal data storage associated with each host computer is shared with the other host computer(s) via communication infrastructure 306 or some other communication medium.
As shown in FIG. 3, a virtual machine 304 is running on host computer 3021. Virtual machine management systems exist that may be used to migrate virtual machine 304 from host computer 3021 to host computer 302N, where it is denoted virtual machine 304′. Virtual machine migration generally refers to the movement of a virtual machine from a first physical machine to a second physical machine. This migration process is represented by a dashed arrow in FIG. 3. Depending upon the implementation, such migration may be performed after virtual machine 304 has been shut down and/or while virtual machine 304 is still executing. If a virtual disk associated with virtual machine 304 is stored in data storage 308, then migration may be facilitated by simply changing ownership of the virtual disk from virtual machine 304 to virtual machine 304′. Virtual machine migration advantageously enables technology managers to perform load balancing or to redeploy virtual machines in the event a host computer fails or needs to be taken off-line for maintenance.
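By way of illustration only, the ownership change described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the behavior of any particular virtual machine management system: the class `SharedDataStorage`, the function `migrate`, the disk name `vm304.vdisk`, and the instance labels are all hypothetical names introduced for the example.

```python
class SharedDataStorage:
    """Toy stand-in for shared data storage such as data storage 308 (hypothetical)."""

    def __init__(self):
        # Maps a virtual disk name to the virtual machine instance that owns it.
        self.disk_owner = {}

    def change_owner(self, disk, current_owner, new_owner):
        # Refuse the change if the caller is wrong about who owns the disk.
        if self.disk_owner.get(disk) != current_owner:
            raise ValueError(f"{disk} is not owned by {current_owner}")
        self.disk_owner[disk] = new_owner


def migrate(storage, disk, source_vm, target_vm):
    """Hand the shared virtual disk from the source instance to the target."""
    storage.change_owner(disk, source_vm, target_vm)
    return target_vm


storage = SharedDataStorage()
# Assumed labels: the instance on the first host and its counterpart on host N.
storage.disk_owner["vm304.vdisk"] = "vm304@host-1"
running = migrate(storage, "vm304.vdisk", "vm304@host-1", "vm304'@host-N")
print(storage.disk_owner["vm304.vdisk"])
```

Because the virtual disk stays in shared storage, the sketch moves no disk data between hosts; only the ownership record changes.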
To protect data, it is desirable to obtain periodic backups of virtual machines just as it is desirable to obtain periodic backups of physical machines. However, obtaining periodic backups of virtual machines in the same way as is done with physical machines can give rise to difficulties.
In a conventional enterprise computing scenario, a backup agent running on a virtual machine operates to generate backup data and transfer it over a communication infrastructure to a backup server. This approach may be referred to as agent-based backup. As discussed above in reference to FIG. 2, there are benefits associated with consolidating numerous virtual machines on a single host computer. However, if several virtual machines on a highly-loaded host computer are backed up at the same time, the extra load imposed by the operations of the backup agents can overload the host computer. Furthermore, the management of backup agents on virtual machines can become difficult in computing environments in which the number of virtual machines is very large and/or in which virtual machines can be migrated between host computers (such as computing environment 300 of FIG. 3).
To avoid some of these issues associated with agent-based backup, a backup of an entire virtual machine can instead be obtained by the host computer itself in an implementation in which a virtual disk associated with the virtual machine is maintained on the host file system. Such a backup may also be obtained by a data storage system connected to the host computer in an implementation in which the virtual disk is mapped directly to a block device within the data storage system. These approaches may be referred to as host-level backup and volume-level backup, respectively. However, for various reasons, such backups are typically created from snapshots of a virtual disk that are in a state equivalent to just after an unexpected machine crash, termed “crash-consistent snapshots.” Thus, these backups may not provide a desired level of consistency.
Another backup approach termed “off-host backup” or “consolidated backup” attempts to reduce the loading of the host computer on which the virtual machine is located and a communication infrastructure attached thereto. In accordance with this approach, a snapshot is obtained of the virtual disk associated with the virtual machine. In order to obtain a more consistent snapshot than a crash-consistent snapshot, the application(s) running on the virtual machine may be quiesced prior to obtaining the snapshot. A backup server then mounts the snapshot of the virtual disk image and obtains a file-level backup therefrom. While this approach can help prevent overloading of the host computer and reduce traffic on the communication infrastructure, it still cannot yield a perfectly consistent backup.
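By way of illustration only, the sequence described above (optionally quiesce, snapshot the virtual disk, then take a file-level backup from the snapshot) can be sketched as follows. This is a simplified model under stated assumptions, not an implementation of any actual backup product: the classes `VirtualDisk` and `VirtualMachine`, the functions `take_snapshot` and `file_level_backup`, and the file path are hypothetical, and the "disk" is merely an in-memory mapping. The sketch shows only why a snapshot taken without quiescing can miss data that an application has not yet flushed; it does not model the remaining sources of inconsistency noted above.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class VirtualDisk:
    """Toy virtual disk: a mapping of file path to file contents."""
    files: dict = field(default_factory=dict)


@dataclass
class VirtualMachine:
    """Toy VM whose application buffers writes in memory before flushing."""
    disk: VirtualDisk = field(default_factory=VirtualDisk)
    pending_writes: dict = field(default_factory=dict)

    def app_write(self, path, data):
        # The application holds the write in memory; it is not yet on disk.
        self.pending_writes[path] = data

    def quiesce(self):
        # Flush buffered application data so the on-disk state catches up.
        self.disk.files.update(self.pending_writes)
        self.pending_writes.clear()


def take_snapshot(vm, quiesce_first):
    """Return a point-in-time copy of the VM's virtual disk."""
    if quiesce_first:
        vm.quiesce()
    return copy.deepcopy(vm.disk)


def file_level_backup(snapshot):
    """Stand-in for a backup server mounting the snapshot and copying files."""
    return dict(snapshot.files)


vm = VirtualMachine()
vm.disk.files["/data/orders.db"] = "order-1"
vm.app_write("/data/orders.db", "order-1,order-2")  # buffered, not flushed

crash_consistent = file_level_backup(take_snapshot(vm, quiesce_first=False))
quiesced = file_level_backup(take_snapshot(vm, quiesce_first=True))

print(crash_consistent["/data/orders.db"])  # order-1
print(quiesced["/data/orders.db"])          # order-1,order-2
```

In this model the backup work is done by `file_level_backup` against the snapshot copy rather than against the running virtual machine, which mirrors the way the approach moves load off the host computer.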
What is needed, then, is an approach for obtaining backups of virtual machines, such as virtual machines used to deploy servers in an enterprise network or other computing environment, in a manner that addresses the shortcomings associated with the aforementioned prior art approaches.