Continuing advances in storage technology allow significant amounts of digital data to be stored cheaply and efficiently. However, this also means that significant amounts of data can be lost in the event of a failure or catastrophe. Accordingly, data backup of original data is a critical component of computer-based systems. The original data typically resides on a hard drive, or on an array of hard drives, but may also reside on other forms of storage media, such as solid state memory. Data backups are critical for several reasons, including disaster recovery, restoring data lost due to storage media failure, recovering accidentally deleted data, and repairing corrupted data resulting from malfunctioning or malicious software.
A virtual machine (“VM”) is a software abstraction of an underlying physical (i.e., hardware) machine that enables one or more instances of an operating system, or even one or more different operating systems, to run concurrently on a physical host machine. Virtual machines have become popular with administrators of data centers, which can contain dozens, hundreds, or even thousands of physical machines. The use of virtual servers greatly simplifies the task of configuring and administering servers in a large-scale environment, because a virtual machine may be quickly placed into service without incurring the expense of provisioning a hardware machine at a data center. Virtualization is highly scalable, enabling servers to be allocated or de-allocated in response to changes in demand. Moreover, support and administration requirements may be reduced because virtual servers are readily monitored and accessed using remote administration tools and diagnostic software.
In one aspect, a virtual server consists of three components. The first component is virtualization software, often referred to as a hypervisor, that runs on the host machine and performs the hardware abstraction. The second component is a data file that represents the filesystem of the virtual machine and typically contains the virtual machine's operating system, applications, data files, etc. A virtual machine data file may be a hard disk image file, such as, without limitation, a Virtual Machine Disk Format (“VMDK”) file. Each virtual machine therefore requires its own separate virtual machine data file. The third component is the physical machine on which the virtualization software executes. A physical machine may include a processor, random-access memory, internal or external disk storage, and input/output interfaces, such as network, storage, and desktop interfaces (e.g., keyboard, pointing device, and graphic display interfaces).
Currently, existing data backup applications can make backups of virtual machine data files without using a software agent (i.e., agentless backup). In other words, existing applications can create backups without installing software within the virtual machine itself. However, there is still sometimes a need during the backup of the virtual machine to execute certain actions inside the virtual machine (i.e., the machine being backed up). For example, when the application creates a backup of a virtual machine running MS Exchange, MS SQL, MS Active Directory, or a similar database server, the backup application will likely need to: (1) collect metadata about services running inside the virtual machine, and (2) store the collected data in a data archive. With existing backup applications, these operations cannot be performed entirely from outside the virtual machine because certain operations, such as collecting metadata, can only be performed locally from inside the virtual machine. An additional problem is that, after the backup procedure completes, the backup application often must perform additional actions, such as truncating the logs of the services to prepare them for the next backup.
Thus, while the existing backup methods for a virtual machine may be called “agentless”, they are not in fact agentless from a practical standpoint. Instead, to perform the operations noted above and others, a special “mini-agent” is copied into the virtual machine (i.e., onto its filesystem) before the backup is performed and is then removed after the backup is created. While on the virtual machine, the “mini-agent” executes the necessary actions, saves the results in the filesystem of the virtual machine, collects the results, and transfers them to an agent to be archived. Moreover, in the event of unforeseen problems, the “mini-agent” and the results of its operation may remain inside the virtual machine. In addition, it is not always possible to modify the filesystem of the virtual machine (e.g., due to lack of disk space), and the approach also requires modifying the registry of the virtual machine during the backup operation.
FIG. 1 illustrates a conventional system for performing a backup procedure of a virtual machine. As shown, a virtual machine 102 runs on a host machine 101. In general, virtual machines, such as that shown in FIG. 1, can be configured using any appropriate server virtualization technology, such as that provided by VMware, Inc. of Palo Alto, Calif., including vSphere®. vSphere® is a suite of tools offering the ability to perform cloud computing utilizing enterprise-level virtualization products such as VMware's ESX® and/or ESXi. vSphere allows multiple virtual machines to run on any ESX host, although only a single virtual machine 102 is illustrated in FIG. 1. Any appropriate virtual machine technology provided by other vendors may also be used.
As further shown in FIG. 1, a backup agent 201 is provided that uses a special interface, the Vix API (i.e., a vSphere interface that provides communication between the host system and the guest system), to place files (e.g., tools and utilities) into the guest system and to execute these programs over the same channel. In particular, the backup agent 201 is provided to collect metadata and store the metadata on the C: drive (i.e., the system drive) of the virtual machine, as well as to automatically store this information in the archive.
According to one existing backup method, the backup agent 201 can be installed outside the ESX host 101. However, a set of backup agent files 203 must be transmitted to and saved inside the virtual machine 102 on the ESX host 101 using the native Vix API interface 202. These files are copied to the ESX host 101 and then from the ESX host 101 into a temporary directory on the system volume (C: drive) of the virtual machine disk 103 of the virtual machine 102. The backup agent files 203 (i.e., the “mini-agent” discussed above) can then collect metadata of the applications, including, for example, Microsoft SQL® 104 and other applications 105, into a directory (e.g., CAMetadata 106) stored in the root directory of the system volume (C: drive) of the virtual machine 102. Next, a Volume Shadow Copy Service snapshot (i.e., a VSS snapshot) of the virtual machine 102 is created, and certain post-snapshot operations (e.g., truncating logs and the like) are performed as is known to one skilled in the art. Once the VSS snapshot is created, a backup copy of the virtual machine data file can be created and all collected data can automatically be stored in the archive. Finally, the VSS snapshot can be deleted and the backup agent files 203 can optionally be removed.
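For illustration only, the conventional sequence described above can be modeled as an ordered list of steps acting on a toy model of the guest filesystem. This is a minimal sketch under stated assumptions: the names GuestVM and conventional_backup, the fail_at parameter, and the temporary path C:\Temp\mini_agent are hypothetical, and no Vix API or VSS calls are made. The sketch shows why, when a step fails partway through, the mini-agent files remain inside the virtual machine.

```python
from dataclasses import dataclass, field

AGENT_PATH = r"C:\Temp\mini_agent"   # hypothetical temporary directory
METADATA_PATH = r"C:\CAMetadata"     # metadata directory named in the text


@dataclass
class GuestVM:
    """Toy model of virtual machine 102: only tracks paths on its C: drive."""
    files: set = field(default_factory=set)


def conventional_backup(vm, log, remove_agent=True, fail_at=None):
    """Hypothetical walk-through of the FIG. 1 backup sequence.

    fail_at names a step at which an unforeseen error is simulated.
    """
    steps = [
        ("copy_agent", lambda: vm.files.add(AGENT_PATH)),
        ("collect_metadata", lambda: vm.files.add(METADATA_PATH)),
        ("create_vss_snapshot", lambda: None),
        ("post_snapshot_ops", lambda: None),  # e.g., truncating logs
        ("backup_and_archive", lambda: None),
        ("delete_vss_snapshot", lambda: None),
    ]
    for name, action in steps:
        if name == fail_at:
            # An error aborts the sequence before agent removal is reached.
            raise RuntimeError(f"unforeseen problem during {name}")
        action()
        log.append(name)
    if remove_agent:  # removal is optional in the conventional method
        vm.files.discard(AGENT_PATH)
        log.append("remove_agent")


# Usage: a failure during snapshot creation leaves the agent behind.
vm = GuestVM()
try:
    conventional_backup(vm, [], fail_at="create_vss_snapshot")
except RuntimeError as err:
    print(err)
print(AGENT_PATH in vm.files)
```

Because removal of the agent files is the last step, any failure earlier in the sequence leaves them on the guest filesystem, consistent with the disadvantages noted for this approach.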
Thus, as shown in FIG. 1, the existing systems for performing a backup procedure of a virtual machine are not in fact “agentless”, since the backup agent files 203 must be saved to the virtual machine 102. There are also certain technical disadvantages with these systems. For example, “waste products” of the backup agent files 203, such as utilities, can remain inside the virtual machine as unnecessary clutter. In addition, a secondary backup of the backup may not be compatible with the physical machine (i.e., the host system), such that the data may not be backed up by the host machine itself through its drive. Furthermore, during the backup process there can be significant intervention in the guest system of the virtual machine; for example, certain VSS integration components will be substituted into the VSS backup. Yet further, the VSS process in the guest machine is triggered by the hypervisor (i.e., the ESX host 101) rather than by the backup agent 201, making it difficult for the backup software application to fully control the backup process.