1. Technical Field
The present disclosure relates to computer data backup, and in particular, to a system and method for performing block-level backups of virtual machines, wherein backed-up data is stored in de-duplicated form in a hierarchical directory structure.
2. Background of Related Art
Continuing advances in storage technology allow vast amounts of digital data to be stored cheaply and efficiently. However, in the event of a failure or catastrophe, equally vast amounts of data can be lost. Therefore, data backup is a critical component of computer-based systems. As used herein, the term “backup” may refer to the act of creating copies of data, and may also refer to the actual backed-up copy of the original data. The original data typically resides on a hard drive, or on an array of hard drives, but may also reside on other forms of storage media, such as solid-state memory. Data backups are necessary for several reasons, including disaster recovery, restoring data lost due to storage media failure, recovering accidentally deleted data, and repairing corrupted data resulting from malfunctioning or malicious software.
A virtual machine (VM) is a software abstraction of an underlying physical (i.e., hardware) machine which enables one or more instances of an operating system, or even one or more different operating systems, to run concurrently on a physical host machine. Virtual machines have become popular with administrators of data centers, which can contain dozens, hundreds, or even thousands of physical machines. The use of virtual servers greatly simplifies the task of configuring and administering servers in a large-scale environment, because a virtual machine may be quickly placed into service without incurring the expense of provisioning a hardware machine at a data center. Virtualization is highly scalable, enabling servers to be allocated or deallocated in response to changes in demand. Support and administration requirements may be reduced because virtual servers are readily monitored and accessed using remote administration tools and diagnostic software.
In one aspect, a virtual server consists of three components. The first component is virtualization software, often referred to as a hypervisor, which is configured to run on the host machine and performs the hardware abstraction. The second component is a data file which represents the filesystem of the virtual machine, and which typically contains the virtual machine's operating system, applications, data files, etc. A virtual machine data file may be a hard disk image file, such as, without limitation, a Virtual Machine Disk (VMDK) format file. Thus, for each virtual machine, a separate virtual machine file is required. The third component is the physical machine on which the virtualization software executes. A physical machine may include a processor, random-access memory, internal or external disk storage, and input/output interfaces, such as network, storage, and desktop interfaces (e.g., keyboard, pointing device, and graphic display interfaces).
Virtual machine files may be backed up as images, that is, as replications of the complete virtual machine file. Such backup schemes may logically divide the virtual machine file into a number of smaller logical blocks and store those blocks which, taken together, constitute a “snapshot” of an entire filesystem as it existed at the time the backup was performed. While such systems are well-suited for restoring an entire filesystem, they may have drawbacks if it is desired to restore only a subset of the filesystem from the backup, such as an individual file, a single directory, or an arbitrary collection of files and/or directories. A backup system which performs virtual server backups with increased efficiency and effectiveness while permitting the restoration of individual files, folders, and backup subsets would be a welcome advance.
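By way of illustration only, the image-style backup described above may be sketched as follows. The sketch divides a virtual machine file into fixed-size logical blocks, stores each distinct block once, and records the ordered list of blocks that constitutes the snapshot. The 4096-byte block size, the use of a SHA-256 digest as a block key, the in-memory dictionary used as a block store, and the function names are all assumptions made for this example and are not drawn from any particular backup product.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def split_into_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide the raw virtual machine file into fixed-size logical blocks."""
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


def take_snapshot(image: bytes, store: dict, block_size: int = BLOCK_SIZE) -> list:
    """Store each distinct block once and return the ordered block digests.

    The returned list (the "manifest") is what makes the stored blocks,
    taken together, a snapshot of the whole image.
    """
    manifest = []
    for block in split_into_blocks(image, block_size):
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks are kept only once
        manifest.append(digest)
    return manifest


def restore_image(manifest: list, store: dict) -> bytes:
    """Rebuild the complete image by concatenating blocks in manifest order."""
    return b"".join(store[digest] for digest in manifest)
```

In this sketch, restoring the entire image is a simple concatenation, whereas the manifest carries no information about which blocks belong to which file or directory. This reflects the drawback noted above: recovering an individual file from such a snapshot requires additional knowledge of the filesystem layout inside the image.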