Processing and input/output (I/O) for a generic boot of a generic virtual machine (VM) differ from processing and I/O for recovering a VM from a backup image. Although the two scenarios overlap substantially, the recovery scenario requires additional data movement. In addition to the data reads associated with booting the VM and the random I/O that occurs during or soon after VM instantiation, data must be moved from the backup image to a primary data store to create a recovery image. Conventionally, this data movement has compromised (e.g., slowed down, delayed) boot processing and I/O, as well as early post-boot processing and I/O.
FIG. 1 illustrates a hypervisor 100 running a virtual machine 102. In computer science, a hypervisor is a piece of computer software, firmware, or hardware that creates and runs virtual machines. A hypervisor may also be called a virtual machine monitor (VMM). A computer on which a hypervisor is running a virtual machine may be referred to as a “host machine.” The virtual machine running on the hypervisor may be referred to as a “guest machine.” A hypervisor provides a virtual operating platform for a guest operating system and manages execution of that guest operating system. Multiple instances of guest operating systems may share physical hardware resources as virtual resources. A VM is a software-implemented abstraction of a set of underlying computer hardware. VMs are based on the specifications of a hypothetical computer and may, therefore, emulate the computer architecture and functions of a tangible computer.
FIG. 1 illustrates primary data store 112 as a virtualized hardware resource. Primary data store 112 may be associated with different actual hardware devices. In one example, deduplication storage 120 may be the actual hardware resource that supports the virtualized hardware resource. Systems administrators may choose to configure deduplication storage 120 as primary data store 112 for virtual machine 102 for a variety of reasons. For example, virtual machines running the same guest operating system may share gigabytes of common data that can be stored efficiently in de-duplicated form on deduplication storage 120. Likewise, the data read while an operating system boots may be the same for guest operating systems running on a number of virtual machines, and these repetitive reads may be handled efficiently by deduplication storage 120.
Backing up copies of virtual machines to deduplication storage 120 may also be very efficient, because many blocks common to the guest operating systems will already be on deduplication storage 120 and therefore need not actually be written again. Using deduplication storage 120 as primary data store 112 may also provide efficiencies for the deduplication replication used in offsite disaster recovery. Instead of replicating the common blocks shared by the multiple guest operating system instances on the virtual machines, a single copy of the de-duplicated blocks may be replicated.
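The saving described above comes from storing each unique block once and letting every image that contains it keep only a reference. The sketch below illustrates that idea with a toy content-addressed store. The fixed 4096-byte block size, SHA-256 fingerprints, and the `DedupStore` class are assumptions for illustration only, not the design of deduplication storage 120.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may chunk differently


class DedupStore:
    """Toy content-addressed block store in which each unique block is kept once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # SHA-256 digest -> block contents

    def write_image(self, data: bytes) -> list[str]:
        """Store an image block by block and return its recipe of digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block that is already present costs only a reference.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read_image(self, recipe: list[str]) -> bytes:
        """Rebuild an image from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


# Two hypothetical VM images: four identical guest OS blocks plus one unique block each.
guest_os = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
vm_a = guest_os + b"A" * BLOCK_SIZE
vm_b = guest_os + b"B" * BLOCK_SIZE

store = DedupStore()
recipe_a = store.write_image(vm_a)
recipe_b = store.write_image(vm_b)
print(len(recipe_a) + len(recipe_b), len(store.blocks))  # 10 6
```

Backing up the second image adds only its one unique block; the four shared guest OS blocks are referenced rather than rewritten, and the same property means only six blocks, not ten, would need to be replicated offsite.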
Thus, hypervisor 100 may be configured to support virtual machine 102 by using deduplication storage 120 to support primary data store 112. FIG. 2 illustrates VM 102 running an operating system 103 that in turn is running an application 104 and an application 105. Initially, when operating system 103 boots, or when an application starts up, long sequential reads of de-duplicated blocks may be performed efficiently from base disk file 230. However, after boot, as non-boot processing occurs in operating system 103 or in applications 104 and 105, the I/O may become more random.
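The contrast between boot-time and post-boot I/O can be made concrete by measuring how often each request begins where the previous one ended. The sketch below does that for two short traces. The `IORequest` record, the traces themselves, and the metric are hypothetical illustrations, not measurements of any real VM.

```python
from typing import NamedTuple


class IORequest(NamedTuple):
    start_block: int  # first logical block address read
    num_blocks: int   # request length in blocks


def sequential_fraction(trace: list[IORequest]) -> float:
    """Fraction of requests that start exactly where the previous request ended."""
    if len(trace) < 2:
        return 1.0  # by convention, a trace this short is treated as sequential
    sequential = sum(
        1
        for prev, cur in zip(trace, trace[1:])
        if cur.start_block == prev.start_block + prev.num_blocks
    )
    return sequential / (len(trace) - 1)


# Hypothetical traces: boot streams 128-block reads; post-boot I/O is scattered.
boot_trace = [IORequest(i * 128, 128) for i in range(8)]
post_boot_trace = [IORequest(b, 1) for b in (9051, 17, 18, 4420, 77310)]
print(sequential_fraction(boot_trace))       # 1.0
print(sequential_fraction(post_boot_trace))  # 0.25
```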
Some hypervisors may use virtual machine snapshots to improve the I/O characteristics of a VM that is configured to use a deduplication apparatus as its primary data store. Using snapshots in this way makes it possible to use a deduplication storage device as the primary storage for a VM without the VM experiencing significant I/O performance degradation when random I/O occurs.
FIG. 3 illustrates a hypervisor 300 that is running a virtual machine 302. VM 302 is interacting with a primary data store 312 and a secondary data store 314. Primary data store 312 is supported by a deduplication storage 320 and secondary data store 314 is supported by a non-deduplication storage 330. While a single non-deduplication storage 330 is illustrated, secondary data store 314 may be configured to interact with a plurality of non-deduplication storages.
Hypervisor 300 may allow or cause virtual machine 302 to write a snapshot to secondary data store 314. In one embodiment, hypervisor 300 may be configured to allow or cause VM 302 to write a snapshot to secondary data store 314 by treating secondary data store 314 as an alternate primary storage device. Access to the alternate primary storage device may be achieved by, for example, changing a “working directory” with which the VM 302 is interacting. A VM snapshot may be, for example, a file-based view of the state, disk data, memory, configuration, and other information associated with a VM at a specific point in time. It is possible to take multiple snapshots of a VM. A snapshot may be acquired even while a VM is running. A snapshot may be treated as a whole or may have its contents accessed individually. A snapshot may preserve the state and data of a VM at a specific point in time. The state may include, for example, the VM's power state (on, off, suspended). The data may include, for example, all the files touched by the VM and all the files that make up the VM. The data may also include, for example, information from disks, memory, and other devices touched by the VM.
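A snapshot as described above preserves a VM's power state and data at one point in time, can be taken while the VM runs, and can be taken more than once. The sketch below models those properties with plain data classes. The `VMState` and `Snapshot` types and their fields are hypothetical, and the full deep copy is a simplification for illustration; a real hypervisor would not necessarily copy all state at snapshot time.

```python
import copy
import time
from dataclasses import dataclass, field
from enum import Enum


class PowerState(Enum):
    ON = "on"
    OFF = "off"
    SUSPENDED = "suspended"


@dataclass
class VMState:
    power_state: PowerState
    disk: dict[int, bytes] = field(default_factory=dict)  # block number -> contents
    memory: bytes = b""
    config: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Snapshot:
    taken_at: float  # wall-clock time the snapshot was taken
    state: VMState   # private copy of the VM's state at that time


def take_snapshot(vm: VMState) -> Snapshot:
    # Deep-copy so that later changes to the running VM cannot alter the snapshot.
    return Snapshot(taken_at=time.time(), state=copy.deepcopy(vm))


vm = VMState(PowerState.ON, disk={0: b"boot"}, config={"vcpus": "2"})
snap1 = take_snapshot(vm)  # taken while the VM is running
vm.disk[0] = b"changed"
vm.power_state = PowerState.SUSPENDED
snap2 = take_snapshot(vm)  # a second, later snapshot
print(snap1.state.disk[0], snap1.state.power_state.value)  # b'boot' on
print(snap2.state.disk[0], snap2.state.power_state.value)  # b'changed' suspended
```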
Hypervisor 300 may write a snapshot of VM 302 to non-deduplication storage 330 via secondary data store 314. Writing the snapshot to non-deduplication storage 330, and then selectively satisfying certain I/O from VM 302 from non-deduplication storage 330 or from the snapshot while satisfying other I/O from deduplication storage 320, facilitates changing and improving the I/O pattern of VM 302. Non-deduplication storage 330 may be smaller, faster, or better suited to random I/O than deduplication storage 320. For example, deduplication storage 320 may be a tape drive, a large disk, a redundant array of independent disks (RAID), a solid state drive (SSD), a memory, or another device, while non-deduplication storage 330 may be a disk drive, an SSD, or even a memory-based device. Although the device types used for deduplication storage 320 and non-deduplication storage 330 may vary, one device may be chosen that is better suited to random I/O and non-de-duplicated data, and another that is better suited to sequential I/O and de-duplicated data. The device better suited to random I/O may store the snapshot and satisfy the post-boot I/O load, while the device better suited to sequential I/O may store the de-duplicated blocks and satisfy the boot I/O load. The snapshot can satisfy the small reads and writes that are frequently repeated or overwritten, while deduplication storage 320 can satisfy the large sequential reads associated with booting VM 302 or the operating systems or applications launched by VM 302.
FIG. 4 illustrates example I/O when VM 302 has access to a deduplication storage and to a snapshot. VM 302 may run an operating system 303, an application 304, and an application 305. While two applications and one operating system are illustrated, a greater number of applications or operating systems may be employed. When the operating system 303 boots, a large number of reads may be satisfied from base disk file 333, which may be resident in a deduplication storage apparatus. Similarly, when applications 304 or 305 are instantiated, a large number of reads may be satisfied from base disk file 333. However, as the operating system 303 and the applications 304 and 305 run, the more random I/O load described above may be satisfied from snapshot file 340. Using the dual storage approach with selective control of the device from which I/O is satisfied facilitates using the deduplication storage for what the deduplication storage is optimized for and using the snapshot for what the snapshot is optimized for. The dual storage approach facilitates mitigating performance issues associated with either a solely deduplication based approach or solely snapshot based approach.
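The selective control of which device satisfies an I/O can be sketched as a small router sitting in front of the two stores. The sketch below follows the handling attributed to FIG. 4: writes land in the snapshot, reads of blocks written since the snapshot became active come from the snapshot, and all other reads come from the base disk. The `SnapshotRouter` class and the use of in-memory dictionaries for base disk file 333 and snapshot file 340 are hypothetical stand-ins, not the hypervisor's actual mechanism.

```python
class SnapshotRouter:
    """Routes VM block I/O between a base disk and a snapshot overlay (hypothetical)."""

    def __init__(self, base_disk: dict[int, bytes]) -> None:
        self.base_disk = base_disk            # stands in for base disk file 333 (deduplication device)
        self.snapshot: dict[int, bytes] = {}  # stands in for snapshot file 340 (non-deduplication device)

    def write(self, block: int, data: bytes) -> str:
        # New writes and overwrites of newly written data both go to the snapshot.
        self.snapshot[block] = data
        return "snapshot"

    def read(self, block: int) -> tuple[bytes, str]:
        # Newly written data is served from the snapshot.
        if block in self.snapshot:
            return self.snapshot[block], "snapshot"
        # Boot data and any block not resident in the snapshot come from the base disk.
        return self.base_disk[block], "base"


router = SnapshotRouter({0: b"kernel", 1: b"libc", 2: b"app"})
print(router.read(0))  # (b'kernel', 'base')
router.write(2, b"app-config")
print(router.read(2))  # (b'app-config', 'snapshot')
print(router.read(1))  # (b'libc', 'base')
```

Because the base disk is never modified while the snapshot is active, its de-duplicated blocks stay intact for sequential boot reads, and the small, frequently overwritten blocks accumulate on the device better suited to random I/O.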
As FIG. 4 illustrates, in one embodiment, when snapshot file 340 is active, I/O may be handled as follows: writes go to the non-deduplication device; reads of newly written data come from the non-deduplication device; overwrites of newly written data go to the non-deduplication device; and reads of boot data, and of data that is not resident in the snapshot or on the non-deduplication device, go to the deduplication device. Other embodiments may enforce different decisions. Although boot data is described as being read from base disk file 333, VM 302, operating system 303, or applications 304 or 305 may also produce some non-boot I/O that is deduplication-centric and that will, therefore, be satisfied from base disk file 333. I/O is “deduplication-centric” when it reads a block or blocks of data in a manner that is more efficient from a deduplication repository than from a non-deduplication repository. For example, a collection of blocks that is stored sequentially and contiguously in a deduplication repository, and that can therefore be read with a single sequential I/O, may be stored non-sequentially or non-contiguously in a non-deduplication repository, where reading the blocks would require multiple I/Os. When all of the blocks are needed, the single sequential I/O would be deduplication-centric.
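The definition of deduplication-centric I/O above compares how many I/Os each repository needs for the same set of blocks. The sketch below encodes that comparison under a hypothetical cost model in which one I/O can read exactly one contiguous run of physical block addresses; the function names and the example layouts are illustrative assumptions.

```python
def io_operations(physical_blocks: list[int]) -> int:
    """Number of I/Os needed, assuming one I/O reads one contiguous extent.

    Blocks are read in the order given; a new I/O starts whenever the next
    physical address does not immediately follow the previous one.
    """
    if not physical_blocks:
        return 0
    ops = 1
    for prev, cur in zip(physical_blocks, physical_blocks[1:]):
        if cur != prev + 1:
            ops += 1
    return ops


def is_dedup_centric(dedup_layout: list[int], non_dedup_layout: list[int]) -> bool:
    """True if the same logical blocks take fewer I/Os from the deduplication repository."""
    return io_operations(dedup_layout) < io_operations(non_dedup_layout)


# Physical addresses of the same four logical blocks in each (hypothetical) repository.
dedup_layout = [100, 101, 102, 103]  # contiguous: one sequential read
non_dedup_layout = [7, 52, 53, 900]  # scattered: three separate reads
print(io_operations(dedup_layout), io_operations(non_dedup_layout))  # 1 3
print(is_dedup_centric(dedup_layout, non_dedup_layout))              # True
```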
The I/O illustrated in FIG. 4 may be efficient for boot and post-boot processing for a normal or generic VM. However, when a VM is being recovered from a backup image, the additional data movement needed to produce the recovery image may affect the otherwise efficient I/O.