FIG. 1 illustrates a hypervisor 100 running a virtual machine 102. In computer science, a hypervisor is a piece of computer software, firmware, or hardware that creates and runs virtual machines. A hypervisor may also be called a virtual machine monitor (VMM). A computer on which a hypervisor is running a virtual machine may be referred to as a “host machine.” The virtual machine running on the hypervisor may be referred to as a “guest machine.” A hypervisor provides a virtual operating platform for a guest operating system and manages execution of the guest operating system. Multiple instances of guest operating systems may share virtualized hardware resources. A virtual machine (VM) is a software-implemented abstraction of a set of underlying computer hardware. VMs are based on the specifications of a hypothetical computer and may, therefore, emulate the computer architecture of a tangible computer, including the functions of the tangible computer.
FIG. 1 illustrates primary data store 112 as a virtualized hardware resource. De-duplication storage 120 may be the actual hardware resource that supports the virtualized hardware resource. Systems administrators may choose to configure de-duplication storage 120 as the primary data store 112 for the virtual machine 102 for a variety of reasons. For example, virtual machines running the same guest operating system may share gigabytes of common data, and that common data may be stored efficiently in a de-duplicated manner on de-duplication storage 120. Additionally, the data read in when an operating system is booting may be the same for guest operating systems running on a number of virtual machines. These repetitive reads may be handled efficiently by de-duplication storage 120.
Backing up copies of virtual machines to the de-duplication storage 120 may also be very efficient. Many blocks that are common to the guest operating systems will already be present on the de-duplication storage 120, and therefore those duplicate blocks will not need to be written again. Using de-duplication storage 120 as the primary data store 112 may also provide efficiencies associated with de-duplication replication for offsite disaster recovery. Instead of having to replicate common blocks that are shared by the multiple instances of the guest operating systems on the virtual machines, a single copy of the de-duplicated blocks may be replicated.
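By way of illustration only, the backup efficiency described above may be sketched with a small content-addressed block store. The sketch is not a description of de-duplication storage 120 itself. The fixed 4096-byte block size, the use of SHA-256 fingerprints, and the names `DedupStore`, `write_block`, `backup`, `restore`, and `make_image` are all assumptions introduced for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for illustration


class DedupStore:
    """Hypothetical block store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block bytes
        self.physical_writes = 0  # blocks actually stored

    def write_block(self, data: bytes) -> str:
        # Fingerprint the block; store it only if it has not been seen before.
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blocks:
            self.blocks[fingerprint] = data
            self.physical_writes += 1
        return fingerprint

    def backup(self, image: bytes) -> list:
        # Split the image into fixed-size blocks and return the ordered
        # list of fingerprints (a "recipe") needed to rebuild it.
        return [
            self.write_block(image[offset:offset + BLOCK_SIZE])
            for offset in range(0, len(image), BLOCK_SIZE)
        ]

    def restore(self, recipe: list) -> bytes:
        # Rebuild an image from its recipe.
        return b"".join(self.blocks[fp] for fp in recipe)


def make_image(vm_id: int, common_blocks: int = 8) -> bytes:
    # Toy VM image: blocks shared by every guest, plus one block that is
    # unique to this VM.
    common = b"".join(bytes([i]) * BLOCK_SIZE for i in range(common_blocks))
    unique = f"vm-{vm_id}".encode().ljust(BLOCK_SIZE, b"\0")
    return common + unique


store = DedupStore()
image_a = make_image(1)
image_b = make_image(2)

recipe_a = store.backup(image_a)
writes_after_a = store.physical_writes

recipe_b = store.backup(image_b)
writes_for_b = store.physical_writes - writes_after_a

print("blocks stored for first VM:", writes_after_a)
print("additional blocks stored for second VM:", writes_for_b)
```

In this toy example, backing up the second image may store only the block that is unique to the second VM, because the shared blocks are already present. A replication process built on the same fingerprints could, under the same assumptions, transfer each unique block once.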
However, using de-duplication storage 120 as the primary data store 112 for virtual machine 102 may create some undesirable issues. FIG. 2 illustrates a virtual machine (VM) 102 running an operating system 103 that in turn is running an application 104 and an application 105. Initially, when the operating system 103 boots, or when an application starts up, long sequential reads of de-duplicated blocks may occur efficiently from the base disk file 230. However, after boot time, as non-boot processing occurs in the operating system 103 or applications 104 and 105, more random I/O may occur. Using a de-duplication storage apparatus to satisfy this random I/O load may provide a sub-optimal experience. By way of illustration, a guest operating system (e.g., operating system 103) may perform numerous, frequent non-sequential writes. Many of these writes may be quickly over-written. Additionally, the most recently written data may also be read frequently. The most recently written and read data may be small and unique to a specific VM. This I/O load may be ill-suited to a de-duplication apparatus.
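By way of illustration only, the post-boot I/O pattern described above may be modeled with a short simulation. The sketch assumes a hypothetical workload of small random writes whose payloads are unique to one VM, and it counts how often a fingerprint computed for de-duplication finds a match and how many writes replace data written earlier. The function name `simulate_random_writes`, its parameters, and the 4096-byte block size are assumptions made for this example and do not describe any particular apparatus.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration


def simulate_random_writes(num_writes=1000, num_addresses=50, seed=0):
    """Model random, VM-unique writes landing on a small set of hot blocks."""
    rng = random.Random(seed)
    fingerprints_seen = set()  # every fingerprint computed so far
    live = {}                  # logical block address -> current fingerprint
    hashes_computed = 0
    duplicate_hits = 0
    overwrites = 0

    for sequence in range(num_writes):
        # Non-sequential access: pick a random logical block address.
        address = rng.randrange(num_addresses)
        # Payload unique to this write (the sequence number is embedded).
        data = sequence.to_bytes(8, "big").ljust(BLOCK_SIZE, b"\0")

        fingerprint = hashlib.sha256(data).digest()
        hashes_computed += 1
        if fingerprint in fingerprints_seen:
            duplicate_hits += 1  # de-duplication would save a write here
        fingerprints_seen.add(fingerprint)

        if address in live:
            overwrites += 1      # earlier data at this address is replaced
        live[address] = fingerprint

    return {
        "hashes_computed": hashes_computed,
        "duplicate_hits": duplicate_hits,
        "overwrites": overwrites,
        "live_blocks": len(live),
    }


stats = simulate_random_writes()
print(stats)
```

Under these assumptions, every write is fingerprinted but no fingerprint matches an earlier block, and most writes replace data that was itself written only a short time before. The fingerprinting work therefore yields no space savings for this portion of the load.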
While handling this ill-suited load may be merely inconvenient for a single VM and a single guest operating system, when a hypervisor runs multiple (e.g., hundreds of) guest operating systems, the random I/O load may degrade I/O performance to an unacceptable level when a de-duplication appliance is used for the primary data store. A conventional system may attempt to mitigate this issue by providing large read caches to prevent unwanted accesses to the de-duplication data store. However, large read caches and other conventional approaches to mitigating performance degradation may still not provide adequate performance when a de-duplication apparatus is used for primary VM storage.