Hardware virtualization allows a single physical computer to be divided into a number of virtual machines (which may be referred to hereinafter as “VMs”). To achieve this partitioning, a low-level piece of software called a virtual machine monitor (which may be referred to hereinafter as “VMM”), which in some cases may be a hypervisor, is installed on the physical computer. Conventional software, including operating systems and applications, is then installed into the resulting VM-based environments as if each were its own physical computer. Over the past decade, virtualization has transformed enterprise computing: VMware™, Microsoft™, and Citrix™ all sell hypervisor products, and a significant percentage of enterprises are using virtualization to manage their server rooms. Amazon™'s Elastic Compute Cloud™ (EC2™; see, for example, aws.amazon.com) and other competitive services, such as that offered by Rackspace™, are large-scale internet-based hosting systems in which anyone with a credit card can lease virtual machine instances, allowing them to have continuously-running, internet-connected computing resources.
A major benefit of virtualization is utilization: virtualization takes high-performance physical computers whose resources largely sit idle or operate at a sub-maximal level, and allows workloads from many servers to be packed onto those physical computers. Enterprises accordingly make better use of their hardware, and also gain an interface that allows IT departments to attribute the use of IT resources to the organizational units within a company that are consuming them. From a revenue perspective, virtualization's efficiency makes IT spending go further, and the accountability allows IT spending to be associated with its actual consumers.
Data storage infrastructure for virtualized environments has historically faced two challenges, among others. These include, but are not limited to, (1) the cost of storage for virtualized environments, and (2) the flexibility with which the stored data is controlled and managed by administrators.
From a cost perspective, a common approach to providing storage in a virtualized environment is to buy enterprise storage hardware, as sold by vendors such as NetApp™, EMC™, and HP™. The reasoning for this purchase is that densely packed virtual machines need a great deal of storage bandwidth and capacity, and it is desirable to have this data stored in a durable and reliable manner. Further, virtualization deployments generally free VMs from having to run on a single, specific server; instead they may be booted on whatever server has available resources and may even move from one server to another using a technique called “live migration”, such as, for example, VMware™'s vMotion™. For this to work, the disks that these VMs use must be visible to all the physical hosts in the virtualized infrastructure. Storing all their data on a single, shared storage target achieves this property because the storage used by such a VM is uniformly accessible by all of the servers to which it might be migrated.
Among other drawbacks, these enterprise storage targets are very expensive. They can often represent an estimated 40% of capital expenditures on a new virtualization deployment (the servers and VMware™ licenses combine to form another 25%), and are among the highest-margin components of capital expenditure in enterprise IT spending. Enterprise Storage Area Networks (SANs) and Network Attached Storage (NAS) devices, which are typically utilized as memory resources for VMs and other virtual computing applications, are likewise costly and probably represent the highest-margin computer hardware available in a datacenter environment.
Some systems, such as Veritas™'s cluster volume manager (to name just one), attempt to mitigate this cost by consolidating multiple disks on a host and/or aggregating disks within a network to provide the appearance of a single storage target. A small number of systems have structured this approach using virtual appliances: delivering the storage software as a virtual machine that runs on the same physical server as the disks that are being aggregated. Examples include VMware™'s Virtual Storage Appliance™, Lefthand Networks™' storage appliance, and VMware™'s internal “CloudFS™” or “Lithium” project, which was both released as open source software and published as an academic paper by Jacob Gorm Hansen and Eric Jul entitled “Lithium: Virtual Machine Storage for the Cloud”, presented at the ACM Symposium on Cloud Computing (SoCC) in 2010 in Indianapolis, Ind., USA, which is incorporated herein by reference. While many such systems perform some degree of consolidation of memory resources, they generally use simple, established techniques to unify a set of distributed memory resources into a single common pool. They provide little or no differentiation between dissimilar resource characteristics, and provide little or no application- or data-specific optimization with regard to performance. Put simply, these related systems strive for the simple goal of aggregating distributed resources into the illusion of a single homogeneous resource.
From a storage flexibility perspective, hardware components are generally virtualized in their entirety. A VM receives some number of virtual CPUs and some memory. It also receives one or more virtual disks. At the virtualization layer, this virtual disk is generally thought of as a single file, and stored in a well-known format such as Microsoft™'s Virtual Hard Disk (VHD) or VMware™'s VMDK. The contents of this file are those of an entire virtual disk. It contains a file system, an operating system (OS), one or more applications, and one or more data files. To the virtualization layer, however, the file is generally treated as a single cohesive unit that cannot be broken apart. One reason for this is that while an operating system is running, it assumes that it is the only entity reading from and writing to its disk. This assumption allows the OS to cache the file system state in memory and avoid reading the disk on every single access. If a third party were to try to read that disk while the VM was running, the contents would appear slightly older than the version that the VM sees, and if the third party were to write to the disk, it would violate the OS assumptions and would likely corrupt the disk's contents. The inability to work at a sub-image granularity limits functionality.
Software from Softricity™ and Thinapp™ has looked at managing application deployment using file-level techniques that attempt to work at lower levels of granularity. For example, Moka5™ has developed techniques to decide which files to overwrite or persist in upgrading virtual machine-based appliances. These systems focus predominantly on the problem of upgrading underlying OS and application software, while preserving modifications and customizations that users of the system have made over time. Other examples include synchronization services such as Dropbox™, SugarSync™, and Mozy™, which provide solutions that replicate a subset of files from a notebook or desktop computer to cloud-based storage. However, none of these solutions allows policies governing what data to replicate or place, and where, to be established at an organization-wide granularity. Moreover, prior systems have been limited in providing user-facing access to data management for virtualized memory systems. Some related attempts include NetApp™, which exposes a “.snapshot” folder in which users can access backups of their data on the NFS or CIFS filer. Other related technology allows users to access the contents of virtual machines through a third-party interface, such as described in U.S. patent application Ser. Nos. 12/694,358, 12/694,368 and 12/694,383, each of which is incorporated herein by reference.
Managing the storage of data (documents, databases, email, and system images such as operating system and application files) is generally a complex and fragmented problem in business environments today. While a large number of products exist to manage data storage, they tend to offer piecemeal solutions at individual points across many layers of software and hardware systems. The abstractions presented by enterprise storage systems, namely block devices or entire file system name spaces, are too coarse-grained to allow the management of specific types of data (e.g. “All office documents should be stored on a reliable, high-performance storage device irrespective of what computer they are accessed from”). It is difficult or impossible to specify other fine-grained (i.e. per-file) policies describing the encryption, durability, or performance properties of data.
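By way of illustration only, the kind of fine-grained, per-file policy referred to above might be expressed as an ordered list of rules that map file name patterns to storage properties. The following is a minimal sketch under stated assumptions: the names StoragePolicy, RULES, DEFAULT and policy_for, the tier labels, and the replica counts are all hypothetical and do not describe any existing product or the systems discussed above.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical per-file storage properties (illustrative only)."""
    tier: str        # assumed label for a class of storage device
    replicas: int    # assumed durability setting
    encrypted: bool  # assumed encryption setting


# Ordered rules: the first matching pattern wins. Patterns are lowercase
# because file names are lowercased before matching.
RULES = [
    ("*.docx", StoragePolicy("fast-replicated", 3, True)),
    ("*.xlsx", StoragePolicy("fast-replicated", 3, True)),
    ("*.tmp", StoragePolicy("local-scratch", 1, False)),
]
DEFAULT = StoragePolicy("standard", 2, False)


def policy_for(path):
    """Return the policy for a file, regardless of which computer holds it."""
    # Keep only the final path component, accepting either separator style.
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    for pattern, policy in RULES:
        if fnmatch(name, pattern):
            return policy
    return DEFAULT
```

For example, policy_for("C:\\Users\\a\\Report.DOCX") would select the hypothetical “fast-replicated” tier, while a file matching no rule falls back to DEFAULT. A block device or whole file system name space offers no comparable per-file hook, which is the coarseness noted above.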
In some exemplary prior art systems, on physical computing devices, an operating system will generally use a file system, such as NTFS or VMFS, to permit the physical computing device to access and write files and directories to physical memory resources. For a virtual computing device, in general, a physical computing device will have operating on it a virtual machine monitor, sometimes also known as a hypervisor, such as VMware™ ESX™, Citrix™ XenServer™ or Microsoft™ Hyper-V™, which creates an instance of the virtual computing device on the physical computing device and manages communication from the virtual computing device to the associated virtual memory component. On current systems in general, the virtual memory component is instantiated from the physical memory component on the physical computing device on which the virtual computing device is running, and a virtual memory file system (VMFS) is created on the virtual memory component by the virtual machine monitor (in some cases, the virtual machine monitor may include VMware™). In general, the user accesses the virtual machine monitor through a browser, for example, and the virtual machine monitor virtualizes some of the physical memory resources as storage memory, presenting some or all of the virtual memory resources available from, for example, a physical hard disk, as a virtual disk to the virtual computing device. The virtual machine monitor then takes instruction requests issued by the virtual computing device for the virtual disk and translates the instructions (e.g. read/write/update) from the virtual computing device for the virtual memory component, and then from the virtual memory component to the physical memory component. As virtual computing devices may move from one physical computing device to another as they run, it is common practice to use central shared storage rather than local disks on individual physical computing devices.
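The translation step described above can be pictured with a simplified sketch in which a virtual disk is a flat file on the host and each virtual block number maps to a byte offset in that file. This is an assumption-laden illustration, not the behaviour of any particular virtual machine monitor: the class VirtualDisk, the 512-byte BLOCK_SIZE, and the flat (unformatted) backing file are all hypothetical, and real formats such as VHD or VMDK carry additional metadata.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed virtual block size in bytes


class VirtualDisk:
    """Hypothetical flat virtual disk backed by a single host file."""

    def __init__(self, backing_path, num_blocks):
        self.backing_path = backing_path
        self.num_blocks = num_blocks
        # Size the backing file so every virtual block has a host offset.
        with open(backing_path, "wb") as f:
            f.truncate(num_blocks * BLOCK_SIZE)

    def _offset(self, block):
        # Translate a virtual block number into a byte offset in the file.
        if not 0 <= block < self.num_blocks:
            raise ValueError(f"block {block} out of range")
        return block * BLOCK_SIZE

    def write_block(self, block, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        with open(self.backing_path, "r+b") as f:
            f.seek(self._offset(block))
            f.write(data)

    def read_block(self, block):
        with open(self.backing_path, "rb") as f:
            f.seek(self._offset(block))
            return f.read(BLOCK_SIZE)


def demo():
    """Write one block and read it back through the translation layer."""
    with tempfile.TemporaryDirectory() as tmp:
        disk = VirtualDisk(os.path.join(tmp, "vm0.img"), num_blocks=8)
        payload = b"A" * BLOCK_SIZE
        disk.write_block(3, payload)
        wrote_ok = disk.read_block(3) == payload
        untouched_is_zero = disk.read_block(0) == bytes(BLOCK_SIZE)
        return wrote_ok, untouched_is_zero
```

In this sketch the guest only ever names block numbers; where those blocks physically reside is decided entirely by the backing path, which is why placing that file on central shared storage lets a virtual computing device move between hosts.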
In this common approach, a virtualization deployment will include a set of physical computing devices, all of which are connected to one or more storage arrays, which may be thought of as providing a single and shared physical memory component for all coupled physical computing devices. A local disk (i.e. local physical memory resources) may be used with VMware™-instantiated virtual machines, but such virtual machines are then restricted to running on the host machine. It is much more common with VMware™ to use a central, shared storage device (a “LUN” in enterprise storage terminology), which appears to be a single local disk that is connected to all physical machines in the cluster. VMware™'s new Virtual Storage Appliance allows local disks to be used for cluster sizes of two (2) or three (3) memory provisioning modules. It does this by pairwise mirroring of entire physical disks between two physical computers.
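The general idea of mirroring a disk between two machines can be sketched as follows. This is a generic illustration of mirroring under stated assumptions, not VMware™'s implementation: the classes Disk and MirroredPair are hypothetical, disks are modelled as in-memory dictionaries, and failure is simulated with a flag.

```python
class Disk:
    """Hypothetical in-memory stand-in for one host's physical disk."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False  # simulated host or disk failure

    def write(self, block, data):
        if self.failed:
            raise OSError(f"{self.name} is unavailable")
        self.blocks[block] = data

    def read(self, block):
        if self.failed:
            raise OSError(f"{self.name} is unavailable")
        return self.blocks[block]


class MirroredPair:
    """Keeps two disks identical by applying every write to both."""

    def __init__(self, primary, secondary):
        self.disks = (primary, secondary)

    def write(self, block, data):
        # The write is complete only once both copies have been updated.
        for disk in self.disks:
            disk.write(block, data)

    def read(self, block):
        # Serve from the first reachable copy; fall back to its peer.
        for disk in self.disks:
            try:
                return disk.read(block)
            except OSError:
                continue
        raise OSError("both mirrors are unavailable")
```

Because every block is held in full on both machines, usable capacity is half of the raw capacity, and the scheme ties each disk to exactly one peer, which is consistent with the small cluster sizes noted above.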
The examples and objectives described above are included solely to advance the understanding of the subject matter described herein and are not intended in any way to limit the invention to aspects that are in accordance with the examples or improvements described above.