There are conflicting demands for storage associated with a specific Virtual Machine (VM). Directly-attached storage (DAS) provides the highest performance. However, holding all of the storage associated with each VM locally on its assigned host machine may not be an effective solution. Generally, this local storage will be in addition to copies maintained in a network file system. Relying on local storage alone is typically not an option due to its limited capacity, the need to support high availability, archiving and disaster recovery, and the benefits of using centralized storage management and powerful NAS management tools. Files already stored in a network file system are already available over the network to support executing VMs, although with longer latencies and lower throughput. Therefore it is very desirable to use local storage only for the subset of the VM-associated storage that will truly benefit from the optimized local access speeds. Additionally, when the VM is subject to migration, particularly live migration, the cost of replicating local storage can become excessive.
Finding a workable solution to this tradeoff is critical to supporting VMs with a solution that can scale to the degree that the market demands.
One of the drivers for this increased need to scale deployments is the ability of Hypervisors and Virtualization to perform “live migration” of VMs from one host to another. These capabilities are featured in solutions from VMware, Xen, KVM and Microsoft Hyper-V. The motivations for live migration vary from the desire to provide a fully dynamic system where processors and bandwidth are allocated on demand to merely allowing easy re-balancing of loads as the processing and/or bandwidth requirements of servers change over time.
Specifically, live migration enables copying the memory image of a VM from one host machine to a new one without requiring the VM to be shut down. A process of copying the memory image of the VM is initiated from the current host to the target host. This process continues while the VM still executes on the current host. Memory pages that are modified after they are copied are copied again. This process continues until the new image is sufficiently complete to begin executing. A cutover of the L2 and L3 network identifiers is then made, and the new image continues the work that the original image had been conducting. Some hypervisors may do the cutover before the full image is copied and rely upon page faults at the new location to pull un-migrated pages on demand.
An L3 address is a layer three address, such as an Internet Protocol address or an InfiniBand GID. An L2 address is a layer two address, such as an Ethernet MAC address or an InfiniBand LID.
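The iterative copy process described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not any hypervisor's actual implementation: the function name `precopy_migrate`, the parameters `dirty_rate` and `cutover_threshold`, and the uniform random write pattern are all hypothetical and chosen for illustration.

```python
import random


def precopy_migrate(num_pages, dirty_rate, cutover_threshold,
                    max_rounds=30, seed=0):
    """Simulate iterative pre-copy of a VM memory image (illustrative only).

    dirty_rate is an assumed ratio: pages written by the still-running VM
    per page copied in a round. Returns (rounds, pages_sent, final_dirty).
    """
    rng = random.Random(seed)
    dirty = set(range(num_pages))  # every page must be copied at least once
    pages_sent = 0
    rounds = 0
    while len(dirty) > cutover_threshold and rounds < max_rounds:
        copied = len(dirty)
        pages_sent += copied
        rounds += 1
        # The VM kept executing during this round; pages it modified
        # after they were copied must be copied again.
        writes = int(dirty_rate * copied)
        dirty = {rng.randrange(num_pages) for _ in range(writes)}
    # Cutover: the VM is paused, the remaining dirty pages are sent, and
    # the L2 and L3 network identifiers move to the target host.
    pages_sent += len(dirty)
    return rounds, pages_sent, len(dirty)


if __name__ == "__main__":
    rounds, sent, final = precopy_migrate(1024, 0.5, 16)
    print(f"rounds={rounds} pages_sent={sent} paused_copy={final}")
```

In this model the number of pages left for the paused copy shrinks each round only when the VM dirties pages more slowly than they are copied; otherwise the `max_rounds` bound forces the cutover.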
Live migration enables assigning resources to VMs on a dynamic basis. These resources include server Virtual CPUs, network, storage and RAM. Conventional solutions require trading off between using shared storage (i.e. using a network for NAS or SAN access) and the need to copy the image between unshared/direct storage in parallel with the migration of the VM itself.
The same techniques used to migrate the memory image of the VM can be employed to migrate the supporting storage as well. These techniques can enable the final cutover from the old host to the new host to be fast enough to support live migration, but they increase the total time during which a migration requires resources to be reserved at both the old and new locations. Extended double booking of resources imposes a cost that limits the scaling of clusters.
One common solution is to use network storage (NAS or SAN) to store the virtual disk image on a networked server. The essential migration required is to transfer the access rights for the virtual disk image from the source host to the destination host. This solution is workable, but reduces the performance of disk access from DAS speeds to network (NAS or SAN) speeds.
In addition to live migration, hypervisors create snapshot images of VMs that can be used to restart the VM from the point where the snapshot was taken after a failure of the VM or host.
In the target environment, creating snapshot files using local storage and then automatically replicating those files to NAS storage would drastically shorten the time required to create a VM snapshot. The reduced round-trip times alone can substantially improve performance. Even greater improvements can be achieved by using Solid State Drives (SSDs) for local storage.
There are several features of NAS protocols that are intended both to support a uniform name space implemented by multiple file servers and to enable migration of files and directories from one server to another. Primary examples of these protocols are Microsoft's Distributed File System (DFS), NFSv4 and NFSv4.1.
One of these capabilities is the ability of a NAS server to refer a directory to another NAS server. NFSv4 offers this capability as referrals. Microsoft's Distributed File System (DFS) offers it as redirections. With referrals/redirections a central server can refer clients to other servers at specific mount points. This provides centralized control over client mounts using the NAS protocol itself.
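The referral mechanism can be sketched as a longest-prefix lookup of a path against a table of mount points. The following is a simplified illustration and not the NFSv4 or DFS wire protocol; the class `NasServer`, the function `resolve`, and the server and path names in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NasServer:
    """Toy model of a NAS server holding a referral table (illustrative)."""
    name: str
    # Maps a mount point to the name of the server it is referred to.
    referrals: dict = field(default_factory=dict)

    def lookup(self, path):
        """Return the server to use for path: a referral target or self."""
        best = None
        for mount in self.referrals:
            prefix = mount.rstrip("/") + "/"
            if path == mount or path.startswith(prefix):
                # Prefer the most specific (longest) matching mount point.
                if best is None or len(mount) > len(best):
                    best = mount
        return self.referrals[best] if best is not None else self.name


def resolve(servers, start, path, max_hops=8):
    """Follow referrals from the start server until one serves the path."""
    current = start
    for _ in range(max_hops):
        target = servers[current].lookup(path)
        if target == current:
            return current
        current = target
    raise RuntimeError("too many referrals; possible referral loop")


if __name__ == "__main__":
    servers = {
        "root": NasServer("root", {"/vm": "fs2", "/vm/a": "fs1"}),
        "fs1": NasServer("fs1"),
        "fs2": NasServer("fs2"),
    }
    print(resolve(servers, "root", "/vm/a/disk.img"))
```

Because the client follows the referral itself, the central server controls where each mount point is served without remaining in the data path.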
NAS protocols also support maintenance of additional read-only copies of a file system on multiple servers. Clients needing only to read files can choose to access the replicated copies rather than the master copy. While the method of replication is generally not standardized, there are well-known methods of implementing file-based replication using only standard NAS capabilities, as well as additional implementation-dependent methods of replicating when the two file systems have knowledge of each other's internal data structures.
There are multiple reasons for supporting a federated collection of file servers in a single global name space. The basic ability to place subdirectories on different servers without incurring changes on the client side provides for scalability, ease of management, capability to support user mobility, and other benefits well known in the art.
NAS protocols supporting Federated file systems also allow clients to claim exclusive ownership of a file, or even a directory, and cache the updates locally. Claiming exclusive ownership of a file or directory grants a NAS client the ability to exclude access by other users, so that such access does not interfere with optimized local caching.
One of ordinary skill in the art will recognize that a file system can qualify as a clustered or parallel file system and still meet this definition of a Federated File System (Federated FS), although they would typically be marketed with the former labels.
NAS proxy servers are well known conventional elements where a server accepts requests from clients configured to use it, but may resolve those requests by accessing other network file servers. The NAS proxy server generally optimizes performance to its clients by reducing the number of network interactions required over longer-haul connections and/or by caching some of the files, or portions thereof, locally.
FIG. 1 shows a conventional deployment 100 using NAS proxy server 104. In one deployment all access to the network files is through the NAS proxy server 104, and the NAS proxy server 104 may combine name spaces presented by multiple network file servers into a single global name space. With a Federated FS 108, the file servers 114, 116 already have a common global namespace and may be connected to the NAS clients 106 directly. Referral to a NAS proxy server 104 is done when a directory is sufficiently migrated to the NAS proxy server 104 that overall access will be optimized by using the alternate server.
In either case, the NAS proxy server 104 will be servicing a larger portion of the global namespace than is optimal. In the target environment, a method is sought to provide the benefits of a NAS proxy server 104 while avoiding the overhead of using a proxy layer when no local resources have been allocated for the specific files or directories.
The conventional use of a file system NAS proxy server 104 has further undesirable effects in the target environment. A migrated VM must be explicitly re-directed from the NAS proxy server 104 associated with the prior location to the NAS proxy server 104 associated with the new location. This will require the VM to at least temporarily access the old NAS proxy server 104 while already at the new location, or to temporarily cease use of any proxy at all. A more desirable solution would migrate the NAS proxy server service in a manner that was transparent to the VM.
One shortcoming of NAS proxy servers 104 is that they add an extra step to the process of resolving a client's request. A NAS proxy server 104 must provide optimized service for a large enough subset of the requests it handles to justify the extra step of using a NAS proxy server 104.
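The trade-off stated above can be made concrete with a simple latency model. This is a sketch under assumptions that do not come from the text: every request costs either `t_local` (served from the proxy's local storage, including the proxy hop) or `t_proxy + t_remote` (forwarded to a network file server), while direct access costs `t_remote`. All names and numeric values are hypothetical.

```python
def mean_latency_via_proxy(hit_fraction, t_local, t_remote, t_proxy):
    """Mean request latency when every request goes through the proxy.

    hit_fraction: share of requests the proxy can serve locally (0..1).
    t_local:  latency of a locally served request.
    t_remote: latency of direct access to the network file server.
    t_proxy:  extra latency added by the proxy on a forwarded request.
    """
    forwarded = 1.0 - hit_fraction
    return hit_fraction * t_local + forwarded * (t_proxy + t_remote)


def break_even_hit_fraction(t_local, t_remote, t_proxy):
    """Smallest hit fraction at which the proxy is no slower than direct.

    Solves h*t_local + (1-h)*(t_proxy + t_remote) = t_remote for h.
    """
    return t_proxy / (t_proxy + t_remote - t_local)


if __name__ == "__main__":
    # Illustrative latencies in milliseconds (assumed, not measured).
    h = break_even_hit_fraction(t_local=0.1, t_remote=1.0, t_proxy=0.2)
    print(f"proxy pays off above a hit fraction of {h:.3f}")
```

Under this model, a directory for which the proxy holds no local data has a hit fraction of zero and always pays the full `t_proxy` penalty.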
A preferable solution would allow one or more NAS clients 106 to access the network file system directly when the local NAS proxy server 104 would be providing no optimization for a given directory.
Another possible solution would be to create a VM that acts as the NAS proxy server 104 for a specific VM. This dedicated VM would be a shadow of the VM it serviced; they would always be migrated in tandem. The two VMs would be configured to connect on a private port group or VLAN on each host that hosted them.
This dedicated local NAS proxy VM would employ conventional NAS proxy server techniques to serve up the global name space to its VM, while exercising control over which portions of the files were actually stored locally.
Creating a shadow VM to provide service to an application VM is a well-known virtualization technique that has been deployed to provide scalable hardware emulation. The limitations of this solution include the overhead of creating an additional service VM for each application VM required, and the fact that as isolated VMs the Local NAS proxy servers will be unable to find optimizations across their VM clients. For example, multiple Linux VMs will typically share many of the same files on their install partition. Having each Local NAS VM deal with only a single client effectively blocks the potential for de-duplication savings.
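The lost de-duplication opportunity can be illustrated by counting stored blocks under both arrangements. The sketch below assumes fixed-size blocks identified by SHA-256 digest, which is one common approach and not one named in the text; the function names and the sample images are hypothetical.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Split data into fixed-size blocks and return each block's digest."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def blocks_stored(vm_images, shared):
    """Count blocks held locally for a dict of VM name -> image bytes.

    shared=True models one local proxy serving all VMs on the host;
    shared=False models one isolated proxy VM per application VM.
    """
    if shared:
        unique = set()
        for image in vm_images.values():
            unique.update(block_digests(image))
        return len(unique)
    # Isolated proxies can only de-duplicate within their own client.
    return sum(len(set(block_digests(image)))
               for image in vm_images.values())


if __name__ == "__main__":
    # Two VMs sharing a common install image plus one private block each.
    base = b"A" * 4096 + b"B" * 4096
    images = {"vm1": base + b"C" * 4096, "vm2": base + b"D" * 4096}
    print(blocks_stored(images, shared=True),
          blocks_stored(images, shared=False))
```

In this example the two isolated proxies each keep their own copy of the common blocks, whereas a proxy shared across the host keeps them once.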
Another shortcoming of conventional solutions is the lack of integration between Virtualization Management and NAS/SAN Management. For example, information on the total load on local storage is not factored into the load balancing decisions made by Virtualization Management. In current solutions, only the resources directly controlled by the Hypervisor are factored in choosing where to deploy VMs. Further, NAS/SAN Management receives no notification on VM migrations and must infer when the network topology has changed. A better solution for the targeted environment would provide integration of these Management Planes.
Current solutions for supporting storage for VMs do not scale well because they rely on either shared storage, with the overhead associated, or on directly-attached storage (DAS). DAS may prove to be ineffective because of the costs of providing adequate local storage that is redundant with network storage and/or because of the time durations required for complete migrations.
Attempts to address these problems using NAS proxy servers alone impose the cost of working through the proxy layer at all times, even for directories and files that are not being optimized with local storage.
Conventional solutions provide no integration of management of NAS referrals with the state of VMs or the serial pairing of VMs with a single Hypervisor Platform. The Federated FS is managed as though any Hypervisor Platform was as likely to access any Virtual Disk image in the pool as any other hypervisor, or in fact any other Client. NAS Management has only actual usage of files to guide it when selecting the optimum location for specific files. Virtualization makes this problem even more challenging for NAS Management by maintaining constant L3 and L2 addresses for migrating VMs.
There is also no optimization for the exclusive access patterns for directories associated with VMs.
The present invention provides a method and a system to address all these issues.