Virtualization is of growing importance in the world of IT deployments, in particular in the fields of data centers and wide-area server infrastructures.
In the data center, virtualization allows services to be consolidated while retaining the control and protection facilities normally associated with dedicated, separate hardware and software environments, without the higher costs of such environments. For example, it may be wise to separate a company's publicly visible web server from the internal intranet server hosting potentially valuable material. For security reasons, these would often be deployed on separate physical machines to provide the appropriate protection environment; however, neither would use the full capacity of the resources provided by the physical hardware. The two separate physical machines could be replaced by a single server running virtualization software such as Xen or VMware, which provides a protection mechanism that effectively and securely separates the two services within their own virtual machines. The Xen technology has been described, for example, by Barham, P. et al. in the publication “Xen and the art of virtualization”, SOSP (2003).
Sharing a single physical host also helps to reduce the cost of deploying services, both initially, by reducing the hardware investment, and in the longer term, by reducing the cost of hardware maintenance.
Virtualization removes the tight coupling between the physical machine and the services running on it. This paves the way for moving live services, enveloped within a virtual machine, from one physical host to another without severely disrupting them. This technology is known as virtual machine migration. Virtual machine migration is very useful in supporting, for example, load balancing (allowing virtual machines to be moved from busier to less busy servers), failover, and dynamic relocation of latency-sensitive services. Furthermore, it can be used to maintain a high uptime for a service by allowing it to be migrated to a different physical machine if the original host must be taken down for maintenance. Similarly, migration can be used to allow the transparent upgrade of the physical host.
The majority of virtual machine migration technologies rely on storage shared between the source and destination hosts. Such shared storage may be provided by specialist hardware and networking in a Storage Area Network (SAN), or by a network file system. One disadvantage of these approaches is the higher cost of the extra hardware they require. They are also unsuitable for situations where the performance or privacy available from local storage is necessary. Additionally, they do not enable migration of virtual machines in the wide area, where shared storage is generally not available.
In the wide area, virtual machine migration is also useful because it allows latency-intolerant services, such as interactive game servers, to be moved closer, in terms of network distance, to the clients. This reduces the round-trip latency and hence improves the quality of the game for the players. To ensure that the game environment is fair, the server should be migrated to a position that is as close to equidistant from each of the players as possible. Furthermore, wide-area migration is useful for load balancing across multiple data centers, for instance to balance demand peaks generated at certain times of the day.
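The fairness criterion above can be illustrated with a minimal sketch. It assumes that round-trip times from each candidate host to each player have already been measured; the function name `pick_fair_host`, the host names, and the latency figures are hypothetical and serve only as an illustration. The sketch prefers the host with the smallest spread between the best-served and worst-served player, and breaks ties by the lowest worst-case latency.

```python
def pick_fair_host(latency_ms):
    """Pick the candidate host that is closest to equidistant from all players.

    latency_ms maps a host name to a list of round-trip times in
    milliseconds, one entry per player (hypothetical input format).
    """
    def score(host):
        rtts = latency_ms[host]
        spread = max(rtts) - min(rtts)   # fairness: smaller spread is fairer
        worst = max(rtts)                # tie-break: lower worst-case latency
        return (spread, worst)

    return min(latency_ms, key=score)


# Illustrative measurements for three candidate hosts and three players.
candidates = {
    "host-a": [20, 90, 95],
    "host-b": [60, 65, 70],
    "host-c": [40, 45, 120],
}
chosen = pick_fair_host(candidates)
print(chosen)
```

Other criteria, such as minimizing the mean latency, would favor different hosts; the choice of spread as the primary key is an assumption made to match the fairness goal described above.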
U.S. patent application No. 2006/0005189 A1 describes Systems and Methods for Voluntary Migration of a Virtual Machine between Hosts with Common Storage Connectivity. The known system allows virtual machines to be migrated between servers that are connected to an external storage device such as a network-attached storage (NAS) server. Because both servers can reach the common storage, the persistent state, i.e. the file system used by the virtual machine to be migrated, does not need to be migrated. The system only migrates the device information, i.e. it updates the relevant operating system structures on the destination server to allow connection to the common storage after the virtual machine is migrated.
Further known approaches ensure that a migrated virtual machine does not use any local storage, and that both the source and the destination server can access the virtual machine's file system by connecting to a remote file server. Such approaches are described, for example, by Nelson, M. et al. in the paper “Fast transparent migration for virtual machines”, published in USENIX 2005 (2005), by Osman, S. et al. in the paper “The design and implementation of Zap: a system for migrating computing environments”, published in SIGOPS Oper. Syst. Rev. (2002) and by Hansen, J. G. et al. in the paper “Self-migration of operating systems”, published in EW11: ACM SIGOPS European workshop (2004).
For example, the Network File System (NFS) is a commonly used protocol for implementing such systems using remote storage in the local area. Distributed file systems such as Coda, the Andrew File System, the Common Internet File System, and the Global File System allow remote file servers to be used even in the wide area. Peer-to-peer file systems such as xFS and Farsite allow virtual machines to use a set of unknown remote peers for storing their persistent state.
However, the I/O performance of such systems is far lower than that of local disks, which makes them unsuitable for a large number of applications and types of usage, for instance for databases or for storing swap files. At the same time, the storage of virtual machine data on untrusted peers may be incompatible with the trust and privacy requirements of commercial services using the virtual machines. Additionally, all of the above solutions require that custom file system software be installed and used by the migrated virtual machine, which introduces administration overhead.
Furthermore, in platforms such as XenoServers and Grids, migration is often used to bring services closer to their clients in order to reduce round-trip latency. The Grid platform is described by Foster, I. et al. in the paper “The anatomy of the Grid: Enabling scalable virtual organizations”, published in Int'l Journal of High Performance Computing Applications (2001).
A further approach, called State capsules, uses remote storage devices and an on-demand fetching process. State capsules are described by Sapuntzakis, C. et al. in the paper “Optimizing the Migration of Virtual Computers”, published in OSDI (2002). This approach makes it possible to migrate the memory of a virtual machine first and then fetch any data needed from the source host on demand. Network Block Device and iSCSI allow clients to access remote storage devices as if they were local. This can be used to allow the migrated virtual machine at the destination host to access its file system, exported by the source host, over the network.
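The on-demand fetching idea can be sketched as follows. This is a simplified illustration and not the implementation of State capsules, Network Block Device, or iSCSI: the class `OnDemandDisk` and the callable `fetch_remote_block` are hypothetical, blocks are addressed by number, and writes are assumed to replace whole blocks.

```python
class OnDemandDisk:
    """Destination-side view of a disk whose blocks still live on the source host."""

    def __init__(self, fetch_remote_block):
        # fetch_remote_block: hypothetical callable, block number -> bytes,
        # standing in for a network read from the source host.
        self._fetch = fetch_remote_block
        self._local = {}          # blocks already present at the destination
        self.remote_reads = 0     # number of fetches served by the source host

    def read(self, block_no):
        # A miss is served by the source host and then kept locally.
        if block_no not in self._local:
            self._local[block_no] = self._fetch(block_no)
            self.remote_reads += 1
        return self._local[block_no]

    def write(self, block_no, data):
        # Whole-block write: the new contents supersede the source copy.
        self._local[block_no] = data


# Illustrative use: a dictionary stands in for the disk exported by the source.
source_disk = {0: b"boot", 1: b"data"}
disk = OnDemandDisk(source_disk.__getitem__)
disk.read(0)
disk.read(0)              # second read is served locally
disk.write(1, b"new")
print(disk.read(1), disk.remote_reads)
```

In this sketch every read miss requires the source host to be reachable, which is the dependency on the source host that such schemes introduce.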
This technique enables a migrated virtual machine to start running on the destination host before its entire file system has been transferred to the destination. However, it requires that the source host remain available and accessible for a substantial amount of time after the migration, in order to keep hosting the file system of the migrated virtual machine until it has been transferred to the destination host. This results in residual dependency problems: such systems are vulnerable to unpredictable source host unavailability, for instance in the case of a network disconnection or power failure. In effect, because the virtual machine then depends on two hosts rather than one, this halves the expected time until a host that the virtual machine depends on fails. At the same time, requiring the cooperation of source hosts conflicts with the federated, disconnected nature of platforms such as XenoServers, Grids, and PlanetLab, where servers are under the administrative control of different organizations and may be unpredictably shut down.
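The halving of the expected time to failure can be made concrete under a simplifying assumption that is not stated in the text: the source and destination hosts fail independently, each with an exponentially distributed lifetime of rate λ. The symbols T_s, T_d and λ are introduced here only for illustration.

```latex
% Assumption: T_s and T_d are independent, each exponential with rate \lambda.
\begin{align*}
  P\bigl(\min(T_s, T_d) > t\bigr) &= P(T_s > t)\,P(T_d > t)
      = e^{-\lambda t}\, e^{-\lambda t} = e^{-2\lambda t}, \\
  E\bigl[\min(T_s, T_d)\bigr] &= \frac{1}{2\lambda}
      = \tfrac{1}{2}\, E[T_d].
\end{align*}
```

Under this assumption, a virtual machine that depends on both hosts sees its first host failure after half the expected time of a virtual machine that depends on the destination host alone. With other lifetime distributions or correlated failures the factor would differ.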
U.S. patent application No. 2005/268298 describes a system, method and program to migrate a virtual machine.
The known system facilitates migration of virtual machines by copying their memory, including their “block memory”, i.e. data blocks that reside in memory. If the virtual machine uses blocks that are not stored in memory, they have to be loaded into memory before migration can begin. This makes the system unsuitable for migrating virtual machines that use large amounts of local storage. Furthermore, the system does not perform live migration, i.e. the virtual machine has to be stopped at the source server while its data is being copied to the destination server.
European patent application No. 1 589 412 A2 describes a Computer data migration system which allows the migration of storage subsystems within a storage area network (SAN). As such, it requires that storage components use specialized storage area network hardware, and that a “Back End Server” be available, which controls the migration process and keeps track of where the migrated data resides. For this reason, the system is unsuitable for the end-to-end, unmediated migration of virtual machines. Furthermore, it does not support wide-area environments, where common storage access is not straightforward.
A simple way to allow the migration of a virtual machine that uses local storage is to freeze the virtual machine, copy its memory and persistent state, for instance using the scp command, and then start the virtual machine at the destination server. For example, the Internet Suspend/Resume approach allows suspending a personal computing environment and transporting it to another physical machine, where it is later resumed. This approach is described by Kozuch, M. et al. in WMCSA (2002).
However, this type of migration requires that the virtual machine be stopped before copying its persistent state to prevent file system consistency hazards. Thus, for file systems of a realistic size, this results in severe service interruption.
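The scale of the interruption can be estimated with a back-of-the-envelope sketch. It assumes that the virtual machine is frozen for the whole transfer and that the link sustains a constant throughput; protocol overhead and suspend/resume time are ignored. The function name and the example sizes are hypothetical.

```python
def freeze_and_copy_downtime_s(memory_bytes, disk_bytes, bandwidth_bits_per_s):
    """Lower bound on downtime (seconds) when the VM is frozen during the copy."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    total_bits = (memory_bytes + disk_bytes) * 8
    return total_bits / bandwidth_bits_per_s


# Illustrative figures: 512 MiB of memory and a 40 GiB file system
# copied over a 100 Mbit/s link.
MIB = 1024 ** 2
GIB = 1024 ** 3
downtime = freeze_and_copy_downtime_s(512 * MIB, 40 * GIB, 100_000_000)
print(f"{downtime / 60:.0f} minutes")
```

With these illustrative figures the service would be unavailable for close to an hour, almost all of it spent copying the file system rather than the memory.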