Virtual machine technology has changed the modern world by making all manner of computer resources more available, more efficient, more affordable, and more flexible. No longer are computational tasks tied to single, fixed, physical “boxes”. Indeed, by implementing a “computer” essentially wholly as a software construct, that is, as a virtual machine (VM), a user may not even know where actual data storage and processing are taking place as he runs a given application. Virtualization is at the heart of this revolution.
Even the virtual world must, however, ultimately run on at least one physical processing platform somewhere. Consequently, even a system of VMs is constrained by well-known physical realities. For example, the server on which VMs may be running might need to be halted or even powered off to allow for maintenance or upgrades. As another example, one server's workload may become so great compared to another's that there is a need for load balancing so as to improve overall performance by more efficiently allocating the physical computing resources.
One of the key advantages of virtualization is the ease of management and the ability to do such maintenance, load balancing, etc., with minimal downtime, and one of the primary tools to accomplish many of these tasks is “live migration”. As the name implies, “migrating” a VM involves moving it, at least functionally, from one physical host to another. One of the earliest successful techniques for migrating VMs is described in U.S. Pat. No. 7,484,208 (Nelson), which not only enabled migration of a VM from a source to a destination platform, but did so while the source VM was still running, thereby reducing the downtime experienced by the user usually to an unnoticeable level.
Live VM migration has thus been around for a decade and has naturally evolved, for example, from host to storage to “shared nothing” migration. So far, migration has been limited to migrating a single VM; however, nowadays some users run a variety of applications, tiers, clusters, etc., that involve more than one VM simultaneously, and even in other cases there is a need to be able to migrate not only one, but a set of VMs, while still keeping downtime as low as possible.
Existing approaches for migrating a group of VMs can be classified into two general types: parallel and sequential. For parallel migration, the migrations of all VMs in the set are started at the same time. The migrations may or may not complete at the same time, depending on VM memory size, memory dirty rate (see below) and network bandwidth. For sequential migration, the VMs in the set are queued and migrated one by one, such that the VMs switch over execution to the destination at different times.
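The difference between the two types can be illustrated with a minimal sketch. It assumes, purely for illustration, that each VM's migration takes a fixed, known time and that parallel migrations do not slow one another down; in practice the duration depends on memory size, dirty rate and shared network bandwidth. The function names and the sample durations are hypothetical and are not taken from any existing migration system.

```python
from itertools import accumulate


def parallel_switchover_times(durations):
    """All migrations start at t=0, so each VM switches over
    when its own transfer finishes (bandwidth sharing is ignored)."""
    return list(durations)


def sequential_switchover_times(durations):
    """Migrations run one after another, so each VM switches over
    at the cumulative sum of the durations queued up to and including it."""
    return list(accumulate(durations))


# Hypothetical per-VM migration durations, in seconds.
durations = [30, 10, 20]
parallel = parallel_switchover_times(durations)
sequential = sequential_switchover_times(durations)
```

In both cases the VMs generally reach the destination at different times, which is the window during which the group can be left in an inconsistent placement if one migration fails.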
Conventional parallel and sequential migration both suffer from the shortcoming that migration failures may result in a split state of the VM group. In this context, group state is “split” when at least one VM in the group is running on the source platform while the remaining VM(s) are running on the destination platform. Split state may be undesirable in cases of applications whose execution spans multiple VMs. One example of such an application is a tiered application, with a backend or database layer, possibly a middle processing tier, and a frontend or web tier.
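The definition of split state above can be expressed as a simple check. The following is only a sketch under stated assumptions: the `Host` enumeration, the `is_split` function and the VM names are hypothetical, and each VM is assumed to be running on exactly one of the two platforms.

```python
from enum import Enum


class Host(Enum):
    SOURCE = "source"
    DESTINATION = "destination"


def is_split(placement):
    """Return True if the group is in split state, that is, if at least
    one VM runs on the source while at least one runs on the destination.

    `placement` maps a VM name to the Host it is currently running on.
    """
    hosts = set(placement.values())
    return Host.SOURCE in hosts and Host.DESTINATION in hosts


# Hypothetical three-tier application in which the database VM failed to migrate.
tiers = {
    "web": Host.DESTINATION,
    "app": Host.DESTINATION,
    "db": Host.SOURCE,
}
split = is_split(tiers)
```

A group whose VMs are all on the source (nothing migrated yet, or everything rolled back) or all on the destination (everything migrated) is not split under this definition.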
Another example of an instance in which it is disadvantageous to split the state of a set of VMs is where a VM-based system implements disk storage using virtual volumes that are exposed and managed by the underlying storage as logical unit numbers (LUNs) rather than just being files in a VM file system. In this case, group consistency is important. In general, volumes can be added to a consistency group, which makes it possible to perform operations such as snapshot creation and replication on the group instead of on individual volumes. Group-level operations provide easier management and are sometimes more efficient. In the case of virtual volumes, each volume is typically a VM disk, which is then to be moved. A set of a VM's disks (that is, a virtual volume consistency group) can be migrated to a different datastore. Failure to move one of the VM's disks may thus result in a violation of some of the group properties. Some systems, such as the Storage vMotion (SvMotion) feature provided by VMware, Inc., of Palo Alto, Calif., indicate VM migration failure if any of the disks of a single VM fails to migrate, but in the case where a consistency group comprises a set of volumes that belong to different VMs, group migration can help prevent split state and preserve the consistency group.
Still another example is a cluster application. For example, some databases can run as a cluster of machines, which closely communicate with each other. When such a clustered database runs on multiple virtual machines and they are migrated, failure to migrate one of the VMs may result in split state. When there is split state in a long-distance migration, communication time between some of the machines typically increases. Again, group migration may be used to prevent this situation.
In still other cases, splitting the state of a clustered application in a VM group may violate VM properties such as affinity, possibly resulting in degraded application performance due to communication latency between two data centers. In this case, group migration may help to maintain application performance.
It is therefore generally desirable to be able to migrate a group of VMs with as little disruption and delay of execution as possible.