Virtualization technology has been used to construct and run virtual computers (virtual machines) on a physical computer platform (physical machine). Computer virtualization allows flexible allocation of a physical machine's processor performance, memory space, and other resources to virtual machines, thus facilitating the management of hardware resource usage.
In some cases, there may be a need for moving a virtual machine from its current host physical machine to a different physical machine. For example, some virtual machines may be relocated to a new physical machine when their current physical machine is under a heavy load and thus confronting a likely shortage of available hardware resources. Another case is when the current physical machine has to stop for the purpose of maintenance work or power saving. In this case, all existing virtual machines are moved to a different physical machine.
Live migration is one of the known methods of moving virtual machines between different physical machines. This particular method moves a running virtual machine without shutting down its operating system (OS) or application software, thus minimizing the substantial down-time of the virtual machine. For example, the process of live migration proceeds as follows.
First, the source physical machine copies data from its memory area to the destination physical machine page by page. Here the term “page” refers to a unit of memory area used by the virtual machine being moved. Because the virtual machine is still operating on the source physical machine, it may modify existing page data while a copy of all page data is being delivered to the destination physical machine. If a page is modified after its copy has been transmitted, another copy of the modified page (called a “dirty page”) has to be sent to the destination physical machine. The source physical machine repeatedly recopies page data until the number of remaining dirty pages becomes sufficiently small.
The source physical machine then stops the virtual machine under migration so that no further pages are modified, and copies the remaining dirty pages to the destination physical machine. The source physical machine also sends the processor context, including the current values of the program counter and other registers, to the destination physical machine. The destination physical machine loads the received page data into memory and restores the received processor context in the processor, thus permitting the virtual machine to resume its information processing operation. In other words, the destination physical machine takes over the stopped tasks from the source physical machine.
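The copy, recopy, and stop sequence described above may be illustrated by the following minimal sketch. It is a simulation only, not an implementation of any particular hypervisor. The function name `precopy_migrate`, the per-round write trace `writes_per_round`, and the parameter `threshold` are hypothetical names introduced for illustration, and the transfer of the processor context is omitted.

```python
def precopy_migrate(num_pages, writes_per_round, threshold):
    """Simulate the page copying of a live migration.

    num_pages        -- number of pages used by the virtual machine
    writes_per_round -- hypothetical trace: one set of page numbers per copy
                        round, naming the pages the VM modified in that round
    threshold        -- recopying ends once this many dirty pages or fewer remain

    Returns (rounds, copied_while_running, copied_while_stopped).
    """
    dirty = set(range(num_pages))      # initially every page has to be copied
    rounds = 0
    copied_while_running = 0

    for written in writes_per_round:
        if len(dirty) <= threshold:
            break                      # few enough dirty pages: stop the VM
        copied_while_running += len(dirty)   # send all currently dirty pages
        dirty = set(written)           # pages modified meanwhile are dirty again
        rounds += 1

    # The VM is stopped here; the remaining dirty pages are copied with no
    # further modification possible. (If the trace ends early, the sketch
    # simply stops the VM with whatever dirty pages remain.)
    copied_while_stopped = len(dirty)
    return rounds, copied_while_running, copied_while_stopped


if __name__ == "__main__":
    trace = [{0, 1, 2, 3}, {1, 2}, {2}]
    print(precopy_migrate(8, trace, threshold=2))
```

In this sketch, only the pages counted in `copied_while_stopped` contribute to the period during which the virtual machine is not running.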
As an example of related art, there is proposed a fault tolerant server that runs a working virtual machine and a protection virtual machine. The memory space of the working virtual machine is divided into a first group of sub-areas and a second group of sub-areas. The proposed fault tolerant server temporarily stops the working virtual machine when a checkpoint is reached. During this temporary stop period, the fault tolerant server copies modified data in the first-group sub-areas to a transfer buffer, where the “modified data” denotes the portions that were modified after the previous checkpoint. Here, the fault tolerant server disables write operations in the second-group sub-areas and copies their modified data after the working virtual machine is released from the temporary stop. The fault tolerant server then transmits the data in the transfer buffer to the protection virtual machine.
Another example is a cloud system that provides a live migration capability for virtual machines. In this proposed cloud system, the source physical machine measures the modification rate of each page (i.e., how frequently the page data is modified). The cloud system copies page data from the source physical machine to the destination physical machine in ascending order of the modification rates. That is, the pages with less frequent modifications are copied earlier than pages with frequent modifications.
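The ordering used in this related art may be sketched as follows, assuming that the modification rate of each page has already been measured. The function name `copy_order` and the dictionary layout are hypothetical and are not taken from the cited publication.

```python
def copy_order(modification_rates):
    """Return page numbers in ascending order of modification rate.

    modification_rates -- dict mapping page number to its measured
                          modification rate (for example, writes per second)
    Pages with equal rates are ordered by page number so the result is stable.
    """
    return sorted(modification_rates,
                  key=lambda page: (modification_rates[page], page))


if __name__ == "__main__":
    rates = {0: 5.0, 1: 0.5, 2: 12.0, 3: 0.5}
    # Rarely modified pages come first, frequently modified pages last.
    print(copy_order(rates))
```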
Yet another example is a computer system that allows selection of memory areas for use in the destination physical machine. According to this computer system, the source physical machine detects a memory area that has been modified by some programs and sends information about the detected memory area to the destination physical machine. The destination physical machine places this modified data in a memory area that provides the best access performance. See, for example, the following documents:
Japanese Laid-open Patent Publication No. 2014-178981
Japanese Laid-open Patent Publication No. 2014-191752
International Publication Pamphlet No. WO2016/013098
As previously described, live migration includes the step of recopying modified memory data. Since a page is the minimum unit of recopying, even a small change to page data puts the whole page into the recopy queue. If a virtual machine repeatedly performs such small write operations across scattered pages, the resulting increase in the page dirtying rate could disturb the execution of live migration.
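The effect of page-granular recopying may be illustrated by the following sketch, which counts the pages touched by a set of write operations. The page size of 4096 bytes and the names `dirty_pages` and `recopy_bytes` are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def dirty_pages(writes, page_size=PAGE_SIZE):
    """Return the set of page numbers touched by the given writes.

    writes -- iterable of (address, length) pairs, both in bytes
    """
    pages = set()
    for address, length in writes:
        if length <= 0:
            continue                   # an empty write dirties nothing
        first = address // page_size
        last = (address + length - 1) // page_size
        pages.update(range(first, last + 1))
    return pages


def recopy_bytes(writes, page_size=PAGE_SIZE):
    """Bytes to be recopied: whole pages, regardless of the bytes written."""
    return len(dirty_pages(writes, page_size)) * page_size


if __name__ == "__main__":
    # One hundred 8-byte writes, each landing in a different page.
    scattered = [(i * PAGE_SIZE + 100, 8) for i in range(100)]
    # The same one hundred 8-byte writes packed into a single page.
    packed = [(100 + 8 * i, 8) for i in range(100)]
    print(recopy_bytes(scattered), recopy_bytes(packed))
```

Under these assumptions, both write patterns modify 800 bytes, yet the scattered pattern puts 100 pages into the recopy queue while the packed pattern puts only one.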
The recopying of page data may be repeated until the remaining dirty pages become sufficiently few. With a high page dirtying rate, however, the source physical machine keeps missing the chance to stop the virtual machine, thus making it difficult to complete the live migration. While it may be possible to abandon the recopying so that the live migration completes, this option would leave a large number of remaining dirty pages, and the live migration process thus has to spend a long time transmitting memory data after the virtual machine is stopped. The substantial down-time of the virtual machine would be prolonged by the long transmission period, and the availability of the virtual machine is consequently degraded.
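The trade-off described above may be made concrete with a simplified model. The model is an assumption introduced for illustration: the virtual machine dirties distinct pages at a constant rate, pages are copied at a constant rate, and each recopy round lasts as long as it takes to send the pages that are currently dirty. The names `remaining_dirty_pages` and `stop_and_copy_time` are hypothetical.

```python
def remaining_dirty_pages(initial_pages, dirty_rate, copy_rate, rounds):
    """Dirty pages left after the given number of copy rounds.

    initial_pages -- number of pages used by the virtual machine
    dirty_rate    -- pages dirtied per second (assumed constant)
    copy_rate     -- pages copied per second (assumed constant)
    """
    remaining = float(initial_pages)
    for _ in range(rounds):
        round_time = remaining / copy_rate        # time to send current set
        # Pages dirtied during that time, capped at the VM's total pages.
        remaining = min(dirty_rate * round_time, float(initial_pages))
    return remaining


def stop_and_copy_time(remaining_pages, copy_rate):
    """Seconds needed to send the remaining pages while the VM is stopped."""
    return remaining_pages / copy_rate


if __name__ == "__main__":
    for rate in (100, 1000):
        left = remaining_dirty_pages(1000, rate, 1000, rounds=3)
        print(rate, left, stop_and_copy_time(left, 1000))
```

In this model, the dirty set shrinks from round to round only while the dirtying rate stays below the copy rate. Once the dirtying rate reaches the copy rate, repeating the rounds does not reduce the remaining pages, and abandoning the recopying leaves the whole dirty set to be sent while the virtual machine is stopped.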