1. Technical Field
The present disclosure relates to a virtualized environment for a data processing system, and more particularly to resource recovery on a secondary host machine in the event of a failure condition of a virtual machine operating on a primary host machine.
2. Description of the Related Art
Checkpoint-based high availability is a technique whereby a virtual machine running on a host machine (the “primary host”) regularly (e.g., every 25 ms) mirrors its Central Processing Unit (CPU) and memory state onto another host machine (the “secondary host”). This mirroring process involves: (1) tracking changes to the memory and processor state of the virtual machine; (2) periodically stopping the virtual machine; (3) sending these changes over a network to the secondary host; (4) waiting for the secondary host to acknowledge receipt of the memory and CPU state update; and (5) resuming the virtual machine.
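The five steps of the mirroring process can be illustrated with a minimal in-memory sketch. It is not taken from any particular hypervisor: the class names `PrimaryVM` and `SecondaryHost`, the `checkpoint` function, and the use of a direct method call in place of a network transfer are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryHost:
    """Hypothetical mirror target holding the last acknowledged state."""
    memory: dict = field(default_factory=dict)
    cpu_state: dict = field(default_factory=dict)

    def receive(self, dirty_pages, cpu_state):
        # Apply the update, then acknowledge receipt (step 4).
        self.memory.update(dirty_pages)
        self.cpu_state = dict(cpu_state)
        return True


@dataclass
class PrimaryVM:
    """Hypothetical virtual machine model on the primary host."""
    memory: dict = field(default_factory=dict)
    cpu_state: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)
    running: bool = True

    def write(self, page_no, data):
        self.memory[page_no] = data
        self.dirty.add(page_no)  # step 1: track the change


def checkpoint(vm, secondary):
    """Run one checkpoint and return the number of pages sent."""
    vm.running = False                                  # step 2: stop the VM
    changes = {p: vm.memory[p] for p in vm.dirty}
    acked = secondary.receive(changes, vm.cpu_state)    # step 3: send changes
    if not acked:                                       # step 4: wait for ack
        raise RuntimeError("secondary host did not acknowledge the update")
    vm.dirty.clear()
    vm.running = True                                   # step 5: resume the VM
    return len(changes)
```

In this sketch the virtual machine stays stopped for the whole of steps 2 through 5, which is why the amount of state sent per checkpoint directly affects the pause seen by the workload.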
The mirroring process ensures that the secondary host is able to resume the workload with no loss of service should the primary host suffer a sudden hardware failure. If the secondary host either notices that the primary host is not responding, or receives an explicit notification from the primary host, the secondary host starts the mirrored version of the virtual machine. From the perspective of the outside world, the virtual machine appears to have continued executing seamlessly across the failure of the primary host.
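The two failover triggers described above (an unresponsive primary host or an explicit notification) can be sketched as follows. The `FailoverMonitor` class, its method names, and the use of a fixed timeout to decide that the primary host is "not responding" are assumptions for illustration; the text does not specify how unresponsiveness is detected.

```python
class FailoverMonitor:
    """Hypothetical failover logic running on the secondary host."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s      # assumed silence threshold, in seconds
        self.last_seen = 0.0            # time of the last checkpoint received
        self.notified_failure = False   # explicit notice from the primary host
        self.mirror_started = False

    def on_checkpoint(self, now):
        # Every checkpoint received doubles as a sign of life.
        self.last_seen = now

    def on_failure_notice(self):
        self.notified_failure = True

    def poll(self, now):
        """Start the mirrored virtual machine if either trigger has fired."""
        unresponsive = (now - self.last_seen) > self.timeout_s
        if unresponsive or self.notified_failure:
            self.mirror_started = True
        return self.mirror_started
```

Once started, the mirrored virtual machine resumes from the last acknowledged checkpoint, so `poll` never reverts to returning `False` in this sketch.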
One of the key performance bottlenecks in this process is the rate at which pages of modified memory must be transferred from the primary host to the secondary host during execution. In all implementations of this technology today, modifications to memory can only be detected with page-level granularity, and a page is typically at least 4 KB (kilobytes) in size. The hypervisor achieves this by marking all memory used by the virtual machine as read-only following every checkpoint, and detecting the faults that occur when the virtual machine attempts to write to a page of memory. The hypervisor then records that the page has been modified and must therefore be transferred at the next checkpoint, and removes the write protection so that future writes to that page do not cause a fault. At the next checkpoint, the memory is re-protected and the list of modified pages is cleared.
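The write-protection scheme described above can be modeled in a few lines. This is a simulation only: the `PageTracker` class and its method names are hypothetical, and a real hypervisor would rely on hardware page-table permissions and a fault handler rather than a Python list of flags.

```python
class PageTracker:
    """Hypothetical model of write-protect-based dirty page tracking."""

    def __init__(self, num_pages):
        # All pages start read-only, as they are after every checkpoint.
        self.writable = [False] * num_pages
        self.dirty = set()
        self.faults = 0

    def write(self, page_no):
        if not self.writable[page_no]:
            # First write since the last checkpoint: a fault is taken,
            # the page is recorded as modified, and protection is removed.
            self.faults += 1
            self.dirty.add(page_no)
            self.writable[page_no] = True
        # Later writes to the same page proceed without a fault.

    def checkpoint(self):
        """Return the pages to transfer, then re-protect all memory."""
        to_send = sorted(self.dirty)
        self.dirty.clear()
        self.writable = [False] * len(self.writable)
        return to_send
```

Writing twice to page 2 and once to page 5 within one interval produces two faults and a transfer list of `[2, 5]`; a write to page 2 after the checkpoint faults again because the page has been re-protected.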
The cost of this approach is therefore at least twofold. First, the first write to any page in a given checkpoint interval (the time between two consecutive checkpoints) causes a fault that must be handled before the workload can resume. Second, the entire page must be transferred to the secondary host, which consumes network bandwidth even if only a small part of the page was modified.
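Both costs scale with the number of distinct pages written per checkpoint interval, which a rough estimate makes concrete. The function `checkpoint_cost` and all numeric inputs in the usage note are illustrative assumptions, not measurements; only the 25 ms interval and the 4 KB page size echo figures used in this description.

```python
def checkpoint_cost(dirty_pages, fault_cost_s, page_size_bytes, interval_s):
    """Estimate the per-interval overhead of page-granular tracking.

    Returns (time spent handling faults in seconds,
             bytes transferred per interval,
             sustained network bandwidth in bits per second).
    """
    fault_time_s = dirty_pages * fault_cost_s            # one fault per dirty page
    bytes_per_interval = dirty_pages * page_size_bytes   # whole pages are sent
    bandwidth_bits_per_s = bytes_per_interval * 8 / interval_s
    return fault_time_s, bytes_per_interval, bandwidth_bits_per_s
```

For example, assuming 1,000 dirty pages per 25 ms interval, 4,096-byte pages, and a hypothetical fault-handling cost of 2 microseconds, the estimate is about 2 ms of fault handling, 4,096,000 bytes per interval, and roughly 1.31 Gbit/s of sustained network traffic.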