In a virtualized environment, high availability (HA) and migration enable a virtual machine running on one host system to be resumed or restarted on another host system with minimal interruption to the service provided by the virtual machine. As part of migration or failover, one or more virtual devices associated with a virtual machine are moved from one host to another host. For example, the virtual hard drive for the virtual machine may be copied from source to destination while the virtual machine is still running on the source. The virtual machine is then stopped or suspended at the source and restarted or resumed at the destination. While the virtual machine is stopped or suspended at the source, device state and other data that had not been committed to the virtual hard drive are also copied to the destination (if available).
Virtual machines commonly utilize caching to improve input/output performance. Caches typically run in one of two modes, using write-through (WT) or write-back (WB) caching. In both modes, reads are cached in accordance with one of a number of algorithms (e.g., least recently used (LRU), adaptive replacement cache (ARC), CLOCK with Adaptive Replacement with Temporal filtering (CART), etc.). In WT caching, writes are written both to the cache and to the backing storage (e.g., for a virtual machine, to a corresponding virtual hard drive image, which may be implemented as a file in a file system on persistent storage media such as a storage area network (SAN) or network-attached storage (NAS)). The write is not returned as successful until both the write operation to the cache and the write operation to the backing storage succeed. Because the data has been committed to the backing storage, a virtual machine utilizing a write-through cache may be migrated or failed over to another host system without the cache and without losing any data that was stored in the cache. In WB caching, however, a write is returned as successful when the write to the cache succeeds (i.e., without waiting for a write to backing storage). As a result, writes to a WB cache may be performed more quickly than writes to a WT cache. A subsequent flush operation writes the WB cache data to the backing storage. Data written to the WB cache but not yet flushed to the backing storage is referred to herein as “dirty data.” WB caches typically batch many writes into a single flush operation, so a large amount of dirty data may be present in the cache. A large amount of dirty data, having not been committed to the virtual hard drive or other backing storage, slows down the migration/recovery of a virtual machine.
For example, when the cache is not accessible for use by the destination host system (i.e., the host upon which the virtual machine has been restarted or resumed), the cache (or at least the dirty data) must be copied to the destination host while the virtual machine is stopped or suspended. The larger the amount of cached data that needs to be transferred, the longer the migration/recovery will take. Even when the cache is accessible to the destination host system, use of the original cache by the migrated/recovered virtual machine typically incurs the expense of slower access to the cache over the network.
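The difference between the two caching modes described above, and the reason dirty data matters for migration, can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the class names (BackingStore, WriteThroughCache, WriteBackCache) and the block-number addressing are hypothetical, the backing storage is modeled as an in-memory dictionary, and read caching and eviction policies (LRU, ARC, CART) are omitted.

```python
class BackingStore:
    """Hypothetical stand-in for a virtual hard drive image on shared storage."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class WriteThroughCache:
    """A write succeeds only after both the cache and backing storage are updated."""

    def __init__(self, backing):
        self.backing = backing
        self.cache = {}

    def write(self, block, data):
        self.cache[block] = data
        self.backing.write(block, data)  # committed before the write returns

    def dirty_blocks(self):
        # Everything in the cache is already on backing storage.
        return set()


class WriteBackCache:
    """A write succeeds once the cache is updated; backing storage is updated on flush."""

    def __init__(self, backing):
        self.backing = backing
        self.cache = {}
        self.dirty = set()

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)  # not yet committed to backing storage

    def flush(self):
        # Batch all pending writes into a single pass over the dirty set.
        for block in sorted(self.dirty):
            self.backing.write(block, self.cache[block])
        self.dirty.clear()

    def dirty_blocks(self):
        return set(self.dirty)


if __name__ == "__main__":
    wt = WriteThroughCache(BackingStore())
    wb = WriteBackCache(BackingStore())
    for block in range(4):
        wt.write(block, b"data")
        wb.write(block, b"data")
    print("WT blocks to copy on migration:", len(wt.dirty_blocks()))
    print("WB blocks to copy on migration:", len(wb.dirty_blocks()))
    wb.flush()
    print("WB blocks to copy after flush:", len(wb.dirty_blocks()))
```

In this sketch, the set returned by dirty_blocks() corresponds to the data that would have to be transferred while the virtual machine is stopped or suspended if the cache is not accessible to the destination host. It is always empty for the write-through cache and grows with every unflushed write for the write-back cache.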