A “physical machine to virtual machine converter for disaster recovery” module (“P2V-DR” module) such as a VMware vCenter Converter software module that is available from VMware, Inc. of Palo Alto, Calif. converts a physical machine to a virtual machine and creates and stores an image of the created virtual machine (“VM”). As such, a P2V-DR module extends disaster recovery strategies to physical machines in a data center by allowing images of physical machines to be archived at remote sites, i.e., the P2V-DR module enables a user to use a VM as a backup for a physical system. An advantage of having a backup VM is an effectively zero recovery time since, in case of a disaster, the user only needs to power up the backup VM to recover. Also, with the backup being a VM, one can maintain multiple backup images (for example, as VM snapshots), and even test the backup VM by powering it up in an isolated virtual network.
A P2V-DR module typically starts with a full “physical machine to virtual machine” (“P2V”) conversion. After that, the P2V-DR module creates incremental backups, for example, on a periodic basis, to transfer ongoing changes from the source system to a target VM. The P2V-DR module typically uses block-level write tracking to determine which blocks were changed between backup cycles.
In another example, a P2V conversion is usually performed by taking a snapshot of the source system, and then cloning the snapshot to the target VM. Effectively, this means that the target VM represents the state of the source system at the moment the P2V conversion process started. If there were any changes in the source system during the cloning process, these changes would be missing from the target VM. Consequently, a user must take the source system out of production before migrating it to a target VM to avoid losing any data. For a typical server, a full P2V conversion could take many hours. Thus, the user has to schedule at least a day of server downtime. But, using a P2V Motion software module, for example a P2V Motion software module contained in the VMware vCenter Converter software module that is available from VMware, Inc. of Palo Alto, Calif., this downtime (i.e., the time between ending use of the source system to generate a target VM and completing the VM generation process) can be shortened to a few minutes.
The P2V Motion software module starts block-level write tracking on the source system before starting a full P2V conversion. After that, the P2V Motion software module proceeds with the full P2V conversion. Once the full P2V conversion is completed, the P2V Motion software module performs a “catch-up” phase. In the catch-up phase, the P2V Motion software module uses tracking information provided by a block-level write tracking driver, and transfers all changed blocks to the target VM. Thus, the duration of the downtime becomes equal to the duration of the catch-up phase.
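The sequence described above (start tracking, run the full conversion, then transfer only the changed blocks) can be sketched as a small simulation. This is an illustrative model only, not the P2V Motion implementation: the in-memory lists standing in for disks, the `dirty` set standing in for the tracking driver's record, and the function names are all assumptions made for the example.

```python
def full_copy(source, target):
    """Clone every block of the source into the target (the full P2V pass)."""
    for block_no, data in enumerate(source):
        target[block_no] = data


def catch_up(source, target, dirty_blocks):
    """Transfer only the blocks recorded as changed, then clear the record."""
    transferred = sorted(dirty_blocks)
    for block_no in transferred:
        target[block_no] = source[block_no]
    dirty_blocks.clear()
    return transferred


# Hypothetical 8-block source disk and an empty target VM disk.
source = [b"blk%d" % i for i in range(8)]
target = [None] * len(source)
dirty = set()  # stands in for the write tracking driver's record

# Tracking is assumed to be running already; the full conversion copies all blocks.
full_copy(source, target)

# Writes that reach the source after the full copy has passed those blocks.
for block_no, data in [(2, b"new2"), (5, b"new5"), (2, b"newer2")]:
    source[block_no] = data
    dirty.add(block_no)

# Catch-up phase: only the changed blocks are transferred during the downtime.
moved = catch_up(source, target, dirty)
```

In this model the work done during downtime is proportional to the number of distinct changed blocks (two here), not to the size of the disk.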
Windows
Block-level write tracking is usually implemented in a Microsoft Windows™ operating system (referred to herein as “Windows”) as follows. Windows divides all devices into classes, and each device class has a globally unique identifier (“GUID”). For each device class GUID, there is a place in the Windows registry where Windows maintains a list of upper (and lower) filter drivers for this device class. If a driver wants to filter requests to a particular device class, it needs to add itself to the list of upper or lower filter drivers for this device class.
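As a rough illustration of the registration step, the sketch below models the per-class filter list as a plain Python list. The registry path and the storage volume class GUID come from general Windows documentation rather than from this text, and the driver name `wtrack`, the pre-existing `volsnap` entry, and both helper functions are assumptions for the example. The sketch deliberately does not touch the registry.

```python
# Device setup class GUID for storage volumes on Windows (from general
# Windows documentation, not from the text above).
VOLUME_CLASS_GUID = "{71A27CDD-812A-11D0-BEC7-08002BE2092F}"


def class_key_path(class_guid):
    """Registry path (under HKEY_LOCAL_MACHINE) holding a class's filter lists."""
    return "SYSTEM\\CurrentControlSet\\Control\\Class\\" + class_guid


def add_filter(filters, driver_name):
    """Return a new filter list that contains driver_name exactly once."""
    if driver_name.lower() in (name.lower() for name in filters):
        return list(filters)
    return list(filters) + [driver_name]


# Hypothetical current contents of the class's "UpperFilters" multi-string value.
upper_filters = ["volsnap"]
# "wtrack" is a made-up name for a write tracking driver.
upper_filters = add_filter(upper_filters, "wtrack")
```

Making the helper idempotent means a repeated installation does not list the driver twice.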
When Windows discovers a new device, it first asks the device driver to create a device object (called a functional device object) for the new device. After that, Windows enumerates all filter drivers registered for the device class, and asks each filter driver to create a filter device object for the new device. As is known, a filter driver is a driver/program/module that inserts a filter device object into a device stack to perform some specific function. Any number of filter drivers can be added to Windows: upper level filter drivers sit above the primary driver for the device (the functional driver), while lower level filter drivers sit below the functional driver and above the bus driver. Eventually, all device objects form a device stack (refer to FIG. 1). Note that the order in which filter device objects are positioned in the device stack is neither specified nor guaranteed by Windows.
As is known, whenever an operation is performed on a device, Windows passes an I/O request packet (“IRP”) data structure to a driver corresponding to the top device object in the device stack. Each driver either handles the IRP or passes it to the driver that is associated with the next-lower device object in the device stack. Once the IRP reaches the functional device object, the corresponding device driver carries out the requested command, and marks the IRP as completed. Once the IRP is marked as completed, Windows walks the device stack from the bottom to the top and notifies each filter driver that the IRP has been completed.
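The downward dispatch and upward completion notification described above can be modeled as follows. This is a simplified sketch, not Windows kernel code: it assumes that every filter passes the IRP down unchanged, that the stack contains only upper filters above the functional driver, and that an IRP can be represented as a dictionary. All names are hypothetical.

```python
def send_irp(device_stack, irp):
    """Model one IRP travelling through a device stack.

    device_stack lists driver names from the top device object down to the
    functional device object (the last entry).
    """
    trace = []
    filters = device_stack[:-1]
    functional = device_stack[-1]

    # Downward pass: each filter hands the IRP to the next-lower driver.
    for name in filters:
        trace.append(("pass down", name))

    # The functional driver carries out the command and completes the IRP.
    irp["status"] = "completed"
    trace.append(("complete", functional))

    # Upward pass: filters are notified of completion, bottom to top.
    for name in reversed(filters):
        trace.append(("notified", name))
    return trace


stack = ["upper filter A", "upper filter B", "functional driver"]
irp = {"operation": "write", "status": "pending"}
trace = send_irp(stack, irp)
```

A write tracking filter would do its bookkeeping in one of the two passes, either when it sees the write request on the way down or when it is notified of completion on the way up.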
Usually, a write tracking driver (i.e., a driver that tracks block-level writes to volumes in the system) needs to register itself as an upper filter driver (referred to as a write tracking filter driver) to a device class called “Generic Storage Volume.” Then, when a new volume appears in the system, Windows calls the write tracking driver to create a corresponding write tracking filter device object. Then, the write tracking driver can monitor all requests coming through this write tracking filter device object and perform special processing for write requests, for example, by maintaining a bitmap of blocks that have been changed since a particular moment in time (such a write tracking driver may be referred to herein as a bitmap driver).
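A bitmap of changed blocks, as mentioned above, can be kept with one bit per block. The following is a minimal user-space sketch of that data structure, not driver code; the class name, the 4096-byte block size, and the method names are assumptions chosen for illustration.

```python
class ChangeBitmap:
    """One bit per block; a set bit means the block was written since reset()."""

    def __init__(self, volume_bytes, block_size=4096):
        self.block_size = block_size
        self.block_count = (volume_bytes + block_size - 1) // block_size
        self.bits = bytearray((self.block_count + 7) // 8)

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for block in range(first, last + 1):
            self.bits[block // 8] |= 1 << (block % 8)

    def changed_blocks(self):
        """Return the numbers of all blocks whose bit is set, in order."""
        return [
            block
            for block in range(self.block_count)
            if self.bits[block // 8] & (1 << (block % 8))
        ]

    def reset(self):
        """Start a new tracking period, for example after a backup cycle."""
        self.bits = bytearray(len(self.bits))


bitmap = ChangeBitmap(volume_bytes=64 * 4096)
bitmap.record_write(offset=3 * 4096 + 100, length=8192)  # spans blocks 3, 4 and 5
bitmap.record_write(offset=0, length=1)                  # touches block 0
changed = bitmap.changed_blocks()
```

Because an unaligned write marks every block it overlaps, the bitmap can over-report slightly but does not miss a modified block that passed through the tracker.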
Using the above-described “upper filter approach” in Windows requires the source system to be rebooted for the write tracking driver to start tracking block-level writes. Because a reboot cycle for a production server carrying a heavy load could take a long time (resulting in a long downtime period), finding a time slot for such a downtime period could be problematic, particularly for a production server responsible for mission critical applications. In particular, in the case where a P2V-DR module is used, no backup protection could be configured until the source system was rebooted. Since downtime is usually scheduled during the night, but setting up to run the P2V-DR module on the source system is usually done during the day, a long downtime due to a need to reboot could delay the availability of a fully functional backup VM for several days. This could be costly if a disaster happens. In the case of use of a P2V Motion software module, any server downtime is usually unacceptable because the main reason for the user to use the P2V Motion software module in the first place is to avoid server downtime.
Tracking writes in Windows may be problematic for another reason. In particular, Windows provides a standard mechanism for taking a snapshot of a running system that is provided by a Volume Shadow Copy Service (“VSS”). On one hand, VSS provides a generic snapshotting API, and on the other hand, it provides a callback API (called writer API) for any database (or any other application maintaining open files) to participate in the snapshot creation process and to create a consistent image of the system.
In Windows XP and Windows Server 2003, VSS requires all writers (i.e., applications that want to be aware of the snapshot, typically database applications) to flush data directly to the volumes for which a snapshot will be created (“working volumes”) just before VSS creates the snapshot. This enables the writers to make sure the “working volume” has the data the writers want to be on the “working volume” before the snapshot is created. Then, once the snapshot is created, VSS allows the writers to continue accessing the “working volume.” Since it may take up to several minutes to create a snapshot, the availability of a server could be negatively affected while the snapshot is being created. However, starting with Windows Vista (i.e., Windows Vista, Windows Server 2008 and Windows 7), VSS allows writers to write directly to the snapshot without flushing to the “working volume” itself. This enables VSS to create a snapshot much faster and without affecting the server's availability.
However, a problem with using VSS is that “directly-to-the-snapshot” writes occur only while the snapshot is being created and before it is made available to an application requesting the snapshot (typically an application performing a backup). If the backup application uses a write tracking filter driver attached to the “working volume” to track block-level writes, the write tracking filter driver will miss snapshot writes. Consequently, if the backup application were to copy from the snapshot only those blocks that were reported by the write tracking filter driver, the backup image would be incomplete and inconsistent. Unfortunately, VSS creates snapshots in an unusual way in that snapshot volumes are not considered regular storage volumes and thus cannot have a filter driver attached to them.
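The inconsistency described above can be reproduced in a toy model. The sketch below is an assumption-laden simulation and does not call VSS: volumes are dictionaries mapping block numbers to contents, the `tracked` set plays the role of the write tracking filter driver on the working volume, and the block contents are invented.

```python
working = {0: "a0", 1: "b0", 2: "c0"}
previous_backup = dict(working)  # image produced by the last backup cycle
tracked = set()                  # what the filter on the working volume records


def write_working(block, data):
    """A normal write: it passes through the filter, so it is tracked."""
    working[block] = data
    tracked.add(block)


write_working(1, "b1")

# Snapshot creation: the snapshot starts as a copy of the working volume,
# then a writer puts data directly into the snapshot. The filter is attached
# to the working volume only, so it never sees this write.
snapshot = dict(working)
snapshot[2] = "c1"

# Incremental backup: copy from the snapshot only the blocks the filter reported.
backup = dict(previous_backup)
for block in tracked:
    backup[block] = snapshot[block]

missing = sorted(block for block in snapshot if backup[block] != snapshot[block])
```

Block 2 ends up in `missing`: it differs between the snapshot and the backup image because the only record of its change lived outside the tracked volume.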
Linux
Block-level write tracking is usually implemented in the Linux operating system (referred to herein as “Linux”) by adding a block-level filter driver through a “Device Mapper.” The Device Mapper is a kernel component that can transform a block I/O (“BIO”) request based on different policies. By transforming a BIO request, a Device Mapper device can remap the request to a different address or to a different block device, or simply perform some bookkeeping task and then pass the request to the underlying device. A Device Mapper device is itself a block device, and Device Mapper devices can be stacked (refer to FIG. 2).
To track writes to a block device in Linux, one can instantiate a write tracking Device Mapper device stacked on top of the block device. Then, whenever the write tracking Device Mapper device receives a BIO write request, it tracks the write, for example, with a bitmap of changed blocks and, after that, it passes the BIO write request down to the underlying block device. In addition, to track writes to a block device, Linux needs to be told to access the write tracking device instead of the block device itself. Thus, if a file system is already mounted on the block device, it would have to be remounted to the write tracking device.
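The stacking behavior can be illustrated with a user-space model in which each layer exposes the same `submit` interface as the device beneath it. This is only a sketch of the idea, not Device Mapper code: the classes `Disk`, `LinearRemap` and `WriteTracker`, the `submit` signature, and the sector counts are all hypothetical.

```python
class Disk:
    """Bottom of the stack: stores sector payloads in memory."""

    def __init__(self, sectors):
        self.data = [None] * sectors

    def submit(self, op, sector, payload=None):
        if op == "write":
            self.data[sector] = payload
            return None
        return self.data[sector]


class LinearRemap:
    """Remaps each request to a different address on the lower device."""

    def __init__(self, lower, start):
        self.lower = lower
        self.start = start

    def submit(self, op, sector, payload=None):
        return self.lower.submit(op, sector + self.start, payload)


class WriteTracker:
    """Bookkeeping layer: records written sectors, then passes the request down."""

    def __init__(self, lower):
        self.lower = lower
        self.changed = set()

    def submit(self, op, sector, payload=None):
        if op == "write":
            self.changed.add(sector)
        return self.lower.submit(op, sector, payload)


disk = Disk(sectors=16)
partition = LinearRemap(disk, start=8)
tracked = WriteTracker(partition)

tracked.submit("write", 2, "x")    # goes through the tracker: recorded
partition.submit("write", 3, "y")  # bypasses the tracker: not recorded
```

The second write shows why the file system has to be mounted on the tracking device rather than on the underlying block device: requests that do not enter at the top of the stack are invisible to the tracker.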
At boot time, Linux selects the block devices on which its file systems mount by looking at a File System Table (fstab). By changing the block device on which a specific file system mounts to a write tracking device in fstab, write tracking will occur after the next reboot.
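The fstab change described above amounts to rewriting the device field of one entry. A minimal sketch follows, assuming the usual whitespace-separated fstab layout in which the first field is the device and the second is the mount point. The function name, the sample entries, and the device path `/dev/mapper/track_data` are hypothetical, and the sketch works on a string rather than editing `/etc/fstab`.

```python
def retarget_fstab(fstab_text, mount_point, tracking_device):
    """Return fstab text in which mount_point is mounted from tracking_device."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not fields[0].startswith("#")
        if is_entry and fields[1] == mount_point:
            fields[0] = tracking_device
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


fstab = (
    "# <device> <mount> <type> <options> <dump> <pass>\n"
    "/dev/sda2 / ext3 defaults 1 1\n"
    "/dev/sda3 /data ext3 defaults 1 2\n"
)
new_fstab = retarget_fstab(fstab, "/data", "/dev/mapper/track_data")
```

Only the matching entry is rewritten; comment lines and other mount points are left as they were. The change takes effect only when the file system is next mounted, which in the scenario above means after the next reboot.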
Thus, modifying the fstab in Linux requires the source system to be rebooted for a write tracking driver to start tracking block-level writes. This is problematic for the same reasons discussed above with respect to Windows.
Tracking writes in Linux may be problematic for another reason, which concerns the boot volume. When a Linux operating system is started, a boot loader (usually GRUB) is responsible for loading an image of the Linux kernel together with the core drivers from the boot volume into memory. Consequently, the boot loader must have a built-in driver to access the block device that holds the boot volume. If this block device were provided by a write tracking Device Mapper device, the boot loader would not know how to access the boot device because GRUB would not know about the write tracking Device Mapper device. GRUB has built-in support for IDE and SCSI block devices. Thus, short of writing one's own boot loader, there is no way to add another driver to GRUB to access a custom filter block device. As a result of this limitation, it is impossible for a backup application to track block-level writes to the boot volume. The situation is aggravated by the fact that in some Linux distributions the boot volume is combined with the root volume, which contains the entire Linux installation and most of the applications.
To solve this problem, some backup applications require the boot volume to be separated from the root volume. However, this requires careful advance planning and may prevent a user from using the backup application on an existing server without reinstalling the entire system. Other backup applications choose to back up the boot volume non-incrementally during each backup cycle. However, if the boot volume is combined with the root volume, this could result in large amounts of data being backed up unnecessarily during each backup cycle.