In the field of computer hardware and software technology, a virtual machine is a software implementation of a machine (computer) that executes program instructions like a real machine. Virtual machine technology allows multiple virtual machines to share the physical resources that underlie them.
In virtual machine environments, storage volumes within the virtual machines contain data items that need to be accessed. For example, word processing documents may be stored on a virtual drive within the virtual machine, which itself is stored on a physical disk, often as a flat file or group of files. Unfortunately, accessing the underlying contents of a storage volume can be very resource intensive because it requires navigating both physical and virtual layers.
In addition to complications associated with access, moving or copying virtual machines can also be very resource intensive. For instance, backup or replication processes may require substantial amounts of bandwidth to effectively copy or transfer virtual machines or their sub-components. This is because the file or files that comprise the virtual machine tend to be very large since they contain not only user data, such as word processing documents, but also operating components, such as operating system files, application files, and the like.
These challenges not only limit the widespread adoption of virtual machine technologies, but also impede the introduction and integration of other features with backup and replication processes, such as virus scanning, content protection, and other tools that would be useful within virtual machine environments.
Overview
Software, systems, and methods described herein provide for improved backup and replication within virtual machine environments. In particular, embodiments disclosed below allow for the integration of ancillary processes, such as virus scanning and malware detection, with transfer processes (backup and replication). The resulting improvements reduce the time and bandwidth required for backup and replication, while also enhancing their utility.
In an embodiment, a non-transitory computer-readable medium has stored thereon program instructions for updating a replica of a target storage volume associated with a plurality of data blocks on an underlying storage volume. The program instructions, when executed by a data control system, direct the data control system to identify a first group of data blocks of the plurality of data blocks on the underlying storage volume that have changed, identify a second group of data blocks of the first group of data blocks that are live, identify changed data items associated with the second group of data blocks, initiate an ancillary process on the changed data items, and initiate an update of the replica of the target storage volume with the second group of data blocks.
In another embodiment, a data control system comprises an interface and a processing system. The interface is configured to receive an instruction to update a replica of a target storage volume associated with a plurality of data blocks on an underlying storage volume. The processing system is configured to identify a first group of data blocks of the plurality of data blocks on the underlying storage volume that have changed, identify a second group of data blocks of the first group of data blocks that are live, identify changed data items associated with the second group of data blocks, initiate an ancillary process on the changed data items, and initiate an update of the replica of the target storage volume with the second group of data blocks.
In another embodiment, a method of updating a replica of a target storage volume associated with a plurality of data blocks on an underlying storage volume comprises identifying a first group of data blocks of the plurality of data blocks on the underlying storage volume that have changed, identifying a second group of data blocks of the first group of data blocks that are live, identifying changed data items associated with the second group of data blocks, initiating an ancillary process on the changed data items, and initiating an update of the replica of the target storage volume with the second group of data blocks.
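The steps recited in the embodiments above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: volumes are modeled as dictionaries mapping block numbers to bytes, "live" is assumed to mean that a block is currently in use by the file system, and the names `update_replica`, `block_to_item`, and `ancillary` are hypothetical.

```python
from typing import Callable, Dict, Set


def update_replica(
    source: Dict[int, bytes],
    replica: Dict[int, bytes],
    changed_blocks: Set[int],
    live_blocks: Set[int],
    block_to_item: Dict[int, str],
    ancillary: Callable[[Set[str]], None],
) -> Set[int]:
    """Hypothetical sketch of the five recited steps; returns the blocks sent."""
    # First group: blocks on the underlying volume that have changed.
    first_group = set(changed_blocks)
    # Second group: the changed blocks that are also live
    # (assumption: "live" means in use by the file system).
    second_group = first_group & live_blocks
    # Changed data items: items that own at least one block in the second group.
    changed_items = {block_to_item[b] for b in second_group if b in block_to_item}
    # Initiate the ancillary process (for example, a virus scan) on those items.
    ancillary(changed_items)
    # Initiate the replica update using only the second group of blocks.
    for block in sorted(second_group):
        replica[block] = source[block]
    return second_group
```

In this sketch, a block that has changed but is no longer live is neither scanned nor transferred, which is where the time and bandwidth savings described above would come from.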
In another embodiment, the ancillary process comprises a virus scan, and the update of the replica is stopped if the results indicate that at least one of the changed data items is not clean.
In another embodiment, the ancillary process comprises a content check, and the update of the replica is stopped if the results indicate that at least one of the data items includes content that should not be included in the update according to a content policy.
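One way to picture an update that is stopped by the results of a virus scan or content check is the Python sketch below. It assumes the check completes before any block is transferred, which is only one possible ordering; `gated_update` and `item_passes` are hypothetical names, and the check itself is a stand-in predicate rather than a real scanner.

```python
from typing import Callable, Dict


def gated_update(
    changed_items: Dict[str, bytes],
    blocks: Dict[int, bytes],
    replica: Dict[int, bytes],
    item_passes: Callable[[str, bytes], bool],
) -> bool:
    """Apply `blocks` to `replica` only if every changed item passes the check."""
    # Run the check on every changed item before any block is transferred
    # (assumption: checks finish before the transfer begins).
    for name, content in changed_items.items():
        if not item_passes(name, content):
            return False  # update stopped; replica left untouched
    replica.update(blocks)
    return True
```

The same function covers both embodiments: `item_passes` could wrap a virus scanner or a content policy, since in either case a single failing item stops the update.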
In another embodiment, the second group of data blocks comprises a snapshot of the target storage volume and the replica of the target storage volume is returned to a previous state in response to results of the ancillary process.
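Returning the replica to a previous state could look roughly like the following Python sketch. It assumes the prior state is captured as an in-memory copy before the snapshot blocks are applied; a real system would more likely rely on the storage layer's own snapshot facility. The names `apply_with_rollback` and `results_ok` are hypothetical.

```python
from typing import Callable, Dict


def apply_with_rollback(
    replica: Dict[int, bytes],
    snapshot_blocks: Dict[int, bytes],
    results_ok: Callable[[], bool],
) -> bool:
    """Apply snapshot blocks, then undo them if the ancillary results are bad."""
    # Remember the replica's state before the update (assumption: a plain copy).
    previous_state = dict(replica)
    replica.update(snapshot_blocks)
    if results_ok():
        return True
    # Results were unfavorable: return the replica to its previous state.
    replica.clear()
    replica.update(previous_state)
    return False
```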
In another embodiment, the update of the replica comprises a transfer of the second group of data blocks to a replica virtual machine environment.
In another embodiment, the update of the replica of the target storage volume includes a transfer of the second group of data blocks to a second underlying storage volume and omission of data blocks not in the first group of data blocks or the second group of data blocks from the transfer.
In another embodiment, the underlying storage volume comprises a virtual disk file containing a virtual machine, and the target storage volume comprises a virtual drive within the virtual machine.
In another embodiment, at least one of the changed data items is prevented from being included in the update of the replica based on results of the ancillary process.
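Excluding individual data items, as opposed to stopping the whole update, might be sketched in Python as follows. The sketch assumes each block belongs to at most one data item and that a mapping from items to blocks is available; `blocks_to_transfer`, `item_blocks`, and `rejected_items` are hypothetical names.

```python
from typing import Dict, Set


def blocks_to_transfer(
    second_group: Set[int],
    item_blocks: Dict[str, Set[int]],
    rejected_items: Set[str],
) -> Set[int]:
    """Drop the blocks of items rejected by the ancillary process."""
    excluded: Set[int] = set()
    for item in rejected_items:
        # Unknown items contribute no blocks to the exclusion set.
        excluded |= item_blocks.get(item, set())
    return second_group - excluded
```

For example, with `second_group = {1, 2, 3}`, `item_blocks = {"a": {1}, "b": {2, 3}}`, and `rejected_items = {"b"}`, only block 1 would remain in the transfer.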