In virtual computing systems, one computer can remotely control another computer. Virtual computing often involves implementing virtual or paravirtual machines. A virtual machine (VM) is a software implementation of a machine (e.g., a computer) that executes programs like a physical machine. A paravirtual machine is similar to a VM except hardware drivers of a paravirtualized machine are optimized for hypervisor performance. For purposes of this invention, paravirtual and virtual machines are interchangeable.
United States Patent Application Publication Number 20100082922 entitled VIRTUAL MACHINE MIGRATION USING LOCAL STORAGE describes replication of operating/executing virtual machines so that execution by a machine at a given snapshot of time on one physical server may continue from the same step on a replica machine on a different physical server. It implicitly presumes the same network context/subnet for the machine copy locations. The intent of this replication is to migrate virtual machines among servers in a single datacenter. The publication does not recognize that the approach is unworkable when time delays, bandwidth limits, and unreliable network links exist between machine copies, or that the virtual machine copying and migration process has inherent reliability, security, and reachability problems that must be overcome to make a workable solution when copying occurs between two separate physical or network locations.
In virtual machine systems it is often desirable to provide for live file transfer between machines. United States Patent Application Publication Number 20080033902 describes a system and method for transferring operating systems, applications and data between a source machine and a target machine while the source machine is running. Doing so introduces the problem of transferring files that may be in use or “live” and are therefore locked by another process during the transfer. Publication Number 20080033902 addresses the problem of transferring locked files and ensuring the most current version is transferred to the target machine. However, the method and system described therein is “file system aware”, which means that its operation depends on knowledge of how data (usually stored in blocks) is organized to be accessed, read, and written as files by the source machine in order to locate the bytes that will be transferred to a target machine. Since different machines can have different file systems, a different version of the method and system must be implemented for each different file system.
Data backup is an essential part of a maintenance plan for any computer system because it is the only remedy against data loss in case of disaster. The fundamental requirement of backup is preserving data integrity. When online data are backed up, both production and backup software access data in parallel, so parts of the source data can be modified before they have been backed up. Backup software should therefore ensure the coherency of backed-up online data in order to provide backup integrity. Virtual machine systems often involve some form of backup when migrating an application from one system to another. For example, United States Patent Application Publication Number 20090216816 describes use of a signal to an application prior to backup to prepare it for backup. A signal is sent to the system that is to be backed up in preparation for a backup. With backup there is the implicit assumption that a coherent backup is useful, as in this publication. Usefulness of a machine backup assumes that recovery involves restoration of the machine on the same hardware or virtualized hardware, whereas in reality machine backups often require changes in order that the images can be used directly, or only a subset of useful data is extracted from the backup during recovery.
Several methods have been invented to provide data coherency in online backup, of which the most advantageous are those based on the concept of “snapshots”. A snapshot is generally a virtually independent logical copy of data storage (e.g. a volume, a disk, a database or the like) at a particular instant in time. There are file system level and block level implementations of snapshots, yet hereafter only block-level snapshots are discussed because of their universality and therefore greater convenience for general-purpose backup solutions of arbitrary machine data. Once a snapshot has been created, it can be exposed as a virtual read-only storage whose data can be accessed via standard I/O functions. As soon as a snapshot has been created, production software continues working with production storage while snapshotted data are commonly used for various maintenance tasks such as backup, replication, verification et cetera. Multiple principles of snapshot operation have been devised. Their common characteristics are (a) use of extra space to preserve snapshotted data; (b) computing overheads imposed by snapshot management means during snapshot creation, operation or deletion, depending on the particular technique; and (c) that these snapshots occur at the storage control layer, transparent to the machine actively running on top of this data.
Differential snapshots are based on the idea of holding only the difference between the current data and point-in-time data corresponding to the moment of snapshot creation. The best-known representatives of differential snapshot methods are the “copy-on-write” (abbreviated to COW) and “redirect-on-write” (abbreviated to ROW) techniques. The COW technique makes a copy of original data only the first time the data are updated. No data are stored at the moment of snapshot creation, yet a snapshot manager starts monitoring I/O writes on production storage. Once controlled data are to be updated, the snapshot manager suspends the update operation, stores the original data in an auxiliary storage and then resumes the data update. If snapshot contents are requested, the snapshot manager takes unchanged pieces of data from production storage, while for changed pieces their original contents are retrieved from the auxiliary storage. On deletion of a COW snapshot the auxiliary storage is abandoned and nothing is done to production storage. COW snapshots require no recycling period, and multiple COW snapshots can coexist at the same time without affecting each other, though snapshots can also be controlled by the machine using the snapshotted volume.
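The COW behavior described above can be illustrated with a minimal sketch. It assumes an in-memory list of blocks standing in for production storage and a dictionary standing in for auxiliary storage; the class and method names (CowSnapshot, write, read_snapshot, delete) are hypothetical and are not taken from any of the cited publications.

```python
class CowSnapshot:
    """Illustrative block-level copy-on-write snapshot (hypothetical names)."""

    def __init__(self, production):
        self.production = production  # live storage: a list of block contents
        self.saved = {}               # auxiliary storage: block index -> original data

    def write(self, index, data):
        # Intercept the update: preserve the original only on the first write.
        if index not in self.saved:
            self.saved[index] = self.production[index]
        # Then let the update proceed on production storage.
        self.production[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from auxiliary storage, unchanged ones from production.
        if index in self.saved:
            return self.saved[index]
        return self.production[index]

    def delete(self):
        # Deleting the snapshot abandons auxiliary storage; production is untouched.
        self.saved.clear()
```

In this sketch no data are copied when the snapshot object is created, and a block that is written several times is preserved only once, which matches the first-update rule described above.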
U.S. Pat. No. 7,680,996 describes use of a differential snapshot and a watch-list structure in conjunction with identifying and retaining updated blocks in order to shrink the data set size when performing an incremental backup, so that the backup is practical for bandwidth-limited replication. However, this snapshot and backup mechanism assumes all components run on the operating system of the machine that is being backed up. The technique described in U.S. Pat. No. 7,680,996 requires retention of data in order to create a backup.
U.S. Pat. No. 6,131,148 describes performing a snapshot of storage media without requiring additional storage space to be directly allocated on that media. This method continuously sends changes to storage media to a secondary storage media in the form of an instantly-available snapshot. Unfortunately, this requires continuous reliable communications for continuous sending of snapshots and a secondary storage medium for storing the snapshots.
U.S. Pat. No. 7,072,915 describes a data copy method that supplements outboard data copy with a previously instituted COW logical snapshot to create a duplicate that is consistent with source data at a designated time. This patent describes a system that utilizes logic at a layer below a logical volume manager, such as within a SAN, versus a mechanism that is directly available to software that runs as part of the hypervisor or as a guest of the hypervisor. This system enables making secondary copies of storage that contain potential inconsistencies and restoring that storage to a consistent state as of the time of a snapshot. However, it is not usable for updating the content of a remote secondary store to a future state.
United States Patent Application Publication Number 20090150885 describes examples of the use of snapshots. It is noted that snapshots are not actual copies of data. Instead, a snapshot is a means of accessing data to be consistent with the time that the snapshot was taken. Snapshots have been used with a copy on write (COW) file command, e.g., as described in United States Patent Application Publication Number 20090037680. Some backup systems, such as that described in United States Patent Application Publication Number 20080140963, look at the difference between two snapshots.
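As a simplified illustration of a backup that looks at the difference between two snapshots, the following sketch compares two snapshots block by block. It assumes each snapshot is exposed as a list of equal-sized blocks; the function names snapshot_delta and apply_delta are hypothetical and do not reproduce the method of any cited publication.

```python
def snapshot_delta(old, new):
    """Return {block_index: new_data} for blocks that differ between two snapshots."""
    if len(old) != len(new):
        raise ValueError("snapshots must cover the same number of blocks")
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_delta(base, delta):
    """Rebuild the newer snapshot from the older one plus the delta."""
    restored = list(base)
    for index, data in delta.items():
        restored[index] = data
    return restored
```

Only the blocks returned by snapshot_delta would need to be stored or transmitted for the incremental backup in this sketch.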
United States Patent Application Publication Number 2010005806 entitled “VIRTUAL MACHINE FILE SYSTEM AND INCREMENTAL SNAPSHOT USING IMAGE DELTAS” describes a file system aware method that partitions the file storage of a virtual machine into writeable and non-writable portions. However, this system also suffers from the drawback of being “file system aware.” File system awareness means that the implementation is specific to the OS and file system in use, and that the result is based on knowledge of the content and higher-level use of stored information. Thus different implementations are required for each file system, and as the rules of the file system change, the applicability of the backup method may also change. For example, a database may use its own file system to organize and access storage media; at minimum a special implementation will be required, and the system must be aware of when this implementation method is required versus any other method.
United States Patent Application Publication Numbers 20080162841 and 20080162842 describe COW with the added detail of looking at the file system (and blocks as they pertain to the file system), which is one layer up. The goal is to take the knowledge the OS has about freed space and use that to optimize storage of blocks, that is, to remove blocks that the OS has freed from the local storage. Even if the data is “wrong”, it doesn't matter because it's not part of the filesystem.
Some backup systems used with virtual machines make use of a file system and an incremental snapshot that uses image deltas, e.g., as described in United States Patent Application 20100058106. However, the system and method described therein does not perform incremental backups. Nor does it separate transient data which changes from persistent data or replicate a current virtual machine context so that it will run arbitrary applications without issue. Instead the system described in publication 20100058106 separates immutable data from changeable data but does not consider the context required to run a reconstituted virtual machine from backup data.
Some backup software is not able to determine the time at which backups take place. United States Patent Application Publication Number 20090300080 describes a method to track changes to a storage volume. The method described in Publication 20090300080 can find a specific change and correlate it to a snapshot in time. It uses block metadata annotations to track changes. It also states that two change lists consecutively created in the tracking process can be obtained and subsequently used to create an incremental backup, as opposed to using a single change list to create a consistent incremental backup.
Some backup systems make use of block-based incremental backup. For example, United States Patent Application Publication Number 20070112895 describes a method for reading and writing data to a block store that resembles a “redirect-on-write” snapshot mechanism except it adds functionality and writes all data to the physical volume. It uses a table of block “birth” or modification times along with other metadata to determine how to read data, what to do to roll back changes, and which blocks to back up in each increment. The method states that modified blocks are not written over, but metadata is used to write the replacement data in new blocks in order to retain the original file system's content data prior to writing. This method must perform significant additional calculations to determine what needs to be backed up versus what does not. This mechanism adds significant overhead to the process of writing data. If the backup process defined herein were applied to such a disk, the backup would preserve all the additional metadata and prior content information in the backups; however, the result would be inefficient for replicating that data over a limited bandwidth network.
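The general idea of a redirect-on-write block store with a table of block “birth” times can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of Publication 20070112895: it assumes an append-only in-memory list as the physical volume and a logical counter as the clock, and the names RedirectOnWriteStore, write, read and changed_since are hypothetical.

```python
class RedirectOnWriteStore:
    """Illustrative redirect-on-write block store with birth times (hypothetical names)."""

    def __init__(self):
        self.physical = []  # append-only physical blocks; old contents are never overwritten
        self.mapping = {}   # logical block -> index of its current physical block
        self.birth = {}     # logical block -> logical time of its last modification
        self.clock = 0      # logical clock, advanced on every write

    def write(self, logical, data):
        self.clock += 1
        self.physical.append(data)  # redirect the write to a fresh physical block
        self.mapping[logical] = len(self.physical) - 1
        self.birth[logical] = self.clock

    def read(self, logical):
        return self.physical[self.mapping[logical]]

    def changed_since(self, time):
        # Logical blocks modified after the given time belong in the next increment.
        return sorted(b for b, born in self.birth.items() if born > time)
```

The sketch also shows the overhead noted above: every write updates two metadata tables, and prior block contents accumulate on the physical volume.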
Moving execution of a virtual machine generally involves making the virtual machine (VM) and its clients aware that the VM is running in a different network context or location, e.g., as described in United States Patent Application Publication Number 20080201455. When a machine is recovered, its clients need to be able to find and communicate with their server. Similarly, the recovered machine must be able to initiate connections to access resources as it did before recovery. Many machines are designed to work with the assumption that the machine resides on the same private network as the other resources it communicates with. In some cases, the other resources are programmed to only communicate with a specific IP address. In other cases the resources require a complete restart to communicate with a server at a different IP address. In addition, some applications statically include IP addresses inside their configuration. Intervening firewalls or routers will mean that any broadcast or local multicast mechanisms built into the application will no longer work, often causing clients to be unable to locate servers, invalidate caches, etc.
One limitation to virtual machines is that they often require that an application be “migration enabled” in order to migrate. United States Patent Application Publication Number 20090007147 describes an example of such a system. However, migration enablement requires a particular configuration of an application, which is often impractical for migration of existing applications that are not migration enabled.
Another disadvantage of existing virtual machine migration methods is that they often require a special mechanism for establishing client connections to a virtual machine. United States Patent Application Publication Number 20070174429 describes an example of such a system.
An issue related to virtual machines is storage management for a data center. International Patent Application Number WO2010030996 (A1) describes storage management for a data center but does not address machine migration or efficient storage movement across networks. Chinese Patent Application Publication Number CN101609419 focuses on copying machine memory in general, including but not limited to the data on storage media. However, this publication does not address issues of storage management or storage efficiency within or among data centers.
A topic related to virtual machines is the concept of a virtual private network (VPN). In a VPN two computers are linked through an underlying local or wide-area network, while encapsulating the data and keeping it private. United States Patent Application Publication Number 20040255164 describes a virtual private network between a computing network and a remote device that uses an intervening component in a carrier network that serves two clients. This system requires use of an intervening component and requires more than one component to act as a tunnel client.
Another topic related to virtual machines is that of encryption. For example, United States Patent Application Publication Number 20080307020 describes a method that is specific to using separate keys for a primary storage media for data, and different keys for backup copies of that data.