Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with useable data.
Replication technology is primarily used for disaster recovery and data distribution. Continuous replication maintains copies of the data as applications write to it, and often includes RAID-based replication schemes (e.g., disk mirroring, parity, or the like). Periodic replication is another technique used to minimize data loss and improve the availability of data, in which a point-in-time copy of the data is replicated and stored at one or more remote sites or nodes. In the event of a site migration, failure of one or more physical disks storing data, or failure of a node or host data processing system associated with such a disk, the remote replicated data copy may be utilized. For both continuous and periodic replication, the replicated data also enables a number of uses beyond disaster recovery, such as data mining, reporting, testing, and the like. In this manner, the replicated data copy ensures data integrity and availability. Additionally, replication technology is frequently coupled with other high-availability techniques, such as clustering, to provide an extremely robust data storage solution.
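The difference between continuous and periodic replication can be illustrated with a minimal sketch. The class and method names below (`ReplicatedStore`, `write`, `replicate_snapshot`) are hypothetical and chosen only for illustration. Storage is modeled as in-memory dictionaries rather than disks or remote sites.

```python
import copy


class ReplicatedStore:
    """Hypothetical store with one continuous and one periodic replica."""

    def __init__(self):
        self.primary = {}
        self.continuous_replica = {}  # updated on every write (mirroring)
        self.periodic_replica = {}    # updated only when a snapshot is taken

    def write(self, key, value):
        # Continuous replication: the copy is maintained as data is written.
        self.primary[key] = value
        self.continuous_replica[key] = value

    def replicate_snapshot(self):
        # Periodic replication: ship a point-in-time copy of the primary.
        self.periodic_replica = copy.deepcopy(self.primary)


store = ReplicatedStore()
store.write("a", 1)
store.replicate_snapshot()
store.write("a", 2)  # written after the snapshot was taken
```

In this sketch the continuous replica reflects the latest write, while the periodic replica still holds the value captured at the last snapshot.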
Data storage required for applications such as file systems and databases is typically allocated from one or more storage devices that are maintained as a “volume.” The volume may serve as a logical interface used by an operating system to access data stored on one or more storage media using a single instance of a file system. Thus, a volume may act as an abstraction that essentially “hides” storage allocation and (optionally) data protection/redundancy from the application. An application can store its data on multiple volumes. The content of a volume is accessed using fixed-size data units called blocks.
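As a rough illustration of how a volume can hide storage allocation behind a block interface, the following sketch concatenates several devices into one block address space. The names (`Volume`, `read_block`, `write_block`), the 512-byte block size, and the simple concatenation layout are all assumptions made for this example. A real volume manager may stripe, mirror, or otherwise lay out data differently.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


class Volume:
    """Hypothetical volume presenting several devices as one run of blocks."""

    def __init__(self, device_block_counts):
        # Each device is modeled as an in-memory bytearray.
        self.devices = [bytearray(n * BLOCK_SIZE) for n in device_block_counts]
        self.total_blocks = sum(device_block_counts)

    def _locate(self, block_no):
        # Map a volume block number to (device, byte offset on that device).
        if not 0 <= block_no < self.total_blocks:
            raise IndexError("block number out of range")
        for dev in self.devices:
            blocks_on_dev = len(dev) // BLOCK_SIZE
            if block_no < blocks_on_dev:
                return dev, block_no * BLOCK_SIZE
            block_no -= blocks_on_dev
        raise AssertionError("unreachable: range was checked above")

    def write_block(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        dev, offset = self._locate(block_no)
        dev[offset:offset + BLOCK_SIZE] = data

    def read_block(self, block_no):
        dev, offset = self._locate(block_no)
        return bytes(dev[offset:offset + BLOCK_SIZE])


vol = Volume([2, 3])               # two devices: 2 blocks and 3 blocks
vol.write_block(3, b"x" * BLOCK_SIZE)
payload = vol.read_block(3)
```

The caller addresses blocks 0 through 4 without knowing that block 3 actually lives on the second device.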
In very large distributed computer systems, the data is distributed across a number of data servers. Clients write data directly to the data servers with minimal points of interaction with any metadata server. The data servers typically have multiple LUNs (logical unit numbers), each of which has its own reserved storage space. Each LUN can have a large number of partitions, with objects contained in these partitions. The files of a distributed computer system's file system are composed of objects from any of the data server LUN partitions. The objects from various data servers are combined to form RAID groups.
In a conventional virtual environment, it is desirable to be able to convert a physical server to a virtual machine in a target virtual environment such as VMware or Microsoft Hyper-V. This concept is generally known as a “Physical to Virtual,” or P2V, conversion. In addition to direct conversion of a full image of a physical server to a virtual machine, the ability to apply incremental/differential changes to the virtual machine is also desirable. This saves time and computing resources by avoiding repeated conversion of full system images. However, a product that provides traditional P2V capability while additionally supporting the application of incremental backup data to a target virtual machine carries an inherent risk of requiring manual operator intervention if there is a failure in the incremental conversion process.
The above effect is due, in part, to the potentially higher incidence of environmental failures associated with moving data to a virtual environment and to the generally more complex makeup of virtual environment configurations. If such a failure occurs during unattended (schedule- or policy-based) processing, a target virtual machine can fall drastically out of sync with the source server until the failure condition is rectified. Before a user can continue to apply new incremental change data to a fully converted virtual machine image, the user would normally have to apply the missing incremental changes to the target virtual machine or start over with a full conversion. In today's highly scaled-up environments, re-running a full conversion can be very time consuming for any traditional data center.
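The dependency between incremental changes described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how increments are identified, so the sequence numbers, the `Increment` and `TargetVM` types, and the `apply_pending` function are all hypothetical. Disk contents are modeled as a dictionary of block numbers to bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Increment:
    seq: int              # assumed position in the backup chain (1, 2, 3, ...)
    changed_blocks: dict  # block number -> new block contents


@dataclass
class TargetVM:
    blocks: dict = field(default_factory=dict)
    last_applied: int = 0  # 0 means only the full conversion has been applied


def apply_pending(vm, increments):
    """Apply increments in order; return the first missing seq, or None."""
    for inc in sorted(increments, key=lambda i: i.seq):
        if inc.seq <= vm.last_applied:
            continue  # already applied in an earlier run
        if inc.seq != vm.last_applied + 1:
            # A gap: later increments cannot be applied safely yet.
            return vm.last_applied + 1
        vm.blocks.update(inc.changed_blocks)
        vm.last_applied = inc.seq
    return None


vm = TargetVM(blocks={0: b"a"})
# Increment 2 failed to arrive, so increment 3 must wait.
missing = apply_pending(vm, [Increment(1, {0: b"b"}), Increment(3, {1: b"c"})])
# Once increment 2 is supplied, the chain can be caught up.
still_missing = apply_pending(vm, [Increment(2, {1: b"x"}), Increment(3, {1: b"c"})])
```

In this sketch the first call stops at the gap and reports increment 2 as missing, leaving the virtual machine at increment 1. The second call catches up without repeating the full conversion.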
Additionally, applying the incremental/differential changes to the target virtual machine would require manual intervention on the operator's part. This would be time consuming, and the user would risk making mistakes in applying the incremental/differential data, potentially corrupting the virtual machine. If this occurs, the operator would experience even greater recovery time due to having to perform a new full conversion of the physical server.