A backup copy of an active database, made at a point in time, allows the state of the database at that point in time to be restored if, for example, the database subsequently becomes corrupted or is lost. Deduplication of data can reduce the amount of storage space and bandwidth consumed by backups and restores. With deduplication, after a segment of data is stored, other instances of data that match that segment are replaced with a reference to the stored segment of data.
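The replacement of matching segments with references can be illustrated with a minimal sketch. It assumes fixed-size segments and SHA-256 digests as references; the names SEGMENT_SIZE, dedup_store, and restore are hypothetical and chosen only for illustration, not taken from any particular backup product.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, for illustration only


def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size segments and store each unique segment once.

    Returns a "recipe": the ordered list of segment digests (references)
    needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        digest = hashlib.sha256(segment).hexdigest()
        # Only the first instance of a segment is stored; later matches
        # are represented by the digest alone.
        store.setdefault(digest, segment)
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by following the references in order."""
    return b"".join(store[digest] for digest in recipe)
```

In this sketch, backing up three identical segments followed by one different segment yields a recipe of four references but only two stored segments, which is where the storage and bandwidth savings come from.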
Many techniques provide deduplication for virtual hard disks. One such technique uses file extents to create segments for a given virtual disk (an extent is a contiguous region of physical storage, usually corresponding to a file; a segment may include one or more extents). Another is based on Rabin fingerprinting. However, conventional deduplication techniques fail to preserve the format of the virtual disks. Instead, virtual disks in formats such as the Virtual Machine Disk (VMDK) format and the Virtual Hard Disk (VHD) format are converted to another format, for example the TAR (tape archive) file format. This can be a disadvantage because, for example, recovering deduplicated files takes longer: the data must first be converted back to the native VMDK or VHD format.
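The fingerprint-based segmentation mentioned above can be sketched as content-defined chunking. The sketch below uses a simple polynomial rolling hash (Rabin-Karp style) as a stand-in; it is not true Rabin fingerprinting, which works with polynomials over GF(2). The constants WINDOW, BASE, MOD, and MASK and the function name chunk are assumptions made for illustration.

```python
WINDOW = 16            # bytes covered by the rolling hash (assumed)
BASE = 257             # polynomial base (assumed)
MOD = (1 << 31) - 1    # modulus for the hash (assumed)
MASK = (1 << 6) - 1    # cut where the low 6 bits of the hash are zero


def chunk(data: bytes) -> list:
    """Split data at positions chosen by the content of a sliding window."""
    top = pow(BASE, WINDOW - 1, MOD)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte leaving the window.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Consider a boundary only once the window is full, so the decision
        # depends solely on the last WINDOW bytes.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because each boundary depends only on the bytes in the window, inserting data at the front of a stream shifts the later boundaries along with the content, so segments after the first boundary still match previously stored segments. Fixed-offset segmentation does not have this property.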