A storage is computer-readable media capable of storing data in blocks. Storages face a myriad of threats to the data they store and to their smooth and continuous operation. In order to mitigate these threats, a backup of the data in a storage may be created at a particular point in time to enable the restoration of the data at some future time. Such a restoration may become desirable, for example, if the storage experiences corruption of its stored data, if the storage becomes unavailable, or if a user wishes to create a second identical storage.
A storage is typically logically divided into a finite number of fixed-length blocks. A storage also typically includes a file system which tracks the locations of the blocks that are allocated to each file that is stored in the storage. The file system also tracks the blocks that are not allocated to any file. The file system generally tracks allocated and free blocks using specialized data structures, referred to as file system metadata. File system metadata is also stored in designated blocks in the storage.
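As a rough illustration of how file system metadata may track allocated and free blocks, the following minimal sketch keeps one flag per fixed-length block plus a per-file list of block indices. The class name `BlockAllocationMap` and its methods are hypothetical, chosen only for this example; real file systems use considerably more elaborate structures.

```python
class BlockAllocationMap:
    """Toy stand-in for file system metadata: one flag per fixed-length block."""

    def __init__(self, block_count):
        self.allocated = [False] * block_count  # True means the block belongs to a file
        self.files = {}                         # file name -> list of block indices

    def allocate(self, name, blocks_needed):
        """Assign the lowest-numbered free blocks to the named file."""
        free = self.free_blocks()
        if len(free) < blocks_needed:
            raise OSError("not enough free blocks")
        chosen = free[:blocks_needed]
        for index in chosen:
            self.allocated[index] = True
        self.files.setdefault(name, []).extend(chosen)
        return chosen

    def release(self, name):
        """Delete a file by marking all of its blocks as free again."""
        for index in self.files.pop(name, []):
            self.allocated[index] = False

    def free_blocks(self):
        """Return the indices of blocks not allocated to any file."""
        return [i for i, used in enumerate(self.allocated) if not used]
```

In this sketch, deleting a file only flips flags in the map; the data in the released blocks is left in place, which is one reason a backup may choose to skip free blocks.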
Various techniques exist for backing up a source storage. One common technique involves backing up individual files stored in the source storage on a per-file basis. This technique is often referred to as file backup. File backup uses the file system of the source storage as a starting point and performs a backup by writing the files to a destination storage. Using this approach, individual files are backed up if they have been modified since the previous backup. File backup may be useful for finding and restoring a few lost or corrupted files. However, file backup may also incur significant bandwidth and logical overhead because it requires tracking and storing information about where each file exists within the file systems of the source storage and the destination storage.
Another common technique for backing up a source storage ignores the locations of individual files stored in the source storage and instead simply backs up all allocated blocks stored in the source storage. This technique is often referred to as image backup because the backup generally contains or represents an image, or copy, of the entire allocated contents of the source storage. Using this approach, individual allocated blocks are backed up if they have been modified since the previous backup. Because image backup backs up all allocated blocks of the source storage, image backup backs up both the blocks that make up the files stored in the source storage as well as the blocks that make up the file system metadata. Also, because image backup backs up all allocated blocks rather than individual files, this approach does not necessarily need to be aware of the file system metadata or the files stored in the source storage, beyond utilizing minimal knowledge of the file system metadata in order to only back up allocated blocks, since free blocks are not generally backed up.
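The block selection described above can be sketched as follows. This is a simplified illustration, assuming the set of allocated block indices and the set of blocks changed since the previous backup are already known; the function name `image_backup` and its parameters are hypothetical.

```python
def image_backup(source_blocks, allocated, changed_since_last):
    """Return {block index: data} for allocated blocks modified since the last backup.

    source_blocks      -- list of block contents, indexed by block number
    allocated          -- set of indices the file system reports as allocated
    changed_since_last -- set of indices modified since the previous backup
    """
    backup = {}
    # Visit blocks in ascending order to mimic a sequential read of the storage.
    for index in sorted(allocated):
        if index in changed_since_last:
            backup[index] = source_blocks[index]
    return backup
```

Note that the sketch never inspects file names or file boundaries. A changed block that is free (not in `allocated`) is skipped, while a changed metadata block is copied like any other allocated block.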
Image backup can be relatively fast compared to file backup because reliance on the file system is minimized. An image backup can also be relatively fast compared to a file backup because seeking during image backup may be reduced. In particular, during image backup, blocks are generally read sequentially with relatively limited seeking. In contrast, during file backup, blocks that make up individual files may be scattered in the source storage, resulting in relatively extensive seeking.
One common problem encountered when backing up multiple similar source storages to the same backup storage using image backup is the potential for redundancy within the backed-up data. For example, if multiple source storages utilize the same commercial operating system, such as WINDOWS® 8 Professional, they may store a common set of system files which will have identical blocks. If these source storages are backed up to the same backup storage, these identical blocks will be stored in the backup storage multiple times, resulting in redundant blocks. Redundancy in a backup storage may increase the overall size requirements of backup storage and increase the bandwidth overhead of transporting blocks to the backup storage.
While this redundancy problem can be mitigated to a certain extent through the use of a deduplication vault, a standard deduplication vault must first receive the blocks from the computer system of the storage in unencrypted form in order to deduplicate them. The deduplication vault then stores each block if it is unique or, if the vault supports encryption, encrypts the block and stores the encrypted block if it is unique. In this way, the standard deduplication vault supports deduplication of blocks from multiple systems. However, because the standard deduplication vault requires access, at least temporarily, to the unencrypted blocks, these blocks may be compromised should the security of the deduplication vault be compromised or faulty. For this reason, encrypted deduplication vaults have been developed in which each block is encrypted by the source computer system prior to being backed up into the encrypted deduplication vault, such that the deduplication vault, without being provided the decryption key, is unable to decrypt the encrypted blocks.
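A standard deduplication vault of the kind described above might, in its simplest form, index unencrypted blocks by a hash of their contents. The sketch below assumes SHA-256 as the hash and an in-memory dictionary as the store; both choices, along with the name `DeduplicationVault`, are illustrative assumptions and not details taken from any particular product.

```python
import hashlib


class DeduplicationVault:
    """Toy vault that stores each unique unencrypted block exactly once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block contents

    def store(self, block):
        """Store the block if unseen; return (digest, whether it was newly stored)."""
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = block
        return digest, is_new


vault = DeduplicationVault()
# Two source systems back up the same operating system block.
digest_a, stored_a = vault.store(b"common system file block")
digest_b, stored_b = vault.store(b"common system file block")
```

Here the second call finds the digest already present and stores nothing, so the vault holds a single copy. The sketch also makes the security concern visible: the vault sees every block in unencrypted form.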
While encrypted deduplication vaults have alleviated the concerns regarding unauthorized access to sensitive blocks, a common problem encountered during backup into an encrypted deduplication vault is that encrypted blocks may not be capable of deduplication across different clients. In particular, while the blocks that make up a commercial operating system or a standard application may be identical in their plain text form, encryption of two identical plain text blocks can result in differences in the encrypted versions of the blocks, as each client is likely to use its own unique encryption password. Thus, even if an identical plain text block is backed up across different source storages, the encrypted block that is actually stored in the deduplication vault may be different for each source storage, resulting in the identical plain text block being stored multiple times in different encrypted forms. As a result, the benefits of deduplication may be lost even when identical blocks are being backed up.
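The effect can be demonstrated with a deliberately simplified example. The function `toy_encrypt` below is a hypothetical stand-in that XORs a block with a keystream derived from the password using SHA-256; it is not a secure cipher and is not the encryption scheme of any actual vault. It serves only to show that identical plain text blocks encrypted under different passwords no longer match.

```python
import hashlib


def toy_encrypt(block, password):
    """XOR the block with a password-derived keystream. Illustrative only, not secure."""
    keystream = b""
    counter = 0
    while len(keystream) < len(block):
        # Extend the keystream 32 bytes at a time by hashing password + counter.
        keystream += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of the block, so any extra keystream bytes are ignored.
    return bytes(b ^ k for b, k in zip(block, keystream))


plain = b"identical operating system block"
cipher_a = toy_encrypt(plain, b"client-a-password")
cipher_b = toy_encrypt(plain, b"client-b-password")

# A vault that compares hashes of the encrypted blocks sees two distinct blocks.
hash_a = hashlib.sha256(cipher_a).hexdigest()
hash_b = hashlib.sha256(cipher_b).hexdigest()
```

Because `cipher_a` and `cipher_b` differ, a vault that deduplicates by comparing the encrypted blocks or their hashes would store both, even though the underlying plain text is the same.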
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described herein may be practiced.