A storage is computer-readable media capable of storing data in blocks. Storages face a myriad of threats to the data they store and to their smooth and continuous operation. In order to mitigate these threats, a backup of the data in a storage may be created to represent the state of the source storage at a particular point in time and to enable the restoration of the data at some future time. Such a restoration may become desirable, for example, if the storage experiences corruption of its stored data, if the storage becomes unavailable, or if a user wishes to create a second identical storage.
A storage is typically logically divided into a finite number of fixed-length blocks. A storage also typically includes a file system which tracks the locations of the blocks that are allocated to each file that is stored in the storage. The file system also tracks the blocks that are not allocated to any file. The file system generally tracks allocated and unallocated blocks using specialized data structures, referred to as file system metadata. File system metadata is also stored in designated blocks in the storage.
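As an illustration only, the relationship between blocks, files, and file system metadata can be sketched as follows. The `ToyFileSystem` class, its field names, and the 4096-byte block size are hypothetical and do not correspond to any particular file system.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed fixed block length in bytes (illustrative only)


@dataclass
class ToyFileSystem:
    """Hypothetical metadata recording which blocks belong to which file."""
    total_blocks: int
    # Maps each file name to the block numbers allocated to that file.
    file_blocks: dict[str, list[int]] = field(default_factory=dict)
    # Blocks that hold the file system metadata itself.
    metadata_blocks: set[int] = field(default_factory=set)

    def allocated(self) -> set[int]:
        """Blocks in use by any file or by the metadata."""
        used = set(self.metadata_blocks)
        for blocks in self.file_blocks.values():
            used.update(blocks)
        return used

    def unallocated(self) -> set[int]:
        """Blocks not allocated to any file or to the metadata."""
        return set(range(self.total_blocks)) - self.allocated()


fs = ToyFileSystem(total_blocks=8, metadata_blocks={0})
fs.file_blocks["a.txt"] = [1, 2]
fs.file_blocks["b.txt"] = [5]
print(sorted(fs.allocated()))
print(sorted(fs.unallocated()))
```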
Various techniques exist for backing up a source storage. One common technique involves backing up individual files stored in the source storage on a per-file basis. This technique is often referred to as file backup. File backup uses the file system of the source storage as a starting point and performs a backup by writing the files to a destination storage. Using this approach, individual files are backed up if they have been modified since the previous backup. File backup may be useful for finding and restoring a few lost or corrupted files. However, file backup may also incur significant bandwidth and logical overhead because it requires tracking and storing information about where each file exists within the file systems of the source storage and the destination storage.
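A minimal sketch of file backup is shown below, assuming that "modified since the previous backup" is decided by comparing each file's modification time against a timestamp. The function name `file_backup` and its parameters are hypothetical, and practical file backup tools may use other change-detection methods.

```python
import os
import shutil


def file_backup(source_dir: str, dest_dir: str, last_backup_time: float) -> list[str]:
    """Copy files modified after last_backup_time; return their relative paths."""
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            # Skip files that have not changed since the previous backup.
            if os.path.getmtime(src) <= last_backup_time:
                continue
            # The location of each file must be tracked on both storages.
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

Note that the sketch walks the directory tree and recreates each file's path at the destination, which hints at the per-file bookkeeping described above.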
Another common technique for backing up a source storage ignores the locations of individual files stored in the source storage and instead simply backs up all allocated blocks stored in the source storage. This technique is often referred to as image backup because the backup generally contains or represents an image, or copy, of the entire allocated contents of the source storage. Using this approach, individual allocated blocks are backed up if they have been modified since the previous backup. Because image backup backs up all allocated blocks of the source storage, image backup backs up both the blocks that make up the files stored in the source storage and the blocks that make up the file system metadata. Also, because image backup backs up all allocated blocks rather than individual files, this approach does not generally need to be aware of the file system metadata or the files stored in the source storage, beyond the minimal knowledge of the file system metadata needed to back up only allocated blocks, since unallocated blocks are not generally backed up.
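The block selection performed by image backup can be sketched as follows, under the assumption that the set of allocated blocks and the set of blocks changed since the previous backup are already known. The function `image_backup` and the in-memory list standing in for the source storage are hypothetical.

```python
def image_backup(blocks: list[bytes], allocated: set[int], changed: set[int]) -> dict[int, bytes]:
    """Return a copy of every block that is both allocated and changed.

    The result maps block number to block content. No file names are
    consulted; only the allocated/unallocated distinction is used.
    """
    return {n: blocks[n] for n in sorted(allocated & changed)}


# A toy source storage of five blocks; block 3 is unallocated.
source = [b"meta", b"aaaa", b"bbbb", b"", b"cccc"]
allocated = {0, 1, 2, 4}

# A first (base) backup treats every block as changed.
base = image_backup(source, allocated, changed=set(range(len(source))))

# Later, block 2 is modified and stale data lands in unallocated block 3.
source[2] = b"BBBB"
source[3] = b"junk"
incremental = image_backup(source, allocated, changed={2, 3})
```

In this sketch the incremental backup contains only block 2, because block 3 changed but is not allocated.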
An image backup can be relatively fast compared to file backup because reliance on the file system is minimized. An image backup can also be relatively fast compared to a file backup because seeking is reduced. In particular, during an image backup, blocks are generally read sequentially with relatively limited seeking. In contrast, during a file backup, blocks that make up the content of individual files may be scattered, resulting in relatively extensive seeking.
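The difference in seeking can be illustrated with a rough sketch that counts a seek whenever the next block read does not immediately follow the previous one. The `count_seeks` function, the block layout, and this simplified cost model are assumptions made for illustration only.

```python
def count_seeks(read_order: list[int]) -> int:
    """Count reads whose block number does not directly follow the prior read."""
    seeks = 0
    previous = None
    for block in read_order:
        if previous is not None and block != previous + 1:
            seeks += 1
        previous = block
    return seeks


# Hypothetical layout: the blocks of two files are scattered across the storage.
file_blocks = {"a": [7, 2, 9], "b": [3, 8, 1]}

# File backup reads each file's blocks in file order.
file_order = [b for blocks in file_blocks.values() for b in blocks]

# Image backup reads the same allocated blocks in ascending block order.
image_order = sorted(file_order)

print(count_seeks(file_order), count_seeks(image_order))  # prints "5 1"
```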
One common problem encountered when backing up a source storage using image backup is the vulnerability caused by unencrypted data. For example, plain-text data of a source storage operated by an individual or business may be backed up into an image backup and then sent over a network to a third-party destination storage. However, the unencrypted data in the image backup may be vulnerable to being accessed by unauthorized users, and since the data is not encrypted, the unauthorized access can be devastating to the individual or the business. This problem has been mitigated to some extent by encryption schemes which are employed to encrypt runs of multiple blocks as a group before storing the runs in an image backup that is then stored on a third-party destination storage.
Another common problem encountered when repeatedly backing up a source storage using image backup is the proliferation of image backups over time. For example, where a source storage is backed up every day at 2:00 am to a third-party destination storage, at the end of one year, 365 image backups will exist for the source storage on the third-party destination storage. This proliferation of image backups can increase the amount of storage space needed to store the image backups on the third-party destination storage. This problem has been mitigated to some extent by consolidation schemes which are employed to consolidate multiple image backups into a single image backup, thus reducing the number of image backups and saving storage space. For example, the daily image backups discussed above can be consolidated into consolidated monthly image backups, thereby reducing the 365 image backups to 12 consolidated image backups.
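One simple way to picture consolidation of unencrypted image backups is sketched below, assuming each image backup is represented as a mapping from block number to block content and that the newest copy of each block is the one retained. The `consolidate` function, the sample data, and the choice of the non-leap year 2023 for the daily schedule are illustrative assumptions, not a description of any particular consolidation scheme.

```python
from datetime import date, timedelta


def consolidate(backups: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Merge image backups (oldest first), keeping the newest copy of each block."""
    merged: dict[int, bytes] = {}
    for backup in backups:
        merged.update(backup)  # later backups overwrite earlier blocks
    return merged


day1 = {0: b"meta-v1", 1: b"aaaa", 2: b"bbbb"}
day2 = {2: b"BBBB"}
day3 = {0: b"meta-v2", 4: b"cccc"}
consolidated = consolidate([day1, day2, day3])

# Grouping 365 daily backups by calendar month yields 12 groups,
# each of which could be consolidated into one monthly image backup.
daily_dates = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]
months = {d.month for d in daily_dates}
print(len(daily_dates), len(months))  # prints "365 12"
```

Here three backups holding six stored blocks in total collapse into one backup holding four blocks, which is where the space savings come from.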
However, in a situation where an image backup includes runs of multiple blocks which have been encrypted as a group, consolidating multiple image backups into a single image backup may be impossible without first accessing the encryption key that was used in the encryption of the runs and using the encryption key to decrypt the runs. Understandably, some individuals and businesses may be hesitant to provide a third-party destination storage access to an encryption key, since the encryption key can be used by unauthorized users to decrypt the data in the encrypted image backup, thereby exposing the encrypted image backup to the very vulnerabilities that the encryption was intended to avoid. Therefore, since current image backup solutions require that access be provided to an encryption key if consolidation of encrypted image backups is desired, many individuals and businesses choose to forfeit consolidation of encrypted image backups on third-party destination storages in order to avoid the security risk of permitting third parties access to encryption keys. This forfeiture results in a proliferation of image backups that can increase the amount of space needed to store the image backups on a third-party destination storage.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described herein may be practiced.