Image level backups used for disaster recovery present new challenges as compared to legacy file system level backups. In particular, the large size of the disk images that need to be backed up results in much longer backup times. Backups of large disk images also significantly increase backup file storage requirements.
As compared to file level backups, which are typically set to back up only required file system objects, image level backups save complete images of backed up disks. Thus, unlike file-level backups, conventional image level backups typically include unnecessary data blocks belonging to file system objects that are of no value to users, deleted file system objects, file system objects marked for deletion, unallocated space, and unused space. While currently available commercial backup solutions such as VEEAM™ Backup and Replication from Veeam Software International Ltd. are able to efficiently remove white spaces (e.g., by using compression and deduplication), the other unneeded data blocks mentioned above are still processed as part of image-level backups. This slows down backup performance and requires additional backup storage space. Thus, there is a need for methods of excluding unnecessary data from image level backups.
Conventional methods for reducing the amount of data which needs to be retrieved from a source disk and stored in the backup include querying a specific part of a file system's FAT (File Allocation Table) to identify disk blocks which contain deleted data. The identified data blocks are then skipped during backup activities. For example, in systems using the MICROSOFT™ Windows New Technology File System (NTFS), deleted data blocks can be identified by querying and parsing a Master File Table (MFT), which is a part of the NTFS FAT. Some currently available disaster recovery tools, such as vRanger and vReplicator from QUEST SOFTWARE™, implement this technique.
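The block-skipping step described above can be illustrated with a minimal sketch. It assumes the set of deleted block indices has already been obtained from the file system metadata; the function names `blocks_to_back_up` and `back_up_image`, and the in-memory list standing in for a disk, are hypothetical and do not represent any particular product's implementation.

```python
from typing import Dict, Iterable, Iterator, List, Set


def blocks_to_back_up(total_blocks: int, deleted_blocks: Iterable[int]) -> Iterator[int]:
    """Yield the indices of blocks that still need to be read from the source disk."""
    skip: Set[int] = set(deleted_blocks)
    for index in range(total_blocks):
        # Blocks identified as holding deleted data are skipped entirely.
        if index not in skip:
            yield index


def back_up_image(disk: List[bytes], deleted_blocks: Iterable[int]) -> Dict[int, bytes]:
    """Copy only the non-deleted blocks, keyed by their block index."""
    return {index: disk[index] for index in blocks_to_back_up(len(disk), deleted_blocks)}
```

In this sketch, the work of identifying deleted blocks happens before any block is copied, so the saving is proportional to how many blocks are actually marked as deleted.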
Conventional methods for optimizing image level backups have significant drawbacks. Some of these shortcomings are discussed below.
First, conventional backup optimization techniques do not provide significant benefits unless a disk being backed up has a significant number of blocks with deleted data (i.e., blocks marked as containing deleted data). However, many disks, such as disks used by newly-provisioned servers and computers with newly-installed applications and file system objects, do not have significant numbers of blocks with deleted data. In fact, using conventional techniques, additional processing is required to determine which disk blocks have deleted data. This additional processing may result in slow backup times.
Second, conventional backup optimization techniques provide little or no benefit during “incremental” backups, and may only be effective for “full” backups. Currently available technologies that facilitate efficient incremental backup, such as VMware Changed Block Tracking (CBT), allow backup solutions to determine data blocks in which content has changed since a previous backup, so that only those blocks are backed up during the incremental backup cycle. However, deleting data in file systems like NTFS does not actually change the blocks corresponding to deleted file system objects. Thus, these unchanged data blocks will not be picked up by the CBT for inclusion in an incremental backup without requiring some special processing. For example, ‘deleted’ NTFS file system objects like directories and files are merely marked for deletion in the MFT until the storage space is needed, at which point the corresponding blocks are filled with the new content.
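The interaction between deletion and changed block tracking can be shown with a toy model. This sketch assumes a disk represented as a list of blocks and detects changes by comparing two snapshots; real change tracking records writes as they occur and does not compare snapshots. The names `changed_blocks` and `delete_file`, and the block contents, are hypothetical.

```python
from typing import List, Set


def changed_blocks(previous: List[bytes], current: List[bytes]) -> Set[int]:
    """Return the indices of blocks whose content differs between two snapshots."""
    return {i for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def delete_file(disk: List[bytes], metadata_index: int) -> List[bytes]:
    """Model a deletion that only rewrites the metadata entry for the file."""
    updated = list(disk)
    # The data blocks of the file are left untouched; only the entry is marked.
    updated[metadata_index] = b"entry:deleted"
    return updated


# Block 0 models the metadata entry; blocks 1 and 2 model the file's data.
previous = [b"entry:in-use", b"data1", b"data2"]
current = delete_file(previous, 0)
incremental = changed_blocks(previous, current)
```

Under these assumptions, only the metadata block appears in `incremental`. The data blocks of the deleted file are not selected for the incremental backup in the first place, so skipping them yields no additional saving.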
Third, conventional backup techniques provide little benefit for incremental backups due to the nature of modern server workloads, which result in relatively little data being deleted, and primarily result in new data being added. This results in deleted data blocks being almost instantly reused by new data, leading to relatively few performance or storage benefits for incremental backups.
Fourth, conventional backup methods are not effective at optimizing backups of file systems which natively wipe (i.e., zero out) deleted blocks upon file system object deletion, such as the Linux third and fourth extended file systems (ext3 and ext4).
Finally, and perhaps most importantly, conventional image level backup techniques process and store significant amounts of data that are unnecessary in backup files. For example, conventional methods process and store the disk image data blocks corresponding to the contents of swap files, hibernation files, temporary ('temp') folders, and recycling bin folders, and/or data such as Windows operating system (OS) system files which either do not need to be backed up at all, or can be easily restored from multiple other readily available sources. For example, certain OS file system objects, such as directories and files for a server or computer, can be readily restored from other similar servers or computers with the same OS installed. Conventional image-level backup optimization methods fail to take this into account and as a result consume valuable time and storage space processing data blocks that correspond to contents of files of no value to users.
Therefore, there is a need for efficient techniques for optimizing image level backups which address the shortcomings of the image level backup optimization techniques described above.