Traditional file system backups generally require reading each file record in the file system and checking when the corresponding file was last modified to determine whether the file should be backed up. Specifically, if a file was last modified after the time of the previous backup, the contents of the file are read and backed up. A file system on a volume, however, may contain many millions of files and thus many millions of file records to review. The time to complete a backup can therefore be very lengthy, because the backup process must check the last modified date of every file to determine whether the file should be included in the backup. Indeed, even if none of the files were modified, every file record would still need to be read in order to conclude that none of the files were modified.
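The cost of this approach can be illustrated with a minimal sketch. The function name `select_files_for_backup` and its parameters are hypothetical and are used here only for illustration; the sketch assumes that modification times are compared as POSIX timestamps. Note that the count of records examined equals the total number of files, regardless of how many files actually changed.

```python
import os


def select_files_for_backup(root, last_backup_time):
    """Walk every file under root and pick those modified after last_backup_time.

    Returns a tuple (selected_paths, records_examined). Hypothetical helper
    for illustration only; it is not taken from any particular backup product.
    """
    selected = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1  # every file record is read, changed or not
            if os.stat(path).st_mtime > last_backup_time:
                selected.append(path)
    return selected, examined
```

Even when `selected` comes back empty, `examined` still equals the number of files on the volume, which is the inefficiency described above.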
Block-based backups offer a new approach. In a block-based backup system, the backup system tracks which blocks are modified, and these blocks are then backed up at the next backup, which may be referred to as an incremental backup. A block-based backup can thus be completed much faster than a traditional file system backup, which requires reading the entire file system to determine whether each file needs to be included in the backup. Once a block-based backup has completed, however, there still exists a need to discover the mapping between the backed up blocks and their corresponding files. This mapping is needed so that the backup can be indexed as to what files were included in the backup, what files changed since the last backup, what content in a file was changed, and so forth. Backups are generally not very useful if they are not searchable. This discovery process, however, can be extremely time-consuming and resource-intensive, as there may be many millions of records to review for each completed backup in order to index that backup.
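Changed block tracking of this kind can be sketched as follows. The class `ChangedBlockTracker`, its method names, and the 4096-byte block size are assumptions made for illustration and do not describe any specific product. The sketch records which blocks each write touches and hands that set to the next incremental backup.

```python
class ChangedBlockTracker:
    """Hypothetical in-memory tracker of blocks modified since the last backup."""

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed block size in bytes
        self._dirty = set()           # block numbers written since last backup

    def record_write(self, offset, length):
        """Mark every block overlapped by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self._dirty.update(range(first, last + 1))

    def take_incremental(self):
        """Return the sorted changed block numbers and reset the tracker."""
        changed = sorted(self._dirty)
        self._dirty.clear()
        return changed
```

The result of `take_incremental` is a list of block numbers only. Nothing in it identifies which files those blocks belong to, which is why the block-to-file mapping must be discovered separately before the backup can be indexed.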
Thus, there is a need for improved systems and techniques to quickly and efficiently identify files and file changes between incremental block-based backups.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of Dell EMC.