Organizations increasingly back up large quantities of data for a variety of purposes, including disaster recovery, safeguarding against accidental data corruption or deletion, and meeting electronic discovery (“e-discovery”) requirements.
However, the increasing volume of backed-up data may make locating certain data within a backup more difficult. An inability to efficiently find certain data, or data with certain characteristics, may undermine some of the goals behind backing up data. For example, if an organization wants to recover all files that reference a certain subject matter, it may be impractical to look through every file in a backup. Likewise, if an organization must retrieve files for e-discovery, manually inspecting each file in a backup may be inefficient or even impracticable.
In order to facilitate the efficient location of files in a backup, an organization may attempt to index the content of the files in the backup. Unfortunately, indexing files during a backup operation may substantially slow the backup operation, and indexing files after a backup operation may be resource-intensive (e.g., if the backup files are stored on tape) or impractical (e.g., if the backup files are stored offsite and are not accessible over a network). Accordingly, the instant disclosure identifies a need for efficiently and effectively indexing backup content.