Deduplication storage systems are generally used to reduce the amount of storage space needed to store files by identifying redundant data patterns within similar files. For example, a deduplication storage system may divide multiple files into file segments and then identify at least one file segment obtained from one file that is identical to at least one file segment obtained from another file. Rather than storing multiple instances of a particular file segment, the deduplication storage system may store a single instance of the file segment and allow multiple files to reference that instance, thereby reducing the amount of storage space needed to store the files. As such, deduplication storage systems typically store only file segments that are unique (i.e., non-redundant).
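By way of illustration only, the segment-level approach described above may be sketched as follows. The sketch assumes fixed-size segments and SHA-256 fingerprints, neither of which is specified in this description. The names DedupStore, put, get, and stored_bytes are hypothetical and do not correspond to any particular deduplication product.

```python
import hashlib


class DedupStore:
    """Toy in-memory deduplication store (illustrative assumption, not a real system)."""

    def __init__(self, segment_size=4):
        # Hypothetical fixed segment size in bytes; kept tiny so the example is readable.
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> single stored instance of a segment
        self.files = {}     # file name -> ordered list of segment fingerprints

    def put(self, name, data):
        """Divide data into segments and store only segments not already present."""
        refs = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            # setdefault keeps the existing instance when the segment is redundant.
            self.segments.setdefault(fingerprint, segment)
            refs.append(fingerprint)
        self.files[name] = refs

    def get(self, name):
        """Rebuild a file by following its references to the stored segments."""
        return b"".join(self.segments[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Total size of the unique segments actually held by the store."""
        return sum(len(segment) for segment in self.segments.values())


store = DedupStore(segment_size=4)
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")

print(len(store.segments))   # 4 unique segments: AAAA, BBBB, CCCC, DDDD
print(store.stored_bytes())  # 16 bytes held, versus 24 bytes of original file data
```

In this sketch the two files share the segments AAAA and BBBB, so each of those segments is stored once and referenced by both files. A practical system may use variable-size segments and persistent storage instead of the in-memory dictionaries assumed here.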
Unfortunately, while deduplication storage systems may reduce the amount of storage space needed to store files, the process of deduplication may demand considerable time and resources. For example, in at least one traditional deduplication technique, a computing device may dedicate considerable time and resources to analyzing various files in an effort to determine whether such files qualify for deduplication (i.e., whether such files include redundant data patterns). As such, even though the computing device may identify some files that qualify for deduplication, the computing device may also expend time and resources analyzing files that ultimately fail to qualify for deduplication.
What is needed, therefore, is a more efficient and effective mechanism for selecting files for deduplication.