Deduplicated data systems are often able to reduce the amount of space required to store files by recognizing redundant data patterns. For example, a deduplicated data system may reduce the amount of space required to store similar files by dividing the files into data segments and storing only one copy of each unique data segment. In this example, each deduplicated file may simply consist of an ordered list of references to the data segments that make up the file.
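As a rough illustration of the segment-based approach described above, the following minimal sketch stores each unique segment once and represents each file as a list of segment fingerprints. The fixed-size segmentation, the use of SHA-256 as a fingerprint, the in-memory dictionaries, and the names DedupStore, put, and get are all assumptions made for illustration; they are not drawn from the disclosure, and a real system may segment and index data quite differently.

```python
import hashlib


class DedupStore:
    """Toy single-node deduplicated store (hypothetical, for illustration only)."""

    def __init__(self, segment_size=4):
        # Assumption: fixed-size segments; the tiny default keeps the example readable.
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> segment bytes (each unique segment stored once)
        self.files = {}     # file name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into segments and store only segments not already present."""
        recipe = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            # Assumption: a SHA-256 digest serves as the segment fingerprint.
            fingerprint = hashlib.sha256(segment).hexdigest()
            # setdefault leaves an existing entry untouched, so duplicates add nothing.
            self.segments.setdefault(fingerprint, segment)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file by concatenating its segments in order."""
        return b"".join(self.segments[fp] for fp in self.files[name])


store = DedupStore(segment_size=4)
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")  # shares its first two segments with a.txt
```

In this sketch the two files together contain six segments, but only four unique segments are stored, and each file can still be rebuilt from its list of fingerprints.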
With the advent of cloud storage, some deduplicated data systems may need to scale to store very large collections of data and/or to serve many clients. Accordingly, deduplicated data may be stored across many nodes. Unfortunately, traditional multi-node deduplication techniques may scale poorly with enormous collections of data and/or large numbers of nodes. For example, traditional multi-node deduplication techniques may maintain a globally accessible index to detect duplicate data segments and may keep a consistent view of all references to data segments across all nodes. Because maintaining this shared state may require communication among nodes, cross-node communications may consume network, memory, and/or storage resources at a disproportionate rate as data collections increase in size and nodes increase in number. As such, the instant disclosure identifies and addresses a need for additional and improved systems and methods for distributed data deduplication.