Deduplicated data systems are often able to reduce the amount of space required to store files by recognizing redundant data patterns. For example, a deduplicated data system may reduce the amount of space required to store similar files by dividing the files into data segments and storing only unique data segments. In this example, each deduplicated file may simply consist of a list of references to the data segments that make up the file.
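The segment-based approach described above may be illustrated with a minimal sketch. The sketch is not taken from the disclosure and rests on several assumptions made only for illustration: files are divided into fixed-size segments, each segment is identified by a SHA-256 digest, and all data is held in memory on a single machine. The names `DedupStore`, `SEGMENT_SIZE`, `put`, and `get` are hypothetical.

```python
import hashlib

# Hypothetical segment size, kept tiny so the example is easy to follow.
SEGMENT_SIZE = 4


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # digest -> segment bytes
        self.files = {}     # file name -> ordered list of digests

    def put(self, name, data):
        """Divide data into segments and store only segments not yet seen."""
        recipe = []
        for start in range(0, len(data), SEGMENT_SIZE):
            segment = data[start:start + SEGMENT_SIZE]
            # Assumption: a SHA-256 digest identifies a segment.
            digest = hashlib.sha256(segment).hexdigest()
            # Store the segment only if this digest is new.
            self.segments.setdefault(digest, segment)
            recipe.append(digest)
        # The file itself is recorded only as a list of segment references.
        self.files[name] = recipe

    def get(self, name):
        """Rebuild a file by concatenating its referenced segments."""
        return b"".join(self.segments[d] for d in self.files[name])


store = DedupStore()
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")
```

In this sketch, the two similar files share their first two segments, so the store may hold four unique segments rather than six, while `store.get("a.txt")` still reconstructs the original contents from the file's list of references.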
With the advent of cloud storage, some deduplicated data systems may need to scale to store very large collections of data and/or to serve many clients. Accordingly, deduplicated data may be stored across many nodes. In order to properly deduplicate data, a deduplicated data system may need to frequently look up data segments (e.g., using fingerprints of the data segments) to see if they already exist in the system (and, if so, where). However, maintaining a global list of fingerprints on a single node may degrade performance by creating a bottleneck and/or degrade reliability by providing a single point of failure. Alternatively, maintaining a global list of fingerprints on each node may degrade performance by requiring extensive cross-node communication for each update to the list and by creating a potentially unwieldy data structure to store on each node. Accordingly, the instant disclosure identifies a need for additional, improved, and more scalable systems and methods for performing lookups on distributed deduplicated data systems.