Deduplicated data systems are often able to reduce the amount of network and storage resources required to transmit and store data by recognizing redundant data patterns. For example, a deduplicated data system may reduce the amount of storage space required to back up similar files by (1) chunking (e.g., dividing) each of the files into a plurality of data segments, (2) identifying redundant (i.e., identical) data segments from within the plurality of data segments, and then (3) storing only those data segments that are unique (i.e., non-redundant).
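By way of illustration only, the three steps above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the fixed segment size, the use of SHA-256 as the segment fingerprint, and the names `chunk_fixed`, `deduplicate`, and `restore` are all hypothetical choices made for the example.

```python
import hashlib


def chunk_fixed(data: bytes, size: int = 4) -> list:
    """Step 1: divide the data into segments (fixed-size here, for simplicity)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(files: dict, size: int = 4):
    """Steps 2 and 3: detect identical segments and store each unique one once.

    Returns (store, recipes): store maps a fingerprint to its segment, and
    recipes maps each file name to the ordered fingerprints that rebuild it.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        recipe = []
        for segment in chunk_fixed(data, size):
            # Assumption: equal SHA-256 digests are treated as identical segments.
            fingerprint = hashlib.sha256(segment).hexdigest()
            store.setdefault(fingerprint, segment)  # keep only the first copy
            recipe.append(fingerprint)
        recipes[name] = recipe
    return store, recipes


def restore(name: str, store: dict, recipes: dict) -> bytes:
    """Rebuild a file from its recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipes[name])
```

For example, backing up two hypothetical files `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"` with a segment size of 4 produces six segments in total, of which only four are unique and need to be stored.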
Conventional deduplicated data systems typically use content-defined chunking algorithms (e.g., the Rabin fingerprinting algorithm) to chunk data into data segments based on the content of the data. To improve content-defined chunking performance, some deduplicated data systems may attempt to parallelize content-defined chunking calculations by (1) dividing data streams into multiple sections that are each large enough to include many data segments and then (2) chunking, in parallel, each section into a plurality of data segments. Unfortunately, deduplicated data systems that parallelize content-defined chunking in this manner may require large amounts of memory and may suffer from low concurrency because chunking calculations performed at the boundaries of any two sections may require data from both sections. Accordingly, the instant disclosure addresses a need for additional and improved systems and methods for parallel content-defined data chunking.
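By way of illustration only, content-defined chunking and the section-boundary dependency described above may be sketched as follows. The sketch assumes a simple polynomial rolling hash as a stand-in for Rabin fingerprinting (it is not the Rabin algorithm itself), and the window size, the bit mask, and the names `cut_points`, `chunk`, and `section_cut_points` are hypothetical. No minimum or maximum segment size is enforced.

```python
BASE = 257
MOD = (1 << 31) - 1


def cut_points(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Return end offsets of segments chosen from the content itself.

    A cut is placed after byte i when the hash of the last `window` bytes
    has its low bits (selected by `mask`) all equal to zero.
    """
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    cuts = []
    h = 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        if i + 1 >= window and (h & mask) == 0:
            cuts.append(i + 1)
    return cuts


def chunk(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Split data at the content-defined cut points."""
    segments, start = [], 0
    for cut in cut_points(data, window, mask):
        segments.append(data[start:cut])
        start = cut
    if start < len(data):
        segments.append(data[start:])  # trailing bytes after the last cut
    return segments


def section_cut_points(data: bytes, start: int, end: int,
                       window: int = 16, mask: int = 0x3F) -> list:
    """Find the cuts that fall in (start, end] when sections are chunked separately.

    The hash window for a cut just past `start` reaches back into the
    preceding section, so the last window - 1 bytes of that section are
    read as well.
    """
    lead = max(0, start - (window - 1))
    local = cut_points(data[lead:end], window, mask)
    return [lead + c for c in local if lead + c > start]
```

Because each cut depends only on the bytes inside the hash window, inserting bytes at the front of a stream shifts the later cut points without otherwise changing them. The same property means that a section cannot be chunked from its own bytes alone: in this simplified form, `section_cut_points` must read `window - 1` bytes from the preceding section. Chunkers that additionally enforce minimum or maximum segment sizes make each cut depend on the previous one, which may make section boundaries harder still to handle.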