Deduplication systems deduplicate data when performing backups from clients or client-policy pairs. As client data stored in data containers in data storage grows into the terabyte range, and then into the petabyte range and beyond, managing a deduplication pool becomes increasingly unwieldy. Disaster recovery can likewise be expected to take longer when a deduplication system suffers data loss or corruption due to hardware failure or filesystem failure. With a traditional deduplication approach, the sizes of the global fingerprint index and the reference database are proportional to the number of unique data segments stored within a deduplication pool, so both grow as the pool grows. At some point, a deduplication pool can grow so large that the recovery process takes an unacceptable length of time and breaches a service level agreement with a client. The scalability of deduplication systems is thus in jeopardy.
It is within this context that the embodiments arise.