A deduplication apparatus or process may create a repository of blocklets by deduplicating a data stream. The blocklets in the repository may have a size or other property that reflects the constraints under which the apparatus or process created the blocklets and populated the repository. For example, the apparatus or process creating the blocklets may be optimized for ingest speed, for ease of file recreation, for minimizing the number of duplicates, for blocklet size distribution, or for other parameters. More generally, a deduplicator creating blocklets for a repository operates under one set of constraints. However, creating blocklets is only part of deduplication, and a user of a repository may operate under a different set of constraints.
A repository may be used for a long period of time. A repository may be used by different entities for different purposes. Thus, the constraints that were in place when the repository was created, while appropriate for the ingest and creation phase of deduplication, may yield suboptimal performance for other entities at other times.
FIG. 1 illustrates a deduplicator 110 producing a blocklet repository 112 by deduplicating a data stream 100. To make the blocklet repository 112 usable, deduplicator 110 also creates an index 114 and recipes 116. The index 114 facilitates finding blocklets in the blocklet repository 112. Entries in the index 114 may be keyed by a hash of a blocklet. The recipes 116 facilitate recreating a file, binary large object (BLOB), or other collection of data that has been partitioned into blocklets that are stored in blocklet repository 112. A file system 118 may recreate a file by accessing a member of the recipes 116 and then retrieving blocklets from the blocklet repository 112, using the index 114 to locate the blocklets. FIG. 1 is an example of single deduplication, in which the data stream 100 is deduplicated once to create blocklet repository 112. Deduplicator 110 may perform deduplication like that described in U.S. Pat. No. 5,990,810 issued to Williams.
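By way of illustration only, the relationship among the blocklet repository 112, the index 114, and the recipes 116 may be sketched in Python as follows. The sketch is a simplified assumption and not the method of deduplicator 110: it partitions data into fixed-size blocklets rather than using the data-dependent partitioning of the Williams patent, it uses SHA-256 as the blocklet hash, and the names `Deduplicator`, `ingest`, `recreate`, and `BLOCKLET_SIZE` are hypothetical.

```python
import hashlib

# Hypothetical fixed blocklet size, kept tiny so the example is easy to follow.
# Fixed-size partitioning is an assumption made here for brevity.
BLOCKLET_SIZE = 4


class Deduplicator:
    """Toy single-pass deduplicator (illustrative only)."""

    def __init__(self):
        self.repository = []  # unique blocklets, in order of first appearance
        self.index = {}       # blocklet hash -> position in the repository
        self.recipes = {}     # item name -> ordered list of blocklet hashes

    def ingest(self, name, data):
        """Partition data into blocklets, store each unique blocklet once,
        and record a recipe for recreating the original data."""
        recipe = []
        for start in range(0, len(data), BLOCKLET_SIZE):
            blocklet = data[start:start + BLOCKLET_SIZE]
            digest = hashlib.sha256(blocklet).hexdigest()
            if digest not in self.index:
                # First time this blocklet is seen: store it and index it.
                self.index[digest] = len(self.repository)
                self.repository.append(blocklet)
            recipe.append(digest)
        self.recipes[name] = recipe

    def recreate(self, name):
        """Rebuild the original data by following the recipe and using
        the index to locate each blocklet in the repository."""
        return b"".join(
            self.repository[self.index[digest]]
            for digest in self.recipes[name]
        )


dedup = Deduplicator()
dedup.ingest("example", b"abcdabcdefgh")
restored = dedup.recreate("example")
```

In this sketch the twelve-byte input is partitioned into three blocklets, two of which are identical, so the repository holds two blocklets while the recipe holds three hashes. Recreation follows the same path described for file system 118: read the recipe, look up each hash in the index, and retrieve the blocklet from the repository.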