Deduplication storage systems, such as those described in U.S. Pat. No. 6,928,526, entitled EFFICIENT DATA STORAGE SYSTEM, filed Dec. 20, 2002 and issued Aug. 9, 2005, the disclosure of which is incorporated herein by reference for all purposes, have been disclosed. In such systems, a stream of data to be stored is divided into segments. Typically, a segment is stored on the deduplication storage system only once, even if the segment occurs in more than one file or other object and/or otherwise occurs more than once in the data stream.
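By way of illustration only, the store-once behavior described above can be sketched as follows. The sketch assumes fixed-size segments and a SHA-256 fingerprint as the segment identifier; neither choice is taken from the referenced patent, and the names SegmentStore and SEGMENT_SIZE are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8  # hypothetical fixed segment size in bytes, kept small for illustration


class SegmentStore:
    """Toy deduplicating store: each unique segment is kept once, keyed by fingerprint."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (one copy per unique segment)
        self.files = {}     # file name -> ordered list of segment fingerprints

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), SEGMENT_SIZE):
            seg = data[i:i + SEGMENT_SIZE]
            fp = hashlib.sha256(seg).hexdigest()
            # Store the segment only if this fingerprint has not been seen before.
            self.segments.setdefault(fp, seg)
            refs.append(fp)
        self.files[name] = refs

    def read(self, name):
        # Reassemble the original bytes from the stored segments.
        return b"".join(self.segments[fp] for fp in self.files[name])


store = SegmentStore()
store.write("a.txt", b"AAAAAAAABBBBBBBB")
store.write("b.txt", b"BBBBBBBBCCCCCCCC")
```

In this sketch the segment BBBBBBBB occurs in both files but is held only once in store.segments; each file simply references it by fingerprint.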
On occasion a need arises to generate an output stream comprising a specified subset of a set of data that has been stored in a deduplication storage system. For example, a data owner may wish to create a tape (or other removable or non-removable media) archive of a subset of data stored on a deduplication storage system. To conserve space on the destination tape or other media, it may be desirable for the archive or other data stream to comprise the subset in de-duplicated form. One approach that has been used to do so is to “re-inflate” (i.e., reverse deduplication and/or decompress) the data as stored on the deduplication storage system, feed the subset to a second deduplication storage system, then copy the subset as stored in de-duplicated form on the second deduplication storage system to the tape or other media. However, this approach consumes processing resources and time (to re-inflate the data, for example) and requires the availability of a second deduplication storage system.
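The cost of the re-inflation approach described above can be illustrated with a minimal sketch. It assumes fixed-size segments, SHA-256 fingerprints, and plain dictionaries standing in for the first and second deduplication storage systems; the helper names dedup_write and reinflate are hypothetical and are not drawn from any actual system.

```python
import hashlib


def dedup_write(segments, data, size=8):
    """Store data in a fingerprint -> segment dict; return the ordered fingerprint list."""
    refs = []
    for i in range(0, len(data), size):
        seg = data[i:i + size]
        fp = hashlib.sha256(seg).hexdigest()
        segments.setdefault(fp, seg)  # keep one copy per unique segment
        refs.append(fp)
    return refs


def reinflate(segments, refs):
    """Reverse deduplication: rebuild the full-size byte stream for one object."""
    return b"".join(segments[fp] for fp in refs)


# First (source) deduplication system holding three objects.
primary = {}
catalog = {
    "a": dedup_write(primary, b"AAAAAAAABBBBBBBB"),
    "b": dedup_write(primary, b"BBBBBBBBCCCCCCCC"),
    "c": dedup_write(primary, b"DDDDDDDDDDDDDDDD"),
}

# Prior approach: re-inflate each object in the subset, then feed it to a
# second deduplication system whose stored form is copied to tape.
subset = ["a", "b"]
secondary = {}
secondary_catalog = {}
bytes_reinflated = 0
for name in subset:
    data = reinflate(primary, catalog[name])  # full-size copy must be materialized
    bytes_reinflated += len(data)
    secondary_catalog[name] = dedup_write(secondary, data)

# Size of the de-duplicated segment data that would be copied to the tape.
archive_bytes = sum(len(seg) for seg in secondary.values())
```

Here 32 bytes are re-inflated and re-processed in order to produce 24 bytes of de-duplicated segment data, and the dictionary named secondary plays the role of the second deduplication storage system that the approach requires.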