The basic premise of a data deduplication system is that data stored into the system must be analyzed and broken into pieces (commonly referred to as “chunks”), that duplicate chunks must be identified, and that the duplicates must be eliminated. To identify duplicates, each chunk is given a digital signature strong enough that two chunks with the same signature can be declared to contain the same data. Normally, as the deduplication system breaks larger objects apart into chunks, it must keep track of the individual chunks that make up each larger object, so that the larger object can be retrieved when desired.
Deduplication reduces space requirements by eliminating redundant chunks of data objects and replacing them with links or pointers to the single remaining chunk. Generally speaking, there must be some type of index or database to keep track of the chunks of a larger object so that the larger object can be reassembled and retrieved after deduplication has removed the redundant chunks. Furthermore, the database used to track the chunks is generally embedded within the deduplication system. In other words, the deduplication system knows about its objects and chunks, but does not generally share this chunk information with any other system.
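The chunking, signature, and index steps described above can be illustrated with a minimal sketch. It is not the implementation of any particular deduplication system. The class name `DedupStore`, the fixed 4-byte chunk size, and the choice of SHA-256 as the digital signature are all assumptions made for illustration; practical systems commonly use much larger or variable-size chunks.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, kept tiny for illustration


class DedupStore:
    """Toy in-memory deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.chunks = {}     # signature -> chunk bytes, one copy per unique chunk
        self.manifests = {}  # object name -> ordered list of chunk signatures

    def put(self, name, data):
        """Break data into chunks, keep only unseen chunks, record the chunk order."""
        signatures = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            sig = hashlib.sha256(chunk).hexdigest()  # assumed signature function
            self.chunks.setdefault(sig, chunk)       # a duplicate chunk is not stored again
            signatures.append(sig)
        self.manifests[name] = signatures

    def get(self, name):
        """Reassemble the larger object by following its list of chunk signatures."""
        return b"".join(self.chunks[sig] for sig in self.manifests[name])


store = DedupStore()
store.put("a", b"ABCDABCDWXYZ")  # three chunks, two of them identical
store.put("b", b"WXYZABCD")      # both chunks already present in the store
```

In this sketch the two objects reference five chunks in total, but only two unique chunks are stored. The `manifests` dictionary plays the role of the embedded index: without it, the stored chunks could not be reassembled into the original objects.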
The deduplication system may be embodied in a storage management system that spans multiple storage volumes and storage pools. For example, data may be sent by storage management clients or data protection agents to the storage management server for storage. One characteristic of a storage management system is that data can be copied for redundancy, transferred to a new media type, or moved to reclaim unused space from deleted objects. Data redundancy and deduplication, in fact, work well together in a storage management system, because the more data that is deduplicated, the more important it is to have some backup copies of the data within the storage management system to help protect overall data integrity.
A storage management system typically stores copies of objects on separate media, so that loss of a piece of media due to a hardware error or other failure will not compromise the data within the storage management system. Alternatively, data can be moved from one storage location to another, either within the same storage pool or between storage pools. However, moving data between storage pools in conventional deduplication systems requires that the deduplicated data be re-assembled into an entire data object before transfer, and then possibly deduplicated again into chunks at the target location after transfer. This re-assembly and repeated deduplication processing is resource intensive and inefficient.
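The conventional transfer described above can be sketched as follows, to make its cost visible. This is a simplified model under stated assumptions, not the behavior of any specific product. The names `Pool` and `conventional_move` are hypothetical, each pool is assumed to keep its own private chunk index, and SHA-256 stands in for the digital signature.

```python
import hashlib


class Pool:
    """Toy deduplicating storage pool with its own private chunk index (hypothetical)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = {}     # signature -> chunk bytes
        self.manifests = {}  # object name -> ordered list of chunk signatures

    def put(self, name, data):
        """Chunk and hash the whole object, storing only chunks not already present."""
        signatures = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            sig = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(sig, chunk)
            signatures.append(sig)
        self.manifests[name] = signatures

    def get(self, name):
        """Re-assemble the entire object from its chunks."""
        return b"".join(self.chunks[sig] for sig in self.manifests[name])

    def delete(self, name):
        """Drop the object's manifest; reclaiming unreferenced chunks is omitted here."""
        del self.manifests[name]


def conventional_move(name, source, target):
    """Move an object the conventional way and return the bytes re-processed."""
    data = source.get(name)  # step 1: re-assemble the full object from its chunks
    target.put(name, data)   # step 2: chunk and hash every byte again at the target
    source.delete(name)
    return len(data)


source = Pool(chunk_size=4)
target = Pool(chunk_size=4)
source.put("obj", b"ABCDABCDWXYZ")
reprocessed = conventional_move("obj", source, target)
```

Every byte of the object is read, concatenated, re-chunked, and re-hashed, even though the source pool had already computed the same chunk boundaries and signatures. Because the chunk information is private to each pool, none of that earlier work is reused by the target.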
What is needed is a method to efficiently transfer deduplicated data between storage pools in a storage management system without unnecessarily re-assembling and re-deduplicating the data chunks.