The basic premise of a data deduplication system is that data stored into the system is analyzed, broken into pieces (commonly referred to as “chunks”), and examined for duplicates: each chunk is given a digital signature strong enough to declare that two chunks with the same signature are actually the same data, and duplicate chunks are then eliminated. Normally, as the deduplication system breaks larger objects apart into chunks, it must keep track of the individual chunks that make up each larger object, so that the larger object can be retrieved when desired.
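The chunking and signing step can be sketched as follows. This is a minimal illustration, not the system's actual implementation: it assumes fixed-size chunks and SHA-256 as the strong signature, whereas real deduplication systems often use variable-size, content-defined chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for illustration


def chunk_and_sign(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Break a data object into chunks and give each a strong digital
    signature; two chunks with the same signature are treated as the
    same data."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk
```

Because the signature is computed only from the chunk's contents, identical chunks produce identical signatures regardless of which object they came from, which is what makes duplicate identification possible.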
Deduplication reduces space requirements by eliminating redundant chunks of data objects and replacing them with links or pointers to the single remaining chunk. Generally speaking, there must be some type of index or database to keep track of the chunks of a larger object so that the larger object can be reassembled and retrieved after deduplication has removed the redundant chunks. Furthermore, the database used to track the chunks is generally embedded within the deduplication system. In other words, the deduplication system knows about its objects and chunks, but does not generally share this chunk information with any other system.
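The index described above can be sketched as a simple in-memory structure. The class and method names here are hypothetical, chosen only to illustrate the two mappings involved: object to ordered chunk signatures (so the object can be reassembled), and signature to the single remaining copy of each chunk.

```python
import hashlib


class DedupIndex:
    """Sketch of the embedded chunk index of a deduplication system."""

    def __init__(self):
        self.chunks = {}   # signature -> single remaining copy of the chunk
        self.objects = {}  # object id -> ordered list of chunk signatures

    def put(self, object_id: str, data: bytes, chunk_size: int = 4096):
        """Store an object, replacing redundant chunks with pointers
        (signatures) to the single remaining chunk."""
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            sig = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(sig, chunk)  # keep only one copy
            recipe.append(sig)
        self.objects[object_id] = recipe

    def get(self, object_id: str) -> bytes:
        """Reassemble the larger object from its recorded chunks."""
        return b"".join(self.chunks[sig] for sig in self.objects[object_id])
```

Note that, as the passage observes, this index is private to the deduplication system: nothing outside the system can see the signatures or the chunk mapping.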
The deduplication system may be embodied in a storage management system that spans multiple storage volumes and storage pools. For example, data may be sent by storage management clients or data protection agents to the storage management server for storage. One characteristic of a storage management system is that data can be copied for redundancy, transferred to a new media type, or moved to reclaim unused space from deleted objects. Data redundancy and deduplication, in fact, work well together in a storage management system, because the more data that is deduplicated, the more important it is to have some backup copies of the data within the storage management system to help protect overall data integrity.
A storage management system typically stores copies of objects on separate media, so that loss of a piece of media due to a hardware error or other failure will not compromise the data within the storage management system. Alternatively, data can be moved from one storage location to another, either within the same storage pool or between storage pools. The configuration of existing storage management systems, however, does not enable a simple transfer of data chunks when attempting to perform certain data retrieval and recovery operations on deduplicated storage pools.
Within existing storage management systems, data stored in one deduplicating pool cannot be shared with, or deduplicated against, data stored in a different deduplicating pool. Thus, if a chunk in one deduplicating pool is lost (for example, due to a hardware error), two side effects result. First, the single damaged chunk cannot be retrieved from another storage pool during a data retrieval operation. In other words, if a 10.5 gigabyte data object is being restored from a storage pool, and all data transfers successfully until the process encounters a damaged chunk at the 10.4 gigabyte mark, the entire object must be retrieved from a different storage pool.
Second, in existing storage management systems, an undamaged copy of only the damaged data chunk cannot be recovered from another storage pool. Storage management systems do provide a storage pool recovery operation that replaces damaged copies of objects in one pool with a good copy from another pool, but this operation is performed on the entire object. Because data chunks are not shared across pools, there is no capability to transfer a single data chunk.
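The retrieval limitation above can be illustrated with a short sketch. The function below is hypothetical, not part of any described system; it models pools as plain mappings from signature to chunk, and shows how a single damaged (here, missing) chunk forces the entire object to be transferred again from another pool.

```python
def restore_object(recipe, pools):
    """Read chunks in recipe order from one pool; on a damaged chunk,
    abandon the partial transfer and retry the ENTIRE object from the
    next pool. Returns the data and the total bytes transferred."""
    bytes_transferred = 0
    for pool in pools:
        assembled = bytearray()
        damaged = False
        for sig in recipe:
            chunk = pool.get(sig)
            if chunk is None:  # damaged chunk encountered mid-restore
                damaged = True
                break
            assembled += chunk
            bytes_transferred += len(chunk)
        if not damaged:
            return bytes(assembled), bytes_transferred
    raise IOError("no pool holds an undamaged copy of every chunk")
```

In the 10.5 gigabyte example, roughly 10.4 gigabytes are transferred and then discarded before the whole object is fetched again, because the pools cannot exchange the one missing chunk.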
What is needed is a method to retrieve and/or recover data chunks from alternate data stores in a deduplicating storage management system without unnecessarily transferring or accessing the entire data object that contains the chunks.