In a typical storage management system, administrators are able to configure storage pools, each storage pool being a collection (a group) of devices of the same type used for storing end-user data. These storage pools are used as targets for store operations from a client and are referenced in server policies and other constructs for processing. When storing data on behalf of a client, the storage management system has the ability to simultaneously store data to more than one storage pool. One, and only one, of the pools is configured as a primary storage pool where the ‘master’ copy of the data is kept. Other storage pools may include copy storage pools that are used to recover data in the primary pools, and active-data pools where only active versions of backup data are stored for rapid recovery of client machines. Writing simultaneously to multiple pools during client backup reduces the window needed for back-end processes to copy data from the primary pools to either the copy or active-data pools.
One method of accomplishing this simultaneous write operation is through use of a session thread that is started to receive data from the client. Separate transfer threads (one per storage pool) are also started and are responsible for writing the data to the storage media. As the session thread receives the data, it places it into a transfer buffer. Once the buffer is full, the session thread signals the transfer threads that there is work to do. Each transfer thread takes the transfer buffer and writes the data to the storage media. Notably, the transfer threads are able to read from the same buffer concurrently. Once all transfer threads have finished writing the buffer's data, they signal to the session thread that they are ready for another buffer. The session thread then passes the next buffer to the transfer threads, and this process repeats until all data has been written to the media in all pools.
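The hand-off between the session thread and the transfer threads described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular storage management system: the function name `simultaneous_write`, the use of a `threading.Barrier` for signaling, and the in-memory `bytearray` objects standing in for storage media are all assumptions made for the example. Each item in `buffers` is treated as one already-filled transfer buffer.

```python
import threading


def simultaneous_write(buffers, pool_count):
    """Write each buffer to every pool, one transfer thread per pool.

    Hypothetical sketch: pools are in-memory bytearrays, and a Barrier
    plays the role of the "work to do" and "ready for another" signals.
    """
    pools = [bytearray() for _ in range(pool_count)]
    # The session thread plus one transfer thread per pool meet at the barrier.
    barrier = threading.Barrier(pool_count + 1)
    shared = {"buffer": None}

    def transfer(pool):
        while True:
            barrier.wait()            # wait until a buffer has been published
            buf = shared["buffer"]
            if buf is None:           # None marks the end of the data
                return
            pool.extend(buf)          # all transfer threads read the same buffer
            barrier.wait()            # report that this buffer has been written

    threads = [threading.Thread(target=transfer, args=(p,)) for p in pools]
    for t in threads:
        t.start()

    # Session thread role: publish a full buffer, then wait for all writers.
    for buf in buffers:
        shared["buffer"] = bytes(buf)
        barrier.wait()                # signal: there is work to do
        barrier.wait()                # wait: every pool has written the buffer

    shared["buffer"] = None
    barrier.wait()                    # release the transfer threads to exit
    for t in threads:
        t.join()
    return [bytes(p) for p in pools]


if __name__ == "__main__":
    copies = simultaneous_write([b"ab", b"cd", b"ef"], pool_count=3)
    print(copies)
```

Because the session thread replaces the shared buffer only after every transfer thread has reached the second barrier, the transfer threads can read the buffer concurrently without copying it. The cost of this design is that the slowest pool sets the pace for all of them.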
Various storage management systems now use storage pools which are enhanced by the use of data deduplication, whereby the redundant storage of common data is greatly reduced. In a typical deduplication configuration, a disk-based storage system, such as a storage-management server or virtual tape library, has the capability to detect redundant data “extents” (also known as data “chunks”) and reduce data duplication by avoiding the redundant storage of such extents. If a redundant chunk is identified, that chunk can be replaced with a pointer to the matching chunk. These storage pools are referred to as deduplication pools. Primary, copy, and active-data pools can all be implemented as deduplication pools.
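The replacement of a redundant chunk with a pointer to the matching chunk can be illustrated with a short sketch. The specifics here are assumptions chosen for brevity and are not taken from the text above: data is split into fixed-size chunks, a SHA-256 digest serves both as the chunk identifier and as the pointer, and a plain dictionary (`chunk_store`) stands in for the deduplication pool. The names `dedup_store` and `restore` are hypothetical.

```python
import hashlib


def dedup_store(data, chunk_store, chunk_size=4):
    """Store data as a list of pointers into chunk_store.

    Assumed scheme: fixed-size chunks identified by their SHA-256 digest.
    A chunk already present in chunk_store is not stored a second time.
    """
    pointers = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Keep the chunk only if no chunk with the same digest exists yet.
        chunk_store.setdefault(digest, chunk)
        pointers.append(digest)
    return pointers


def restore(pointers, chunk_store):
    """Rebuild the original data by following each pointer."""
    return b"".join(chunk_store[p] for p in pointers)


if __name__ == "__main__":
    store = {}
    ptrs = dedup_store(b"AAAABBBBAAAA", store)
    print(len(ptrs), "pointers,", len(store), "stored chunks")
    print(restore(ptrs, store))
```

In this sketch the first and third chunks are identical, so the pool holds two chunks while the object is described by three pointers. A scheme that treats equal digests as equal chunks relies on digest collisions being vanishingly rare, since two different chunks with the same digest would be stored as one.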
There may be significant advantages to creating, in lower-cost storage, a non-deduplicated copy of data that will be deduplicated in a primary storage pool. Maintaining such copies can mitigate the risk of data loss related to deduplication in the primary data store; these risks include the potential for false chunk matches, media failure, and logic errors. However, copying data imposes further demands on the storage management system, beyond the time-consuming and resource-intensive deduplication operations needed to identify and remove duplicate data. Consequently, available time windows may not be sufficient to allow completion of operations to copy and deduplicate data. What is needed is an efficient operation for copying data in non-deduplicated form during a data transfer in which deduplicated data is stored or moved.