It is desirable that data stored on data storage entities, such as disk arrays, be protected against loss due to accidental erasure, malicious removal, or equipment failure. For this reason, data that is stored on data storage entities may be copied to another entity or location for safe-keeping, a process commonly referred to as backing-up the data, or a “backup process”. If the backup data is needed, it is copied back to the original data storage entity, a process commonly referred to as recovering or restoring the data, or a “restore process”. By convention, a backup process copies data from a source data storage entity (“the source”) to a destination storage entity (“the destination”). A restore process copies data from the destination back to the source. If a portion or block of the destination data storage entity contains the same data as the corresponding block in the source, the two blocks are said to be synchronized to each other, or “in sync”. If the entire contents of the destination match the corresponding contents of the source, the source and destination are said to be in sync.
The backup process may occur on demand, such as in response to a backup request from a user, or it may occur continually in the background as data is written to the source. For example, any time new data is written to the source, a backup process manager may detect that the data in a particular block or portion of the source has changed, and initiate a request to copy data from the changed block in the source to a corresponding block in the destination.
In this scenario, a potential conflict may occur if the source receives a request for a write to a source block while a restore process is occurring. There are three writes involved: 1) the write of new data to the source; 2) the write of data from the source to the destination that occurs as part of the continual backup process; and 3) the write of data from the destination to the source that occurs as part of the ongoing restore process. The relative order of these three writes will determine whether the source and destination contain the new write data or the previously backed up data. In one example, the new data is written to the source, copied from the source to the destination as part of the continual backup process, and copied from the destination back to the source as part of the restore process. In another example, the new data is written to both the source and destination, and later copied from the destination back to the source as part of the restore process. In both of these examples, the restore process fails to restore the source to the exact state of the destination at the time that the restore request was made, i.e., at the time that the restore process was started. Instead, at the conclusion of the restore process, the source will have the same data that the destination held when the restore process was started, except for the block that was modified by the write that occurred in the middle of the restore process. This kind of restore, i.e., in which the contents of the destination may be changed while the restore is still in progress, is commonly referred to as an “unprotected restore”.
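The ordering problem described above can be illustrated with a minimal sketch. The function name `unprotected_restore`, the list-of-strings block model, and the parameter names are assumptions made only for illustration; they do not describe any particular storage system.

```python
def unprotected_restore(source, destination, write_block, new_data, write_at):
    """Restore block by block while one host write arrives mid-restore.

    Hypothetical model: each list element stands for one block. The host
    write arrives just before block `write_at` is restored, and the
    continual backup immediately mirrors it to the destination.
    """
    # State of the destination at the time the restore was requested.
    snapshot = list(destination)
    for i in range(len(destination)):
        if i == write_at:
            source[write_block] = new_data                  # write 1: host write to the source
            destination[write_block] = source[write_block]  # write 2: continual backup
        source[i] = destination[i]                          # write 3: restore copy
    return snapshot


source = ["s0", "s1", "s2", "s3"]
destination = ["d0", "d1", "d2", "d3"]
snapshot = unprotected_restore(source, destination,
                               write_block=2, new_data="new", write_at=1)

# The source ends up in sync with the destination, but neither matches
# the destination as it was when the restore was requested.
print(source == destination, source == snapshot)
```

In this sketch the source and destination finish in sync with each other, yet block 2 holds the new write data rather than the data that was on the destination when the restore began.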
For this reason, some data storage systems support a “protected restore” operation, which is a restore process during which no writes are allowed to the destination and during which reads must return restored data. If a read request is received by the source while a protected restore is executing, a storage manager typically checks to see if the blocks to be read have been restored from the destination yet. If they haven't, the storage manager will either put the desired blocks at the top of the queue of blocks to be copied from destination to source as part of the restore process, or the storage manager will instruct some other process to copy the desired blocks from the destination to the source, so that the read request will return restored data. This process of copying needed blocks is referred to as “copy-on-demand”, or COD.
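The read path during a protected restore might be sketched as follows. The class `ProtectedRestore`, its per-block `restored` flags, and its method names are hypothetical and chosen only to make the copy-on-demand step concrete; the patents cited in this section may implement it differently.

```python
class ProtectedRestore:
    """Hypothetical model of a protected restore with copy-on-demand reads."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        # One flag per block: has it been copied from destination to source yet?
        self.restored = [False] * len(destination)
        self.cod_copies = 0

    def _copy_block(self, i):
        self.source[i] = self.destination[i]
        self.restored[i] = True

    def read(self, i):
        # A read must return restored data, so copy the block on demand
        # if the background restore has not reached it yet.
        if not self.restored[i]:
            self._copy_block(i)
            self.cod_copies += 1
        return self.source[i]

    def restore_next(self):
        # Background restore: copy the first block not yet restored.
        for i, done in enumerate(self.restored):
            if not done:
                self._copy_block(i)
                return i
        return None


restore = ProtectedRestore(["s0", "s1", "s2"], ["d0", "d1", "d2"])
value = restore.read(2)          # block 2 is not restored yet, so a COD occurs
first_background = restore.restore_next()
value_again = restore.read(2)    # already restored, no second COD
```

Tracking a per-block flag is one simple way to decide whether a COD is needed; a real storage manager could equally reorder its restore queue, as the text above notes.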
Systems that implement a protected restore are disclosed in the following commonly-assigned U.S. patents, all having the same title of “System and Method for Managing Data Associated with Copying and Replication Procedures in a Data Storage Environment”: U.S. Pat. Nos. 7,096,331 and 7,133,985, both filed on Sep. 29, 2003; and U.S. Pat. No. 7,353,351, filed on Oct. 6, 2003, the disclosures of which are incorporated by reference herein in their entireties.
Conventional systems, however, process a write during a protected restore in the same manner as a read, by first copying the affected blocks from the destination to the source, i.e., performing a COD, and then overwriting the blocks with the new data. When performed in preparation for a write to the source, a conventional copy-on-demand process is inefficient, because it must copy blocks from the destination to the source even though those blocks will be completely overwritten during the subsequent write.
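The inefficiency can be made concrete with a small sketch that counts the copies a conventional write performs. The function `conventional_write` and its block model are assumptions for illustration only, and the write is assumed to cover each affected block completely.

```python
def conventional_write(source, destination, restored, first_block, new_blocks):
    """Hypothetical conventional write during a protected restore.

    Every affected block that is not yet restored is first copied from the
    destination (a COD) and then overwritten with the new data.
    Returns the number of COD copies performed.
    """
    cod_copies = 0
    for offset, data in enumerate(new_blocks):
        i = first_block + offset
        if not restored[i]:
            source[i] = destination[i]   # COD: copy from destination to source
            restored[i] = True
            cod_copies += 1
        source[i] = data                 # the copied data is overwritten at once
    return cod_copies


source = ["d0", "s1", "s2", "s3"]        # block 0 has already been restored
destination = ["d0", "d1", "d2", "d3"]
restored = [True, False, False, False]

wasted = conventional_write(source, destination, restored,
                            first_block=1, new_blocks=["w1", "w2", "w3"])
```

Here three blocks are copied from the destination and none of the copied data survives the write, which is the wasted work the paragraph above describes.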
Furthermore, as the capacities of data storage systems increase, the amount of data contained in each block tends to increase, i.e., the block size becomes larger and larger. Two problems arise: first, with the advent of digital multimedia, the size of an average data file that is stored on the data storage system has increased from kilobytes to gigabytes; second, block sizes are so large that a single block may contain portions of more than one file. As a result of the first problem, conventional implementations of a write during a protected restore end up needlessly copying enormous amounts of data that will just be overwritten again, which wastes time and resources. As a result of the second problem, special attention must be given to blocks which contain data from more than one file, to make sure that data from one file is not accidentally overwritten while writing data for another file in the same block.
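The second problem can be illustrated with a sketch of a write that covers only part of a block. The function `write_partial_block`, the byte-string block model, and the eight-byte block in the example are hypothetical; the sketch assumes that the part of the block not covered by the write must keep the data held on the destination.

```python
def write_partial_block(destination_block, offset, data):
    """Hypothetical write that covers only part of a not-yet-restored block.

    The bytes outside the written range are taken from the destination so
    that data belonging to another file in the same block is preserved.
    """
    if offset < 0 or offset + len(data) > len(destination_block):
        raise ValueError("write does not fit inside the block")
    merged = bytearray(destination_block)      # start from the restored contents
    merged[offset:offset + len(data)] = data   # overlay only the written range
    return bytes(merged)


# One block shared by two files: file A in bytes 0-3, file B in bytes 4-7.
destination_block = b"AAAABBBB"
result = write_partial_block(destination_block, offset=4, data=b"bbbb")
```

In this example the write to file B leaves the bytes of file A intact, whereas overwriting the whole block with only the new data would have lost them.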
Accordingly, in light of these disadvantages associated with conventional implementations of a write during a protected restore, there exists a need for systems, methods, and computer readable media for copy-on-demand optimization for large writes.