As the capacity of storage devices increases and I/O operations get faster, the ability to manage data storage operations in step with data transfers is weakening. Historically, when data was transferred (e.g., during a backup operation) from a primary storage server to a secondary storage server, administrators sized the secondary volume on the secondary storage server by predicting or estimating the rate of change of the data. For example, an administrator may have estimated that it was extremely unlikely that more than 20% of the data would change between backup operations, and would therefore set the size of the secondary volume on the secondary storage server to be 20% larger than the primary volume on the primary storage server. The danger in making such estimations is that when they are wrong, backup operations fail. Estimating the size of the secondary volume may have provided a solution in environments having a small amount of data to manage, but such a solution is insufficient for managing large amounts of data (e.g., the data of an enterprise).
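The sizing arithmetic described above can be sketched as follows. This is a minimal illustration, not taken from the source; the function name and units are hypothetical.

```python
def provision_secondary_size(primary_size_gb: float, est_change_rate: float) -> float:
    """Size a secondary volume as the primary volume's size plus headroom
    for the administrator's estimated fraction of data that may change
    between backup operations (hypothetical helper for illustration)."""
    return primary_size_gb * (1.0 + est_change_rate)

# A 1000 GB primary volume with an estimated 20% rate of change
# yields a 1200 GB secondary volume.
secondary_gb = provision_secondary_size(1000.0, 0.20)  # 1200.0
```

The failure mode the text describes is simply that when more than `est_change_rate` of the data actually changes, the secondary volume overflows and the backup fails.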
One attempt at managing large amounts of data and data storage operations was not to predict the rate of change of the data, but to adjust the size of the secondary volume in line with the write operations of the data transfer. However, the size of the secondary volume could not grow on demand in the middle of a data transfer because resizing the secondary volume could not keep up with the write operations. Another attempt at managing large amounts of data and data storage operations involved thin provisioning every secondary volume to the largest possible size. Those skilled in the art will appreciate that thin provisioning is a way of presenting more storage space to the hosts or servers connecting to the storage system than is actually physically available. This solution proved successful in environments having a small number of thin provisioned secondary volumes (e.g., 10, 12, or 14 thin provisioned secondary volumes). However, many storage system background processes scale linearly with the thin provisioned size of the secondary volumes. Consequently, in environments that pushed the limits of the storage system (e.g., environments having 500 thin provisioned secondary volumes), thin provisioning a great number of volumes to the largest possible size proved disastrous. For example, thin provisioning 500 secondary volumes to be the size of an entire pool of storage translated into a 500× increase in the time taken by certain storage system background processes.
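The linear-scaling problem above can be made concrete with a toy model. This is an illustrative sketch only; the cost function and pool size are assumptions, not details from the source.

```python
def background_scan_cost(num_volumes: int, provisioned_gb_per_volume: float) -> float:
    """Toy model of a background process whose running time is proportional
    to the total thin-provisioned capacity it must walk, regardless of how
    much of that capacity is physically backed (hypothetical model)."""
    return num_volumes * provisioned_gb_per_volume

POOL_GB = 100_000.0  # hypothetical size of the entire storage pool

# One volume thin provisioned to the full pool size vs. 500 such volumes:
baseline = background_scan_cost(1, POOL_GB)
worst_case = background_scan_cost(500, POOL_GB)
slowdown = worst_case / baseline  # 500.0
```

The point is that the background process pays for *provisioned* size, not *used* size, so provisioning 500 volumes at pool size multiplies its work 500-fold even if the volumes are nearly empty.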
One way to provision the necessary size of the secondary volume is to determine the size of the data that is to be transferred. However, transfer engines typically do not determine the amount of data to be transferred until the transfer is actually complete. In addition, traversing an active file system to determine the size of the data to be transferred can be an expensive, resource-intensive operation that can significantly increase the load on a storage system's CPU.
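Why such a traversal is expensive can be seen in a minimal sketch of computing a transfer size by walking a directory tree. This is an illustration, not the method of any particular transfer engine; on an active file system, touching every directory entry and stat-ing every file is the costly part, and files may change or disappear mid-walk.

```python
import os

def tree_size_bytes(root: str) -> int:
    """Walk an entire directory tree and sum the sizes of all files.
    Every file requires a metadata lookup, so the cost grows with the
    number of inodes, not with the amount of data actually changed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File removed or made inaccessible while the walk is in
                # progress -- a hazard of traversing an *active* file system.
                pass
    return total
```

A transfer engine that had to run this before every backup would pay a full-tree scan up front, which is precisely the CPU load the text warns about.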