Data storage on disk has been rapidly outgrowing the typical means of backing up the data on those disks to removable storage, such as tape. At the same time, the need to provide cost-effective backup copies has grown, driven by, for example, practical needs as well as trade and federal rules and legislation.
A single, simple remote replication target site may suffice for storing historical data. However, the cost of maintaining at the remote site every snapshot taken at the source site could be prohibitive. Items contributing to the cost include, but are not limited to: the opportunity cost of the bandwidth used; the real dollar cost of the bandwidth; the real dollar cost of the remote site, including, for example, the size of the site, the power required to operate the site, the employee cost for the site, etc.; the administrative cost of replication; and the storage cost, including the cost of the disk drives or other block store devices.
Conventional methods of replicating data to backup storage can result in extra, unnecessary data being transferred between the source site and the backup site. For example, consider a data storage system 100 having local storage 102 and backup or remote storage 104, as illustrated in FIG. 1. At local storage 102, which maintains active data input/output (I/O), the system is configured to take local recovery snapshots at 8-hour intervals, i.e., snapshots 106, 108, 110, and 112. Each snapshot identifies the changes, or delta, between it and the prior snapshot. For example, snapshot 108 identifies only the changes since snapshot 106 was taken, i.e., from 12 am to 8 am. In contrast, the backup storage 104 may, for example, be configured for only nightly backup, i.e., backup once every 24 hours, since a longer backup interval may be sufficient for the backup data and more efficient in overall storage use at the backup site, where the data is less active or inactive. Regardless of the 24-hour backup period at the backup storage 104, however, because the snapshots at the local storage identify only the deltas between successive snapshots, each snapshot taken in the course of a day will nonetheless be at least temporarily replicated to the backup storage, as illustrated in FIG. 1, so that the backup storage system can identify the entire day's changes and appropriately create the 24-hour daily backup. To save space, any intermediate backups 114, 116 can be deleted once the 24-hour backup 118 is committed. Nonetheless, assuming an example 10 terabyte (TB) dataset and a worst-case scenario in which 100% of the dataset changes every 8 hours at the local storage, such a conventional method would require the entire 10 TB to be transferred to the backup storage 104 every 8 hours, resulting in a total daily transfer of 30 TB.
In the above example, only the 24-hour snapshots 118 and 120 are of interest, and if the intermediate snapshots 114, 116 could be eliminated, then even in the worst-case scenario the daily transfer of data from the local storage 102 to the backup storage 104 would be reduced from 30 TB to 10 TB. The problem may grow further where, for example only, the dataset is much larger than 10 TB, where the local storage takes snapshots at intervals shorter than 8 hours, and/or where the backup storage takes backups at intervals longer than 1 day. However, it is recognized that systems where the dataset is much smaller than 10 TB, where the local storage takes snapshots at intervals longer than 8 hours, and/or where the backup storage takes backups at intervals shorter than 1 day would likely have the same issues.
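The arithmetic behind the 30 TB and 10 TB figures can be illustrated with a short sketch. The function name, its parameters, and the simple worst-case model are assumptions introduced here for illustration only and are not part of any described system: each snapshot delta is taken to be a fixed fraction of the dataset, and a consolidated transfer is bounded above by the dataset size, since no more than the whole dataset can differ between two points in time.

```python
def transfer_per_backup_period_tb(dataset_tb, change_fraction,
                                  snapshot_interval_h, backup_interval_h,
                                  replicate_every_snapshot):
    """Worst-case TB sent to backup storage during one backup period.

    Hypothetical model: every local snapshot delta has size
    dataset_tb * change_fraction.
    """
    if backup_interval_h % snapshot_interval_h != 0:
        raise ValueError("backup interval must be a multiple of the snapshot interval")

    # Number of local snapshots taken during one backup period.
    snapshots = backup_interval_h // snapshot_interval_h
    delta_tb = dataset_tb * change_fraction
    total_tb = snapshots * delta_tb

    if replicate_every_snapshot:
        # Conventional case: every intermediate delta is transferred.
        return total_tb

    # Consolidated case: only the net change over the whole period is sent,
    # which can never exceed the size of the dataset itself.
    return min(float(dataset_tb), total_tb)


# Example from the text: 10 TB dataset, 100% change every 8 hours, 24-hour backup.
conventional_tb = transfer_per_backup_period_tb(10, 1.0, 8, 24, True)
consolidated_tb = transfer_per_backup_period_tb(10, 1.0, 8, 24, False)
print(conventional_tb, consolidated_tb)
```

Under these assumptions the conventional case yields 30 TB per day and the consolidated case yields 10 TB per day, matching the figures above. Shortening the snapshot interval or lengthening the backup interval increases the number of intermediate deltas and therefore widens the gap.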
Thus, there is a need in the art for more cost-effective and/or more efficient replication processes, for example, for backup or historical data.