The present invention relates to data storage, and more particularly, to systems and methods configured to delay replication of data to a data archive for a period of time to enhance performance of data archive systems.
In a multi-cluster environment (e.g., a grid configuration), each cluster is capable of replicating data between sites based on a defined policy to meet each client's or user's established disaster recovery (DR) objective. For example, two clusters may be electronically connected via an Ethernet wide area network (WAN), and asynchronous volume replication to the remote site may be executed, which allows status to be surfaced immediately rather than waiting for the replication to complete synchronously.
Each cluster (or site) may utilize a different technology, thus replication to a cluster or site may occur based on the needs of the client or user. For example, a cluster may have a physical tape back store (magnetic or optical) in which data targeting that site is intended to end up on physical tape. This physical tape back store may be intended to be used for long-retention archival data, which should also end up at a particular site or cluster. A grid architecture of a physical tape drive may also support an auto-removal function in which aged data (data which has resided on a storage medium for a predetermined amount of time without being accessed) is automatically removed from random access storage configurations (such as disk-only-based storage configurations) after verifying that it has been replicated to another cluster or site.
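As an illustration only, the auto-removal function described above may be sketched as follows. The names Volume, auto_remove, age_threshold_days, and required_peer are hypothetical and are not taken from any particular product; age is measured here in whole days since last access, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical record for a volume held in a disk-only cluster."""
    name: str
    last_access_day: int                              # day of most recent access
    replicated_to: set = field(default_factory=set)   # peers holding a copy


def auto_remove(disk_cache, today, age_threshold_days, required_peer):
    """Remove aged volumes from disk_cache, but only after verifying
    that a copy already exists on required_peer."""
    removed = []
    for name in list(disk_cache):
        vol = disk_cache[name]
        aged = today - vol.last_access_day >= age_threshold_days
        if aged and required_peer in vol.replicated_to:
            del disk_cache[name]
            removed.append(name)
    return removed
```

In this sketch, an aged volume that has not yet been replicated stays in the disk cache, reflecting the verification step that precedes removal.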
Essentially, the ability to replicate data to physical tape back stores and remove data from the random access storage configurations acts as a form of hierarchical storage management for physical tape within a grid. In addition, many clients or users have a minimal retention time for some types of data, yet they are not necessarily able to differentiate between data that will become long-term archival data and data that will be kept for a shorter period of time. In many cases, only a small percentage of the data in many workloads ages long enough to justify replication of this data to an archive technology (e.g., a physical tape back store).
Typically, many clients or users prefer that data only remain in certain random access storage configured clusters or sites until it ages a certain number of days or weeks. Only then do the clients or users believe it is necessary to replicate this data to a physical tape back store for archival purposes. For data which does not have a long enough existence to be archived, storage space for this data is typically returned to scratch and expired long before it would ever become a candidate for automatic removal.
Therefore, the replication of data to a physical tape back store is excessive in these early-expiration cases from the viewpoint of cache, physical tape, and network utilization, and simply introduces overhead when sufficient redundant copies of this data already exist in backup media in the random access storage configured clusters or sites.
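The aging behavior discussed above, in which data becomes a candidate for archival only after a certain number of days and short-lived data is returned to scratch beforehand, may be sketched as follows. This is a minimal sketch under stated assumptions: the names Volume, volumes_to_archive, delay_days, and expired_day are hypothetical, and age is measured in whole days since creation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical record for a volume in a random access storage cluster."""
    name: str
    created_day: int
    expired_day: Optional[int] = None   # day returned to scratch, if any


def volumes_to_archive(volumes, today, delay_days):
    """Return names of volumes old enough to replicate to the tape archive.

    Volumes already returned to scratch are skipped, so no cache, tape,
    or network resources are spent on them.
    """
    eligible = []
    for vol in volumes:
        if vol.expired_day is not None and vol.expired_day <= today:
            continue                     # expired before reaching archive age
        if today - vol.created_day >= delay_days:
            eligible.append(vol.name)
    return eligible
```

Under these assumptions, a volume that expires before the delay elapses is never selected for replication to the physical tape back store.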