The present invention relates to a method for deciding on the transfer order for data (files) when transferring files from a production site to a remote site in an asynchronous replication function.
Many storage products connected to a network are equipped with an asynchronous replication function to implement backup and disaster recovery. Scale-out Network Attached Storage (SONAS) from IBM Corp. is one such product. In large-scale storage such as SONAS, a failure is reasonably likely to occur during a data transfer, because a transfer can take several hours when the amount of updated data is large and the bandwidth of the wide area network (WAN) is narrow, as is common in conventional installations.
In large-scale storage supporting a petabyte (PB) of data, such as SONAS, many users run asynchronous replication once a day or once every twelve hours. Usually, the remote site is established at a location some distance from the production site, and the two sites are connected via a WAN. Under these conditions, network delays are often significant. In many cases, data transfer efficiency is improved by having multiple nodes transfer different data in parallel. The production site storage is used for read/write operations, and the remote site storage is often used as read-only storage.
When a storage failure occurs at the production site during a data transfer, the data already replicated to storage at the remote site becomes the latest backup. However, updated files that had not yet been replicated are lost. When such a failure occurs and a failover is performed to the remote site, all of the files at the remote site may instead be restored to their state at the most recent synchronization (sync) point. In this case, all data that was updated or added at the production site after this sync point is lost.
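The two loss scenarios described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the names `FileState` and `files_lost_on_failover`, the `replicated` flag, and the use of numeric timestamps are hypothetical and do not come from SONAS or any other product.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileState:
    """Hypothetical per-file record kept at the production site."""
    path: str
    modified_at: float  # time of the last update at the production site
    replicated: bool    # True if that update already reached the remote site


def files_lost_on_failover(
    files: Iterable[FileState],
    last_sync_point: float,
    rollback_to_sync_point: bool = False,
) -> List[str]:
    """Return the paths whose latest updates are lost after a failover.

    Without rollback, only updates that were not yet replicated are lost.
    With rollback, the remote site returns to the last sync point, so every
    update made after that point is lost, replicated or not.
    """
    updated = [f for f in files if f.modified_at > last_sync_point]
    if rollback_to_sync_point:
        return [f.path for f in updated]
    return [f.path for f in updated if not f.replicated]
```

In this sketch, a file updated after the sync point and already transferred survives in the first scenario but not in the second, which is why the order in which files are transferred matters when a run may be interrupted.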
To minimize the damage from data loss that may occur during asynchronous replication, these storage systems allow the files most important to the user to be backed up with priority. This requires automatically identifying those files, on the premise that the most frequently updated and referenced files are the ones whose loss would cause the most problems for the user.
Laid-Open Japanese Patent Publication No. 6-250902 focuses only on the access count at the production site (the site that is backed up) and selects a file for backup when its number of updates exceeds a predetermined value.
However, Laid-Open Japanese Patent Publication No. 6-250902 does not take the access count at the remote site into account when determining the importance of a file. How often a backed-up file is accessed at the remote site is also a useful factor in determining which files matter to the user.
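The difference between the two approaches can be made concrete with a small sketch. It is not the method of the cited publication or of any product: the record type `FileStats`, the functions `select_by_update_threshold` and `transfer_order`, and the simple weighted sum used as an importance score are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical access statistics for one file."""
    path: str
    production_updates: int  # update count at the production site
    remote_accesses: int     # access count of the backed-up copy at the remote site


def select_by_update_threshold(stats: Iterable[FileStats], threshold: int) -> List[str]:
    """Production-only selection: keep files whose update count exceeds a threshold."""
    return [s.path for s in stats if s.production_updates > threshold]


def transfer_order(stats: Iterable[FileStats], remote_weight: float = 1.0) -> List[str]:
    """Order files for transfer by an assumed importance score, highest first.

    The score is a plain weighted sum of the two counts; ties are broken by
    path so that the order is deterministic.
    """
    def score(s: FileStats) -> float:
        return s.production_updates + remote_weight * s.remote_accesses

    return [s.path for s in sorted(stats, key=lambda s: (-score(s), s.path))]
```

With such a score, a file that is rarely updated at the production site but heavily read at the remote site can move ahead of files that a production-only threshold would favor. Setting `remote_weight` to zero reduces the ordering to the production-only view.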