RAID (Redundant Arrays of Inexpensive Disks) devices that employ a distributed cache memory storage system use a redundant configuration in which multiple control modules control the input and output of data to and from storage, in order to improve performance and reliability. Each control module executes data read/write operations on a logical volume.
To enhance reliability, such a RAID device includes a copy function that guarantees the order of data.
For example, when the RAID device stores a copy of data updated by a write instruction from a host device into multiple cache buffers, the RAID device sends the data stored in the multiple cache buffers to a copy destination device in units called buffer sets in order to guarantee the order of the data.
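The role of the buffer set as the unit of ordered transfer can be illustrated with a minimal sketch. The names `BufferSet` and `CopyDestination`, the `generation` counter, and the rule that the destination applies sets strictly in generation order are assumptions made for illustration; they are not taken from any particular RAID device.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BufferSet:
    """A group of cache buffers transferred as one unit (hypothetical model)."""
    generation: int                       # order in which the set was filled
    writes: List[bytes] = field(default_factory=list)


class CopyDestination:
    """Applies buffer sets strictly in generation order (assumed behavior)."""

    def __init__(self) -> None:
        self.next_generation = 0
        self.pending: Dict[int, BufferSet] = {}
        self.applied: List[bytes] = []

    def receive(self, buffer_set: BufferSet) -> None:
        # Hold the set until every earlier generation has been applied.
        self.pending[buffer_set.generation] = buffer_set
        while self.next_generation in self.pending:
            ready = self.pending.pop(self.next_generation)
            self.applied.extend(ready.writes)
            self.next_generation += 1


# Usage: a set that arrives early is held back until its predecessor arrives.
dest = CopyDestination()
dest.receive(BufferSet(generation=1, writes=[b"w3"]))
held_back = list(dest.applied)            # nothing applied yet
dest.receive(BufferSet(generation=0, writes=[b"w1", b"w2"]))
```

In this sketch the destination never exposes a later write without the earlier ones, which is the property that order-guaranteeing copy is meant to provide.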
In conjunction with the technique described above, a backup device is known that monitors the availability of write buffers storing write data. When free space of a buffer runs low, the backup device writes the data in the buffer onto a disk system such as a RAID 0 or RAID 0+1 disk system. When the free space increases to a certain level, the backup device writes the data saved on the disk system back to the buffer.
Also, a storage system is known that maintains coherency of data between volumes when multiple remote copies are performed asynchronously.
Related patent documents concerning such storage systems include Japanese Laid-Open Patent Publication No. 2006-268420 and Japanese Laid-Open Patent Publication No. 2007-264946.
However, the cache buffers can be exhausted in certain circumstances, such as where the capacity of the line interconnecting the devices is low, where the line interconnecting the devices is unstable, or where the amount of data to be written by a write instruction from a host device exceeds the capacity of the cache buffers.
When the cache buffers are exhausted, copy that guarantees the order of data can no longer be maintained.
In order to avoid exhaustion of the cache buffer to maintain copy that guarantees the order of data, a save buffer may be provided on a separate storage such as a magnetic disk device. Data on the cache buffer is temporarily saved in the save buffer depending on the availability of the cache buffer. When the data is to be sent to a copy destination device, the data can be written back from the save buffer to the cache buffer, thereby avoiding cache buffer exhaustion.
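The save and write-back behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the class name `CopyBuffer`, the fixed `cache_capacity`, and the use of an in-memory queue to stand in for the save buffer on a magnetic disk device are all hypothetical.

```python
from collections import deque


class CopyBuffer:
    """Cache buffer backed by a save buffer (hypothetical, simplified model)."""

    def __init__(self, cache_capacity: int) -> None:
        self.cache_capacity = cache_capacity
        self.cache = deque()  # stands in for the small, fast cache buffer
        self.save = deque()   # stands in for the large save buffer on disk

    def store(self, item) -> None:
        # Once anything has been saved, newer data queues behind it so that
        # the original write order is preserved.
        if self.save or len(self.cache) >= self.cache_capacity:
            self.save.append(item)
        else:
            self.cache.append(item)

    def send_next(self):
        """Return the oldest item for transfer, then write back one saved item."""
        if not self.cache:
            return None
        item = self.cache.popleft()
        if self.save:
            self.cache.append(self.save.popleft())
        return item

    def pending(self) -> int:
        return len(self.cache) + len(self.save)


# Usage: five writes arrive while the cache buffer holds only two.
buf = CopyBuffer(cache_capacity=2)
for n in range(1, 6):
    buf.store(n)
saved_count = len(buf.save)
sent = [buf.send_next() for _ in range(5)]
```

The cache buffer never holds more than `cache_capacity` items, and the data still leaves in the order in which it was written.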
RAID devices also provide a function for suspending a session that is performing copy that guarantees the order of data. When a conventional RAID device receives an instruction to suspend such a session, it first reflects the copy data of that session in the copy destination device, making the data in the copy destination device consistent with the copy data, and only then suspends the session. This is done in order to ensure the order of the copy data. Accordingly, the RAID device cannot enter the suspended state until the data in the copy destination device is consistent with the copy data of the session to be suspended. The process of reflecting all copy data in the copy destination device, so that the data in the copy destination device is consistent, and then suspending the session is referred to as a “consistency suspend process”.
As has been described, in order for a RAID device to be consistent with the copy destination device, the copy data of the session to be suspended stored in the cache buffer of the RAID device needs to be reflected in the copy destination device.
To that end, upon reception of a suspend command from a host device, the RAID device starts monitoring copy data in a cache buffer of the session to be suspended. After all the copy data of the session to be suspended has been sent to the copy destination device, the RAID device suspends the session.
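The sequence just described can be modeled as a small state machine. The sketch below is illustrative only; the state names, the `Session` class, and its methods are assumptions and do not correspond to the interface of any actual RAID device.

```python
from collections import deque
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SUSPENDING = "suspending"   # suspend command received, still draining
    SUSPENDED = "suspended"


class Session:
    """Hypothetical copy session that suspends only after draining its data."""

    def __init__(self, pending_copy_data) -> None:
        self.pending = deque(pending_copy_data)  # copy data not yet sent
        self.destination = []                    # data reflected at the destination
        self.state = State.ACTIVE

    def request_suspend(self) -> None:
        # Start monitoring the remaining copy data of this session.
        self.state = State.SUSPENDING
        self._suspend_if_drained()

    def send_one(self) -> None:
        if self.pending:
            self.destination.append(self.pending.popleft())
        self._suspend_if_drained()

    def _suspend_if_drained(self) -> None:
        if self.state is State.SUSPENDING and not self.pending:
            self.state = State.SUSPENDED


# Usage: the session stays in SUSPENDING until both items have been sent.
session = Session(["block-1", "block-2"])
session.request_suspend()
state_after_command = session.state
session.send_one()
session.send_one()
```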
However, if a save buffer which has a large capacity is used, not only the copy data stored in the cache buffer but also the copy data stored in the save buffer need to be reflected in the copy destination device. Therefore, a large amount of time is required between reception of the suspend command from the host device and entrance into the suspend state.
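A rough estimate shows why the delay grows with the save buffer. The sketch below assumes that the delay is bounded below by the total pending copy data divided by the line throughput. The function name and all figures (a 2 GiB cache buffer, 500 GiB of saved data, a 50 MiB/s line) are hypothetical values chosen only for illustration.

```python
def suspend_delay_seconds(cache_bytes: int, save_bytes: int,
                          line_bytes_per_second: int) -> float:
    """Lower bound on the time needed to drain all pending copy data."""
    if line_bytes_per_second <= 0:
        raise ValueError("line throughput must be positive")
    return (cache_bytes + save_bytes) / line_bytes_per_second


GIB = 1024 ** 3
MIB = 1024 ** 2
line = 50 * MIB  # hypothetical inter-site line throughput

cache_only = suspend_delay_seconds(2 * GIB, 0, line)
with_save = suspend_delay_seconds(2 * GIB, 500 * GIB, line)
```

With these assumed figures, draining the cache buffer alone takes about 41 seconds, whereas draining the cache buffer together with the save buffer takes about 2.9 hours.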
As an example, an operation of a storage system 100 illustrated in FIG. 1 will be considered below.
The storage system illustrated in FIG. 1 includes a RAID device 110 located in Tokyo, a RAID device 120 located in Nagoya, and a RAID device 130 located in Osaka. The RAID devices 110 and 120 are interconnected through a network 140 so that they can communicate with each other. The RAID devices 120 and 130 are also interconnected through a network 150 so that they can communicate with each other.
FIG. 2 illustrates an exemplary backup operation of the storage system 100.
Session A executes asynchronous copy that guarantees the order of data from the RAID device 110 to the RAID device 120. Session B executes remote copy of data from the RAID device 120 to the RAID device 130.
When the RAID device 110 receives a consistency suspend instruction at 0:00 on the 23rd, for example, the RAID device 110 initiates a consistency suspend process in session A. When the consistency suspend process is completed at 3:00 on the 23rd, for example, all copy data of the suspended session A has been reflected in the RAID device 120. That is, the data in the RAID device 120 becomes consistent with the data that the RAID device 110 held at 0:00 on the 23rd.
In session B, at 3:00 on the 23rd the RAID device 120 starts remote copy, to the RAID device 130, of the data that has been updated since the previous remote copy. Once the remote copy has been completed, the data on the RAID device 130 becomes consistent with the data that the RAID device 110 held at 0:00 on the 23rd.
Once the process described above has been completed, the RAID devices 120 and 130 both hold data consistent with the data that the RAID device 110 held at 0:00 on the 23rd.
However, as stated above, the RAID device 110, which uses a save buffer, takes a long time to complete the consistency suspend process. Accordingly, the probability that the storage system 100 suffers a disaster during the execution of the consistency suspend process is correspondingly high.
For example, if both the RAID device 110 in Tokyo and the RAID device 120 in Nagoya suffer a disaster in the period between 0:00 and 3:10 on the 23rd in FIG. 2, all business data stored on the 22nd can be lost.