The present invention generally relates to technology for restoring data in a storage system.
RAID (Redundant Array of Independent Disks) technology is generally used to enhance storage system reliability. According to this technology, a logical storage device (hereinafter, logical volume) is constituted from a plurality of physical storage devices (hereinafter, physical devices; for example, hard disk drives or flash memories). The logical volume is provided from the storage system to a host computer as the target of an I/O request (a data read or write).
There are a number of RAID levels, one of which is RAID 1. In RAID 1, for example, one logical volume is constituted by two physical devices of the same capacity, and data written from the host computer is written to both physical devices. In this case, should either one of the physical devices fail, the data can be acquired from the remaining physical device. The reliability of the storage system is thereby enhanced. Further, replacing the failed physical device and copying the data from the remaining physical device to the post-replacement physical device restores the data to the redundant state, once again achieving a heightened state of reliability. Hereinafter, the process of writing the data that was stored in the failed physical device to the post-replacement physical device, subsequent to replacing the failed physical device, will be called the “RAID restore process”.
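The RAID 1 behavior described above can be illustrated with a minimal sketch. The class name `Raid1Volume`, its methods, and the use of in-memory lists as stand-ins for physical devices are assumptions made purely for illustration; they are not taken from any actual storage system.

```python
class Raid1Volume:
    """Toy RAID 1 logical volume backed by two in-memory 'physical devices'."""

    def __init__(self, num_blocks):
        # Each device is a list of blocks; None marks a failed device (assumption).
        self.devices = [[b""] * num_blocks, [b""] * num_blocks]

    def write(self, block, data):
        # Data written from the host goes to every device that is still working.
        for dev in self.devices:
            if dev is not None:
                dev[block] = data

    def read(self, block):
        # Serve the read from the first surviving device.
        for dev in self.devices:
            if dev is not None:
                return dev[block]
        raise IOError("both physical devices have failed")

    def fail(self, index):
        # Simulate the failure of one physical device.
        self.devices[index] = None

    def is_redundant(self):
        return all(dev is not None for dev in self.devices)

    def restore(self, index):
        # RAID restore process: copy all blocks from the remaining device
        # to the post-replacement device.
        survivor = self.devices[1 - index]
        if survivor is None:
            raise IOError("no remaining device to copy from")
        self.devices[index] = list(survivor)


vol = Raid1Volume(num_blocks=4)
vol.write(2, b"abc")
vol.fail(0)                      # one device fails
data_after_failure = vol.read(2)  # still readable from the remaining device
vol.restore(0)                   # replace the device and copy the data back
```

In this sketch the volume is non-redundant between `fail` and the completion of `restore`, which is the window the later discussion is concerned with.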
In addition to RAID 1, there are other RAID levels, such as RAID 5, which use parity to prevent data loss. When one physical device fails, these levels also enhance storage system reliability by making it possible to compute the data that was stored in the failed physical device from the data and parity stored in the remaining physical devices. Further, replacing the failed physical device, restoring the data that was stored in the failed physical device from the data and parity stored in the remaining physical devices, and storing the restored data in the post-replacement physical device once again realizes a state of heightened reliability, the same as in RAID 1. In RAID 5 and the like, writing the data restored using parity to the post-replacement physical device is the “RAID restore process” mentioned above.
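The parity computation referred to above is commonly a bytewise exclusive OR (XOR) across the blocks of a stripe. The following is a minimal sketch under that assumption; the function names are hypothetical, and the sketch keeps the parity in a fixed position rather than rotating it across devices as an actual RAID 5 layout would.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a sequence of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Append a parity block to the data blocks of one stripe."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, failed_index):
    """Recompute the block of the failed device from all remaining blocks.

    XOR-ing the surviving data blocks together with the parity block
    yields the block that was stored on the failed device.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
restored = rebuild_missing(stripe, failed_index=1)  # equals b"\x10\x20"
```

Writing `restored` to the post-replacement device for every stripe corresponds to the RAID restore process for a parity-based level.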
Other technologies for heightening storage system reliability include a technique that utilizes remote copying (for example, Japanese Patent Laid-open No. 2003-233518). Remote copying is a technique for redundantly storing data in two storage systems. First, for example, logical volumes of the same capacity are created in the two storage systems. Next, the two storage systems are interconnected (for example, by establishing a logical path), and the two created logical volumes are defined as a remote copy pair. Of the two logical volumes defined as a remote copy pair, one is called the primary volume and the other is called the secondary volume. The host normally issues an I/O request to the primary volume. When data is written to the primary volume, the storage system that maintains the primary volume stores the write-targeted data received from the host in the primary volume, and at the same time writes this data to the secondary volume. In this case, even if the storage system maintaining the primary volume fails, the host can continue the task at hand by accessing the secondary volume instead of the primary volume, and using the updated data in the secondary volume.
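The remote copy flow described above can be sketched as follows. This is a simplified, in-memory illustration; the class names `StorageSystem` and `RemoteCopyPair`, the dictionary used as a volume, and the `failed` flag are all hypothetical and do not reflect the interface of the cited publication.

```python
class StorageSystem:
    """Toy storage system holding one logical volume as an address-to-data map."""

    def __init__(self, name):
        self.name = name
        self.volume = {}
        self.failed = False


class RemoteCopyPair:
    """Two volumes of the same capacity defined as a remote copy pair."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, address, data):
        # The host normally writes to the primary volume; the primary-side
        # storage system stores the data and also writes it to the secondary.
        if self.primary.failed:
            raise IOError("primary storage system has failed")
        self.primary.volume[address] = data
        self.secondary.volume[address] = data

    def read(self, address):
        # If the primary-side storage system has failed, the host accesses
        # the secondary volume instead.
        source = self.secondary if self.primary.failed else self.primary
        return source.volume[address]


pair = RemoteCopyPair(StorageSystem("site-A"), StorageSystem("site-B"))
pair.write(0x10, b"payload")
pair.primary.failed = True
data = pair.read(0x10)  # served from the secondary volume
```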
Further, a technique other than remote copying is the “dynamic capacity allocation function” disclosed in Japanese Patent Laid-open No. 2005-011316. This technique makes use of a “capacity pool”, which brings together storage areas of the storage system, and a “virtual volume”, which does not have a physical storage area.
The “capacity pool” is a storage area constituted by two or more logical volumes from among a plurality of logical volumes maintained by the storage system, and is used for storing write-targeted data from the host. By contrast, the “virtual volume” is an I/O request target, which is provided to the host from the storage system in place of a logical volume, and which does not have a physical storage area. In the dynamic capacity allocation function, a storage area is not allocated to the virtual volume initially. Triggered by a data write to the virtual volume from the host, the storage area for holding the write-data is acquired from a logical volume selected from within the capacity pool, and this storage area is allocated to the data write location of the virtual volume specified in the I/O request from the host. That is, the data write location of the virtual volume and the storage area of the logical volume are made to correspond to one another (so-called mapping). The write-targeted data is stored in the storage area acquired from the logical volume. Performing this kind of control makes it possible to enhance data storage efficiency, since storage area is allocated from the capacity pool only for those portions of the virtual volume to which data is actually written. The reliability of the logical volume utilized as the capacity pool can also be enhanced by using RAID technology.
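The allocate-on-write behavior described above can be illustrated with a minimal sketch. The fixed-size allocation unit (called a "page" here), the class names, and the first-free selection policy are assumptions introduced for illustration; the source does not specify the allocation unit or how a logical volume is selected from the capacity pool.

```python
class CapacityPool:
    """Toy capacity pool built from the storage areas of several logical volumes."""

    def __init__(self, logical_volumes, pages_per_volume):
        # Free storage areas, identified as (logical volume name, page index).
        self.free = [(lv, p) for lv in logical_volumes for p in range(pages_per_volume)]
        self.store = {}  # storage area -> {offset within page: data}

    def allocate(self):
        if not self.free:
            raise RuntimeError("capacity pool is exhausted")
        return self.free.pop(0)  # assumed policy: take the first free area


class VirtualVolume:
    """I/O request target with no physical storage area of its own."""

    def __init__(self, pool, page_size):
        self.pool = pool
        self.page_size = page_size
        self.mapping = {}  # virtual page -> storage area in the pool

    def write(self, address, data):
        page = address // self.page_size
        if page not in self.mapping:
            # First write to this location triggers allocation (mapping).
            self.mapping[page] = self.pool.allocate()
        area = self.mapping[page]
        self.pool.store.setdefault(area, {})[address % self.page_size] = data

    def read(self, address):
        area = self.mapping.get(address // self.page_size)
        if area is None:
            return None  # nothing has been written here, so nothing is allocated
        return self.pool.store[area].get(address % self.page_size)


pool = CapacityPool(["LV0", "LV1"], pages_per_volume=2)
vvol = VirtualVolume(pool, page_size=100)
vvol.write(250, b"x")
vvol.write(260, b"y")  # same page as address 250, so no new allocation
```

After the two writes only one of the four storage areas in the pool has been consumed, regardless of the nominal capacity of the virtual volume.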
As described above, utilizing RAID technology makes it possible for the storage system to receive host I/O even when one of a plurality of physical devices constituting a logical volume fails. Furthermore, a RAID restore process makes it possible to return to a high state of reliability by replacing the failed physical device and writing the data that was stored in this physical device to the post-replacement physical device.
However, the following problems exist in the RAID restore process employed by RAID technology. To make it easier to understand the explanation, RAID 1, that is, one logical volume constituted by two physical devices, will be considered below.
When one physical device fails, the storage system uses the one remaining physical device to process a host-issued I/O request to the logical volume. When the failed physical device is replaced, the storage system, in addition to processing the I/O request issued from the host, copies all of the data from the one remaining physical device to the new physical device, thus returning to the redundant state. Since the storage system is not in a redundant state until the data copy to the new physical device is complete, reliability remains low during this period. The storage capacity of physical devices has increased greatly in recent years, and the time required for a RAID restore process has increased accordingly. Therefore, the problem is that, even when RAID technology is used, a state of low reliability continues for a long time after a physical device fails.
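The relationship between device capacity and restore time can be made concrete with a rough estimate. The following sketch assumes a simple model in which the restore time equals the capacity divided by the copy rate available to the restore; the function name, the `restore_share` parameter, and the numeric values are illustrative assumptions and not figures from the source.

```python
def restore_seconds(capacity_bytes, copy_rate_bytes_per_s, restore_share=1.0):
    """Estimate the duration of a full-device copy.

    restore_share is the assumed fraction of the copy rate left for the
    restore while host I/O requests are also being processed.
    """
    return capacity_bytes / (copy_rate_bytes_per_s * restore_share)


# Illustrative values only: a 1 TB device copied at 50 MB/s.
idle = restore_seconds(10**12, 50 * 10**6)
busy = restore_seconds(10**12, 50 * 10**6, restore_share=0.5)
hours_idle = idle / 3600
```

Under this model, doubling the device capacity doubles the period during which the logical volume is not redundant, and sharing the device with host I/O lengthens it further.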