1. Technical Field
The present invention relates to data storage and retrieval generally and more particularly to a method and system for mirror storage element resynchronization in a storage virtualization device.
2. Description of the Related Art
Information drives business. Companies today rely to an unprecedented extent on online, frequently accessed, constantly changing data to run their businesses. Unplanned events that inhibit the availability of this data can seriously damage business operations. Additionally, any permanent data loss, from natural disaster or any other source, will likely have serious negative consequences for the continued viability of a business. Therefore, when disaster strikes, companies must be prepared to eliminate or minimize data loss, and recover quickly with useable data.
Storage virtualization is one technique used to minimize data loss and to improve the flexibility, accessibility, and availability of data storage. Storage virtualization is the pooling of physical storage from multiple storage devices into what appears from a user or user application perspective to be a single storage device. Storage virtualization is often used as part of a storage area network (SAN). A virtual storage device appears as one storage device, regardless of the types of storage devices (e.g., hard disk drives, tape drives, or the like) pooled to create the virtualized storage device. Storage virtualization may be performed in a host data processing system, a SAN fabric, or in storage devices directly.
Mirrored storage or “mirroring” is another commonly used storage technique. In a mirrored storage environment, a user is presented with a data volume made up of or associated with a number of mirror storage elements. Each write operation or “update” to the data volume is converted (e.g., by a volume manager or the like) into a write operation to each of the mirror storage elements such that all the data of the data volume is replicated onto each of the mirror storage elements. Mirrored storage provides decreased latency for “read” input/output (I/O) operations as a number of read operations can be distributed evenly over all available mirror storage elements, thus increasing available bandwidth and I/O rate in proportion to the number of mirror storage elements. Mirrored storage further provides increased reliability by providing redundant storage of data. Since there are one or more duplicate copies of every block of data, the failure of a disk associated with only one mirror storage element will not cause the data of the data volume to become unavailable.
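By way of illustration only, the write fan-out and read distribution described above may be sketched as follows. The class name `MirroredVolume`, the use of in-memory dictionaries as mirror storage elements, and the round-robin read policy are assumptions made for this sketch and are not drawn from any particular volume manager.

```python
from itertools import cycle


class MirroredVolume:
    """Toy mirrored volume; each mirror is a dict mapping block number to data."""

    def __init__(self, mirror_count):
        self.mirrors = [dict() for _ in range(mirror_count)]
        # Round-robin is only one possible read-distribution policy (assumed here).
        self._next_mirror = cycle(range(mirror_count))

    def write(self, block, data):
        # One logical write is converted into one write per mirror storage element.
        for mirror in self.mirrors:
            mirror[block] = data

    def read(self, block):
        # Service the read from the next mirror in round-robin order and
        # report which mirror was used.
        index = next(self._next_mirror)
        return index, self.mirrors[index].get(block)


volume = MirroredVolume(mirror_count=2)
volume.write(7, b"payload")
first = volume.read(7)   # serviced by mirror 0
second = volume.read(7)  # serviced by mirror 1
```

Because every mirror holds the same data, successive reads may be spread across mirrors while still returning identical contents.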
Mirrored storage does suffer from a number of drawbacks. While read I/O operation latency is decreased by using mirrored storage, “write” I/O operation or “update” latency is actually increased. As each write operation must be translated into “n” physical write operations where “n” is the number of mirror storage elements associated with a volume, write operation latency will be slightly greater than that experienced with a write to an un-mirrored data volume even where all write operations can be initiated concurrently. Mirrored storage also requires “n” times the physical storage of an un-mirrored data volume without providing any increase in storage capacity.
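The two costs noted above may be illustrated with a short calculation. The function names and the sample figures are hypothetical, and the latency model (a mirrored write completes only when the slowest of its concurrent physical writes completes) is a simplifying assumption.

```python
def mirrored_write_latency(per_mirror_latencies_ms):
    # With all physical writes issued concurrently, the logical write
    # finishes only when the slowest physical write finishes.
    return max(per_mirror_latencies_ms)


def physical_storage_required(logical_capacity_gb, mirror_count):
    # "n" mirrors consume "n" times the physical storage for the same
    # usable capacity.
    return logical_capacity_gb * mirror_count


latency = mirrored_write_latency([4.0, 5.5, 4.2])  # hypothetical latencies in ms
storage = physical_storage_required(100, 3)        # hypothetical 100 GB volume, 3 mirrors
```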
Another significant problem associated with mirrored storage is that mirror storage elements may become inconsistent, for example, following a power failure of underlying physical storage devices, or the failure of hardware or software associated with an element responsible for generating and performing the additional write operations to mirror storage elements of a data volume (e.g., a volume manager or its underlying hardware system). Consider for example a data volume associated with two mirror storage elements. A write operation is issued to a volume manager, which converts it into two concurrent write operations, one for each mirror storage element. If at least one of the concurrent write operations fails while at least one succeeds for any reason (e.g., due to intervening read operations coupled with a failure of a volume manager or its underlying hardware), synchronization between the different mirror storage elements will be lost.
When synchronization between two mirror storage elements is lost, i.e., the mirror storage elements are “out of sync” with respect to one another, each mirror storage element contains different data for at least one region impacted by a failed write. Consequently, a read operation to the region could return either “old” or “new” data (i.e., data without or with the failed write operation applied) depending on which mirror storage element is used to service or process the read. Moreover, repeated read operations may return different data without any intervening write operation, depending on what algorithm is used to select the mirror storage element from which a read is serviced. In this case, the associated data volume(s) must be recovered by “resynchronizing” all mirror storage elements before being used effectively again.
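The inconsistent-read behavior described above may be sketched as follows. The two-mirror layout, the block number, and the alternating mirror-selection policy are assumptions chosen only to make the effect visible.

```python
# Two mirrors that disagree on block 3 after a partially applied write:
# mirror 0 received the write, mirror 1 did not.
mirrors = [{3: "new"}, {3: "old"}]


def read_alternating(block, read_number):
    # Assumed selection policy: alternate between mirrors on successive reads.
    return mirrors[read_number % len(mirrors)][block]


# Four back-to-back reads of the same block with no intervening write.
results = [read_alternating(3, i) for i in range(4)]
```

Under this selection policy the same block yields alternating “new” and “old” data even though nothing was written between the reads.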
One technique for resynchronizing mirror storage elements during mirrored data volume recovery is to prevent any access, or at least any read access, to a data volume associated with mirror storage elements until all data from one mirror storage element is copied into all others. While guaranteeing that consistent data is returned upon its completion, this technique may take an unacceptably long period of time depending on the size of the data volume, the number of mirror storage elements, and the type of data stored. Another technique for resynchronizing mirror storage elements is to put an associated data volume in so-called “resync” or “read/writeback” mode. Once a read/writeback mode is entered, any read operation to a data volume will be serviced from one of the data volume's mirror storage elements, with the data of the read operation being copied to each of the other associated mirror storage elements. Accordingly, read operation performance is at least somewhat degraded due to the additional copying operations. Data of the selected mirror storage element may additionally be copied to each of the remaining mirror storage elements using a background or opportunistic process. Once all mirror storage elements are synchronized, the read/writeback mode may be exited and read operations handled normally.
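A minimal sketch of read/writeback mode, under stated assumptions, is given below. The class name `ReadWritebackVolume`, the representation of each mirror storage element as an equal-length list of blocks, and the choice of mirror 0 as the source are all hypothetical.

```python
class ReadWritebackVolume:
    """Toy volume in read/writeback mode; mirrors are equal-length lists of blocks."""

    def __init__(self, mirrors, source=0):
        self.mirrors = mirrors
        self.source = source      # assumption: one mirror is chosen as the source
        self.resync_mode = True

    def _copy_to_others(self, block, data):
        for index, mirror in enumerate(self.mirrors):
            if index != self.source:
                mirror[block] = data

    def read(self, block):
        data = self.mirrors[self.source][block]
        if self.resync_mode:
            # Extra copying on every read is what degrades read performance.
            self._copy_to_others(block, data)
        return data

    def background_resync(self):
        # Opportunistic pass that copies every block from the source mirror.
        for block, data in enumerate(self.mirrors[self.source]):
            self._copy_to_others(block, data)
        # All mirrors now match, so reads may be handled normally again.
        self.resync_mode = False


volume = ReadWritebackVolume([["new", "a", "x"], ["old", "a", "y"]])
value = volume.read(0)       # returns source data and repairs block 0 on mirror 1
volume.background_resync()   # repairs the remaining blocks and exits the mode
```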
Another technique utilized in the resynchronization of mirror storage elements independent of or in conjunction with read/writeback mode is so-called “dirty region logging” (DRL). Using DRL, when a write operation is received for a region of a data volume being mirrored, a “dirty” indicator is associated with the region. After the write operation's data is applied to each mirror storage element of the data volume, the dirty indicator is removed (e.g., using a least-recent algorithm or the like). Consequently, only those regions which are potentially out of sync with respect to one another, and so identified as “dirty,” need be resynchronized, and the amount of downtime (where the data volume is made unavailable until all mirror storage elements are resynchronized) or the duration of time subject to degraded performance associated with read/writeback mode is reduced.
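Dirty region logging may be sketched, under stated assumptions, as follows. The names `DirtyRegionLog`, `mirrored_write`, and `resync_dirty`, the fixed region size, and the `fail_after` parameter used to simulate a failure partway through a write are hypothetical. As a simplification, this single-threaded sketch clears the dirty indicator immediately after a successful write rather than lazily.

```python
class DirtyRegionLog:
    """Tracks which fixed-size regions may differ between mirrors."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.dirty = set()

    def region_of(self, block):
        return block // self.region_size

    def mark_dirty(self, block):
        self.dirty.add(self.region_of(block))

    def clear(self, block):
        self.dirty.discard(self.region_of(block))


def mirrored_write(mirrors, log, block, data, fail_after=None):
    log.mark_dirty(block)  # the indicator is set before any mirror is written
    for index, mirror in enumerate(mirrors):
        if fail_after is not None and index >= fail_after:
            return False   # simulated failure: the region stays dirty
        mirror[block] = data
    log.clear(block)       # every mirror was updated, so the region is clean
    return True


def resync_dirty(mirrors, log, source=0):
    # Copy only the blocks of dirty regions from the source mirror.
    for region in sorted(log.dirty):
        start = region * log.region_size
        for block in range(start, start + log.region_size):
            for index, mirror in enumerate(mirrors):
                if index != source:
                    mirror[block] = mirrors[source][block]
    log.dirty.clear()


mirrors = [["-"] * 8, ["-"] * 8]
log = DirtyRegionLog(region_size=4)
mirrored_write(mirrors, log, 1, "x")                # succeeds; region 0 is clean
mirrored_write(mirrors, log, 5, "y", fail_after=1)  # only mirror 0 is updated
dirty_before = set(log.dirty)                       # region 1 remains dirty
resync_dirty(mirrors, log)                          # repairs blocks 4 through 7 only
```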
In host-based storage virtualization systems, read/writeback mode and DRL may be implemented relatively easily. In other storage virtualization environments, however, a number of difficulties have arisen with the attempted use of such techniques. In switch-based storage virtualization systems, for example, virtualization operations may be partitioned between an input/output module including specialized hardware (e.g., an application specific integrated circuit or “ASIC” or a proprietary architecture processor, or the like) and a control module including generalized hardware in combination with software (e.g., a general purpose processor). In operation, such bifurcated storage virtualization devices typically service the majority of input/output (e.g., read and write operation) requests, and perform any associated translation between virtual and physical addresses, using only an I/O module. More complex tasks (e.g., configuring the I/O module to perform address translation, performing dirty region logging, or the like) are then usually performed by a control module. Consequently, read operations to an unmirrored data volume can typically be performed very quickly by the specialized hardware of an I/O module without the intervention of a slower, more general-purpose control module.
When data volume recovery is implemented in such bifurcated storage virtualization environments, however, all I/O operations including read operations necessitate a transition between I/O and control modules (e.g., to perform DRL for writes and additional processing such as copying data to multiple mirror storage elements in read/writeback mode or an examination of a control module-maintained DRL for reads) through a “fault” mechanism which causes an interrupt and context switch and consequently a delay of the ordinary processing of I/O operations. Because the processing of every I/O operation in such bifurcated virtualization systems, including every read operation, requires either a fault or other delay or, alternatively, the inaccessibility of a data volume until recovery/resynchronization can be completed, substantial latency may be added.
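The fault path described above may be sketched as follows. The class names `IOModule` and `ControlModule`, the `recovering` flag, and the string return values are hypothetical. The sketch assumes, as a simplification, that only reads take the fast path during normal operation while writes fault for dirty region logging.

```python
class ControlModule:
    """Stands in for the slower, general-purpose control module."""

    def __init__(self):
        self.faults_handled = 0

    def handle_fault(self, operation, block):
        # Each fault models an interrupt and context switch.
        self.faults_handled += 1
        return f"control:{operation}:{block}"


class IOModule:
    """Stands in for the specialized fast-path hardware."""

    def __init__(self, control):
        self.control = control
        self.recovering = False  # set while the data volume is being recovered

    def submit(self, operation, block):
        if operation == "read" and not self.recovering:
            # Fast path: serviced without involving the control module.
            return f"fast:{operation}:{block}"
        # Writes (for DRL) and, during recovery, every operation are faulted
        # to the control module.
        return self.control.handle_fault(operation, block)


control = ControlModule()
io_module = IOModule(control)
normal_read = io_module.submit("read", 10)    # fast path
io_module.recovering = True
recovery_read = io_module.submit("read", 10)  # now faults to the control module
```

During recovery, the count of handled faults grows with every operation submitted, which is the source of the added latency discussed above.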