The present invention is directed to methods and programs for computer storage systems conventionally implemented in disk drive storage and, more particularly, to stored data recovery by resynchronization of stored mirrored logical data volumes after storage system failures or like problems.
In the current data processing environment, there has been a dramatic increase in the availability and capacity of computer storage systems, such as hard disk drives and optical drives. Present storage systems associated with workstations may have conventional capacities up to hundreds of gigabytes. However, because of these increased capacities, problems have arisen in storage system recovery after a system failure or a like problem. This is particularly the case in storage systems which use mirrored stored logical data volumes. Mirroring is the implementation where the operating system makes a plurality of copies of data (usually duplicate or triplicate copies) in order to make data recovery easier in the event of a system failure or a similar problem. However, all mirrored storage systems require a system resynchronization after a failure. This will resynchronize all noncurrent physical volume partitions used in the mirroring to represent the logical volume partitions of the logical volume group.
By way of background, most AIX™ and UNIX™ based operating systems use some form of stored data mirroring. A basic storage system may be considered to be a hierarchy managed by a logical volume manager and made up of logical volume groups, which are in turn made up of a plurality of logical volumes which are physically represented by physical volumes on the actual disk or hard drives. Each physical volume is divided into physical partitions which are equal size segments on a disk, i.e. the actual units of space allocation. Data on logical volumes appears to be contiguous to the user but can be noncontiguous on the physical volume. This allows file systems and other logical volumes to be resized and relocated, span multiple physical volumes and have their contents replicated for greater flexibility and availability in the storage of data. In mirrored systems, a logical volume is divided into a plurality of mirrored logical data partitions, i.e. each logical volume has two or three redundant partitions therein. Such logical and physical volumes are generally described in the text, AIX 6000 System Guide, Frank Cervone, McGraw-Hill, N.Y., 1996, pp. 53-56.
In any event, when mirrored logical volumes (LVs) are first brought on-line or initiated, they must be synchronized. In mirrored LVs, each partition of the mirror can have two states: stale or available (unstale). Data may be read from any unstale mirrored partition. On the other hand, in writing, the data must be written to all available (unstale) mirrored partitions before returning. Only partitions that are marked as unstale will be read and written. In synchronization, or in resynchronization, a command such as the AIX “syncvg” command is run which copies information from an unstale mirror partition to the stale mirror partition, and changes the partition designation from stale to unstale.
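The stale/unstale behavior described above can be illustrated with a minimal sketch. The class and method names (`MirrorCopy`, `MirroredPartition`, `read`, `write`, `sync`) are hypothetical and chosen only for illustration; the sketch models the state rules in memory and is not the AIX logical volume manager or the `syncvg` implementation.

```python
from dataclasses import dataclass


@dataclass
class MirrorCopy:
    """One copy of a mirrored partition (hypothetical in-memory model)."""
    data: bytearray
    stale: bool = False


@dataclass
class MirroredPartition:
    copies: list

    def _first_unstale(self):
        # Return any copy that is marked unstale, or fail if none exists.
        for copy in self.copies:
            if not copy.stale:
                return copy
        raise IOError("no unstale mirror copy available")

    def read(self, offset, length):
        # Data may be read from any unstale copy.
        source = self._first_unstale()
        return bytes(source.data[offset:offset + length])

    def write(self, offset, payload):
        # A write must reach every unstale copy before returning;
        # stale copies are neither read nor written.
        for copy in self.copies:
            if not copy.stale:
                copy.data[offset:offset + len(payload)] = payload

    def sync(self):
        # Copy from an unstale copy to each stale copy, then change
        # that copy's designation from stale to unstale.
        source = self._first_unstale()
        for copy in self.copies:
            if copy.stale:
                copy.data[:] = source.data
                copy.stale = False
```

In this model a write issued while one copy is stale leaves that copy untouched, and only a later `sync()` brings it back in line with the unstale copy.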
In systems with mirrored partitions, after a system failure, e.g. a hangup or a crash, the LVs must be resynchronized. In present practice, this resynchronization must take place before the storage system may be accessed again; otherwise, the user may get inconsistent data. This is likely to result from “writes” in flight at the time of the crash which may not be completed and which may cause mirrored partitions to have different data. Reference is made to section 6.2.7, pp. 163-164, of the above-referenced Cervone text. Such resynchronization is usually done sequentially, LV by LV, and partition by partition. Because of the increased size of current storage systems and the large size of the groups of logical data volumes which may be involved in a resynchronization after a storage system failure, users may be subject to undesirable delays while waiting for the completion of synchronization in order to access the data from storage systems using mirrored volumes.
The present invention overcomes these prior art problems of delays caused by resynchronization in mirrored LV storage systems by providing in systems made up of a plurality of mirrored LVs respectively divided into a plurality of mirrored logical data partitions, a system for dynamically resynchronizing in the event of a storage system problem. Immediately after the correction of the problem causing the failure, means start to resynchronize the plurality of LVs but without waiting for the resynchronization to be completed; means access data from a data partition in a portion of one of said LVs. Then, there are means for determining whether the portion of the LV containing the accessed partition has already been resynchronized prior to access, together with means responsive to these determining means for replacing data in the other mirrored partitions corresponding to the accessed data with the accessed data in said accessed partition in the event that the LV has not been resynchronized. The means for replacing the data in the other mirrored partitions in the LV containing the accessed partition may replace the data prior to resynchronization of the LV or it may replace the data during the subsequent resynchronization of the LV. In the implementation where the data in the other mirrored partitions are replaced during resynchronization, there is provided interim means responsive to the accessing of data from the data partition in said LV for indicating the partition as accessible and for indicating the other mirrored partitions in the LV as inaccessible, in combination with means for removing the indicators from said partitions upon resynchronization of said accessed set. In one embodiment, the means for indicating the partition as accessible is an unstale data indicator, and the means for indicating the other mirrored partitions as inaccessible is a stale data indicator.
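The implementation in which the other mirrored partitions are replaced during the subsequent resynchronization may be sketched as follows. This is an illustrative model only: the names `Copy`, `MirroredPartition` and `needs_resync` are hypothetical, the choice of the first copy as the accessed copy is an assumption, and data is held in memory rather than on disk.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One mirrored copy of a partition (hypothetical in-memory model)."""
    data: bytes
    stale: bool = False


class MirroredPartition:
    def __init__(self, copies):
        self.copies = copies
        # Hypothetical flag: True between a failure and resynchronization.
        self.needs_resync = False

    def read(self):
        if self.needs_resync and not any(c.stale for c in self.copies):
            # First access since the failure: indicate the accessed copy
            # as unstale (accessible) and the other copies as stale
            # (inaccessible) until resynchronization.
            accessed = self.copies[0]
            for c in self.copies:
                c.stale = c is not accessed
        source = next(c for c in self.copies if not c.stale)
        return source.data

    def resync(self):
        # Replace the data in the stale copies with the data of the
        # unstale copy, then remove the interim indicators.
        source = next(c for c in self.copies if not c.stale)
        for c in self.copies:
            if c.stale:
                c.data = source.data
                c.stale = False
        self.needs_resync = False
```

Because the other copies are marked stale on first access, later reads are served only from the accessed copy, so the user sees consistent data even though the copies have not yet been made identical.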
The system preferably indicates whether a partition in a LV has been resynchronized. This may be done by a combination of means responsive to a storage system failure for setting a resynchronization indicator for each partition in said LVs, and means for removing said resynchronization indicator from each LV partition upon the resynchronization.
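A minimal sketch of the resynchronization indicator, combined with the implementation in which accessed data replaces the corresponding data in the other mirrored partitions before resynchronization, might look like the following. All names (`MirroredPartition`, `LogicalVolume`, `resync_pending`, `on_failure`, `resync_all`) are hypothetical, and treating the first copy as the one accessed is an assumption made for brevity.

```python
class MirroredPartition:
    def __init__(self, copies):
        # Each copy is an in-memory stand-in for a physical partition.
        self.copies = [bytearray(c) for c in copies]
        # Resynchronization indicator, set on failure, removed on resync.
        self.resync_pending = False

    def read(self, offset, length):
        block = bytes(self.copies[0][offset:offset + length])
        if self.resync_pending:
            # Not yet resynchronized: replace the corresponding data in
            # the other mirrored copies with the accessed data.
            for other in self.copies[1:]:
                other[offset:offset + length] = block
        return block

    def resync(self):
        # Standard resynchronization of the whole partition.
        for other in self.copies[1:]:
            other[:] = self.copies[0]
        self.resync_pending = False


class LogicalVolume:
    def __init__(self, partitions):
        self.partitions = partitions

    def on_failure(self):
        # Set the resynchronization indicator for every partition.
        for partition in self.partitions:
            partition.resync_pending = True

    def resync_all(self):
        # Sequential resynchronization; reads may be served meanwhile.
        for partition in self.partitions:
            if partition.resync_pending:
                partition.resync()
```

A read issued before `resync_all()` reaches a partition returns immediately and leaves the accessed range consistent across copies; the indicator stays set, so the full partition is still resynchronized later.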
In the description of the present invention, we will refer to accessing data from a logical data partition and copying such accessed data from the accessed partition to its mirrored partition. It should be understood that the accessed data may be a data block which constitutes only a small portion of the accessed partition or its mirrored partition. Consequently, in the embodiment where the accessed data is copied prior to resynchronization, the accessed portion and its mirrored copy will be recopied along with the unaccessed data in the standard resynchronization process step for the whole mirrored data partition. In such a case, the initially copied data would provide temporary mirrored data consistency prior to resynchronization. Alternatively, a routine could be set up whereby those data portions of the mirrored partitions which are accessed and thus copied prior to resynchronization are tracked and an indicator thereof stored so that during the subsequent resynchronization such already copied portions would not be recopied.
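The alternative routine mentioned above, in which already copied portions are tracked and skipped during resynchronization, can be sketched as follows. The block size, the class `TrackedMirror`, and the counter `blocks_copied_by_resync` are hypothetical and exist only to make the behavior observable; the sketch also assumes that any write after the early copy reaches both mirrored copies, so a tracked block stays consistent.

```python
BLOCK_SIZE = 4  # hypothetical block size, in bytes


class TrackedMirror:
    def __init__(self, primary, secondary):
        self.primary = bytearray(primary)
        self.secondary = bytearray(secondary)
        # Indices of blocks already copied prior to resynchronization.
        self.copied_blocks = set()
        # Counter used only to show how much work resync performed.
        self.blocks_copied_by_resync = 0

    def read_block(self, index):
        # Access a block before resynchronization, copy it to the
        # mirrored partition, and record that it has been copied.
        start = index * BLOCK_SIZE
        block = bytes(self.primary[start:start + BLOCK_SIZE])
        self.secondary[start:start + BLOCK_SIZE] = block
        self.copied_blocks.add(index)
        return block

    def resync(self):
        total = (len(self.primary) + BLOCK_SIZE - 1) // BLOCK_SIZE
        for index in range(total):
            if index in self.copied_blocks:
                continue  # already copied on access; do not recopy
            start = index * BLOCK_SIZE
            self.secondary[start:start + BLOCK_SIZE] = \
                self.primary[start:start + BLOCK_SIZE]
            self.blocks_copied_by_resync += 1
        self.copied_blocks.clear()
```

With three blocks and one of them read before resynchronization, `resync()` in this model copies only the two untracked blocks.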