Field of the Invention
The present invention relates to a technology for recovery control in mirrored disks, and in particular to a technology for improving recovery time of mirrored disks when read stability is in doubt.
Description of the Related Art
In storage systems, an array of independent storage devices can be configured to operate as a single virtual storage device using a technology known as RAID (Redundant Array of Independent Disks, earlier known as Redundant Array of Inexpensive Disks). A computer system configured to operate with a RAID storage system is able to perform input and output (I/O) operations (such as read and write operations) on the RAID storage system as if the RAID storage system were a single storage device. A RAID storage system includes an array of independent storage devices and a RAID controller. The RAID controller provides a virtualised view of the array of independent storage devices: the array appears as a single virtual storage device with a sequential list of storage elements. The storage elements are commonly known as blocks of storage, and the data stored within them are known as data blocks. Each I/O operation addresses one or more blocks of storage in the virtual storage device. When an I/O operation is performed on the virtual storage device, the RAID controller maps it onto the array of independent storage devices. To virtualise the array of storage devices and map I/O operations, the RAID controller may employ standard RAID techniques that are well known in the art.
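The mapping step described above can be illustrated with a minimal Python sketch. It assumes a simplified model in which a virtual block number is translated into a (device index, physical block) pair; the function names map_striped and map_mirrored are hypothetical, and striping and mirroring are shown only as two examples of standard RAID techniques, not as the mapping any particular controller uses.

```python
from typing import List, Tuple


def map_striped(lba: int, n_devices: int, stripe_blocks: int = 1) -> Tuple[int, int]:
    """Striping (RAID-0 style): map a virtual block to (device index, physical block)."""
    stripe_unit = lba // stripe_blocks      # which stripe unit holds the virtual block
    offset = lba % stripe_blocks            # position of the block inside that unit
    device = stripe_unit % n_devices        # stripe units rotate across the devices
    row = stripe_unit // n_devices          # full rows of stripe units before this one
    return device, row * stripe_blocks + offset


def map_mirrored(lba: int, n_devices: int) -> List[Tuple[int, int]]:
    """Mirroring (RAID-1 style): the same physical block on every member device."""
    return [(device, lba) for device in range(n_devices)]


# Virtual block 5 on a 2-device array with 2-block stripe units:
print(map_striped(5, n_devices=2, stripe_blocks=2))  # (0, 3)
print(map_mirrored(5, n_devices=2))                  # [(0, 5), (1, 5)]
```

With striping a read or write touches one device per block, whereas with mirroring a write must be issued to every device in the returned list, which is the property the following paragraphs rely on.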
In a non-RAID computer system, if a disk drive fails, all or part of the stored customer data may be permanently lost (or may be partially or fully recoverable, but only at some expense and effort). Although backup and archiving devices and procedures may preserve all but the most recently saved data, there are applications in which any risk of data loss, and the time required to restore data from a backup copy, are unacceptable. RAID storage subsystems are therefore frequently used to provide improved data integrity and device fault tolerance.
Storage subsystems thus aim to provide continuous data availability and data integrity. One solution that aims to increase availability is RAID-1, commonly known as mirroring. Mirroring maintains two or more copies of the data; when one copy is unavailable, another copy is used so that I/O can continue. This improves availability compared with keeping a single copy, where the data becomes unavailable as soon as that copy is unavailable.
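The availability benefit of mirroring can be sketched as a read that falls back to another copy. This is a minimal Python model under stated assumptions: MirrorCopy, MirrorUnavailable and mirrored_read are hypothetical names, each copy is a dictionary of blocks with an online flag, and the "first online copy" selection policy is an assumption, not a policy taken from any specific RAID-1 implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class MirrorUnavailable(Exception):
    """Raised when no copy of the mirror can service a read."""


@dataclass
class MirrorCopy:
    name: str
    online: bool = True
    blocks: Dict[int, bytes] = field(default_factory=dict)


def mirrored_read(copies: List[MirrorCopy], lba: int) -> bytes:
    """Service a read from the first online copy (assumed first-available policy)."""
    for copy in copies:
        if copy.online:
            # In this simplified model, unwritten blocks read back as a zero byte.
            return copy.blocks.get(lba, b"\x00")
    # The mirror is unavailable only when every copy is unavailable.
    raise MirrorUnavailable(f"no online copy for block {lba}")


a = MirrorCopy("a", blocks={0: b"data"})
b = MirrorCopy("b", blocks={0: b"data"})
a.online = False                 # one copy fails
print(mirrored_read([a, b], 0))  # copy b still serves the read: b'data'
```

With a single copy, the failure of that copy would raise MirrorUnavailable immediately; with two copies it takes both failures.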
To maintain the mirror, each write I/O must be performed to every copy. If an I/O failure occurs before the write I/Os to all copies have completed, different copies of the mirror may hold different data. In these situations it is important that the storage system maintains read stability, defined to mean that every read I/O to the same area returns the same data if no intervening writes have occurred. Maintaining read stability therefore requires the copies to be restored to a state in which they hold identical data.
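How an interrupted write breaks read stability can be shown with a small Python sketch. It assumes each mirror copy is modelled as a dictionary from block number to data; mirrored_write, is_read_stable and the fail_after parameter (which simulates an I/O failure partway through the write) are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional

Copy = Dict[int, bytes]  # one mirror copy: block number -> data


def mirrored_write(copies: List[Copy], lba: int, data: bytes,
                   fail_after: Optional[int] = None) -> int:
    """Write data to every copy in turn; return how many copies were written.

    fail_after simulates an I/O failure after that many copies have completed.
    """
    written = 0
    for copy in copies:
        if fail_after is not None and written == fail_after:
            break
        copy[lba] = data
        written += 1
    return written


def is_read_stable(copies: List[Copy], lba: int) -> bool:
    """True if a read of this block returns the same data whichever copy serves it."""
    return len({copy.get(lba) for copy in copies}) == 1


mirror = [{0: b"old"}, {0: b"old"}]
mirrored_write(mirror, 0, b"new", fail_after=1)   # failure after the first copy
print(is_read_stable(mirror, 0))                  # False: copies hold b"new" and b"old"
```

After the failure, a read of block 0 returns b"new" or b"old" depending on which copy services it, which is exactly the loss of read stability described above.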
RAID-1 mirroring solutions typically have methods to store metadata recording the writes in flight, which can be used to replay those write I/Os after a system failure (such as a reset). Once the writes in flight have been replayed, read stability is restored.
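A minimal Python sketch of recording and replaying writes in flight follows. It assumes the metadata stores the block number together with the data of each in-flight write; WriteIntentLog, logged_write and replay are hypothetical names, and real implementations may record only the affected regions and resynchronise them from one copy instead of storing the write payload.

```python
from typing import Dict, List, Optional

Copy = Dict[int, bytes]  # one mirror copy: block number -> data


class WriteIntentLog:
    """Hypothetical persistent metadata recording writes in flight."""

    def __init__(self) -> None:
        self.in_flight: Dict[int, bytes] = {}

    def begin(self, lba: int, data: bytes) -> None:
        self.in_flight[lba] = data          # recorded before any copy is written

    def complete(self, lba: int) -> None:
        self.in_flight.pop(lba, None)       # cleared once every copy has the write


def logged_write(log: WriteIntentLog, copies: List[Copy], lba: int, data: bytes,
                 fail_after: Optional[int] = None) -> None:
    log.begin(lba, data)
    for i, copy in enumerate(copies):
        if fail_after is not None and i == fail_after:
            return                          # simulated system failure: entry stays logged
        copy[lba] = data
    log.complete(lba)


def replay(log: WriteIntentLog, copies: List[Copy]) -> None:
    """After a reset, re-issue every logged write to every copy."""
    for lba, data in list(log.in_flight.items()):
        for copy in copies:
            copy[lba] = data
        log.complete(lba)


log = WriteIntentLog()
mirror = [{0: b"old"}, {0: b"old"}]
logged_write(log, mirror, 0, b"new", fail_after=1)  # copies diverge
replay(log, mirror)                                 # read stability restored
print(mirror)  # [{0: b'new'}, {0: b'new'}]
```

The design choice that matters is ordering: the log entry is recorded before any copy is written and cleared only after all copies are written, so every write that might have left the copies different is still in the log at recovery time.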
More severe system failures can mean that all ability to replay writes in flight has been lost. In these situations there is no way to determine which parts of the mirror copies hold identical data. Read stability can then be restored by choosing any one mirror copy as the source and copying all of its contents to the other (target) copies, a process commonly referred to as synchronising the copies. This situation, in which the mirror's read stability is in doubt, differs from the case where one copy has the correct data and another copy does not contain the same data (because it could not be written to for some reason); recovery from that latter situation using synchronisation is provided by systems known in the art.
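Full synchronisation from an arbitrarily chosen source can be sketched in Python as a whole-disk copy. The model assumes every copy is a list of blocks of equal length; synchronise and source_index are hypothetical names. The example deliberately picks the second copy as the source, since restoring read stability only requires the copies to become identical, not that a particular copy be chosen.

```python
from typing import List

Disk = List[bytes]  # one mirror copy as a fixed-size list of blocks


def synchronise(copies: List[Disk], source_index: int = 0) -> int:
    """Copy every block of the chosen source onto every other copy.

    With no record of writes in flight, nothing says which blocks differ,
    so the whole source is copied. Returns the number of blocks copied.
    """
    source = copies[source_index]
    copied = 0
    for i, target in enumerate(copies):
        if i == source_index:
            continue
        for lba, block in enumerate(source):
            target[lba] = block
            copied += 1
    return copied


mirror = [[b"a", b"b"], [b"a", b"x"], [b"z", b"b"]]
print(synchronise(mirror, source_index=1))  # prints 4
print(mirror)  # [[b'a', b'x'], [b'a', b'x'], [b'a', b'x']]
```

Because the whole source is copied to each target, the cost grows with the capacity of the copies rather than with the number of blocks that actually differ.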
As soon as the source copy is available, data availability can be restored, because this copy has the correct data and can be read. When the source and all target copies are available and the synchronisation process has successfully completed, read stability across the mirror copies has been restored; all mirror copies are therefore usable and mirror redundancy has been restored.
However, until the source copy is available, the mirror is unavailable and the synchronisation process that restores mirror redundancy cannot be started. Existing solutions can therefore take considerable time to recover both the mirror's availability and its redundancy.
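The recovery timeline described in the two paragraphs above can be expressed as a small Python model. It is a simplified sketch, not a measurement: prior_art_recovery is a hypothetical name, times are in arbitrary units, and it assumes each target is synchronised in parallel, taking a fixed sync_duration once both it and the source are online.

```python
from typing import List, Tuple


def prior_art_recovery(source_online_at: float, targets_online_at: List[float],
                       sync_duration: float) -> Tuple[float, float]:
    """Return (time the mirror becomes available, time redundancy is restored)."""
    # No copy may be read until the chosen source is online.
    available_at = source_online_at
    # Synchronisation cannot start before the source is online, and redundancy
    # needs every target synchronised (assumed: targets sync in parallel,
    # each taking sync_duration once it and the source are both online).
    redundant_at = max([source_online_at] + targets_online_at) + sync_duration
    return available_at, redundant_at


# Target online at t=2, source only at t=10, synchronisation takes 30 time units:
print(prior_art_recovery(10.0, [2.0], 30.0))  # (10.0, 40.0)
```

In this model the mirror stays unavailable from t=2 to t=10 even though a copy is already online, and both availability and redundancy are gated by whenever the source copy happens to come back.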
It would thus be desirable to have an improved technological means for recovery control in mirrored disks, and in particular a technology for improving the recovery time of mirrored disks when read stability is in doubt.