The prior art has various techniques for recovering or restoring data should a computer system crash or the system otherwise become inoperative indefinitely or permanently. One such technique is mirroring, wherein a computer system maintains identical copies of data. Mirroring is also known as “RAID Level 1,” “disk shadowing,” “real-time copy,” and “t1 copying.”
The original data is the “source,” and its copies are the “mirrors.” Each of the source and the mirrors is a volume. A volume may be a disk, a partition of the disk or a number of blocks of the disk or the partition. The physical extents of a mirror volume reside entirely on a single machine, typically on a single physical disk.
FIG. 1 illustrates a computer system 1 of the prior art. The system 1 may include a host 10, storage elements 11 and communications links 12, 13. The links 12, 13 communicatively couple respective storage elements 11 to the host 10. The links 12, 13 directly connect the storage elements 11 to the host 10.
In operation, the storage element 11s serves as a source, and the storage element 11m serves as a mirror. The host 10 manages the synchronization and re-silvering of the mirror 11m as necessary. Notably, all of the physical extents of the mirror 11m are the responsibility of the one host 10. Indeed, the physical extents of the mirror are all on the one physical disk 11m.
FIG. 2 illustrates a computer system 1′ of the prior art. The system 1′ uses a storage array as the mirror 11m. While the physical extents of the mirror 11m are now distributed across multiple physical drives within the array 11m, the physical extents of the mirror 11m are still the responsibility of the one host 10.
FIG. 3 illustrates another computer system 3 of the prior art. The system 3 includes hosts 31, 32, 33, communication links 34, 35 and storage elements 11. The link 34 communicatively couples the hosts 31, 32, 33 while links 35 communicatively couple the hosts 31, 32, 33 to storage elements 11.
In operation, the system 3 designates all or a part of the storage element 11s1 attached to the host 31 as a source—likewise for the storage elements 11s2 and 11s3. Correspondingly, the system 3 designates (all or part of) the storage element 11m1 attached to the host 32 as the mirror for the source 11s1. The mirror 11m2 for the second source 11s2 also attaches to the host 32 while the mirror 11m3 for the third source 11s3 attaches to the third host 33.
Multiple sources 11s1, 11s2, 11s3 on a single host 31 may have mirrors 11m1, 11m2, 11m3 on multiple hosts 32, 33. Notably, the physical extents of any single mirror are still the responsibility of one host.
In the event that a system 1, 1′ or 3 loses access to a mirror 11m, the system continues to serve read and write requests with the source 11s. Should the system instead lose access to the source 11s, the system may serve data requests from the corresponding mirror.
Once a system 1, 1′ or 3 is compromised, system management (a software agent or the system administrator, for example) may seek to repair it. Management may designate another existing, single storage element as the mirror or may physically replace the lost element with a correctly operating one.
A system 1, 1′ or 3 re-silvers the new mirror, typically by copying the data from the source onto the new mirror. The host responsible for the source (the original source or the original mirror, depending on the type of failure) copies data from the source to the new mirror. The copying is done sequentially, block by block. Where the host responsible for the source and the host responsible for the mirror are not the same host, the copying involves forwarding data blocks from the source host to the mirror host.
The data block currently being copied is termed herein the “watermark.” Data that has already been copied to the mirror is below the watermark, and data that has yet to be copied is above the watermark.
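The sequential, block-by-block copy and the moving watermark described above can be illustrated with a minimal sketch. The sketch assumes that a volume can be modeled as an in-memory list of blocks; the function name resilver and its parameters are hypothetical and do not appear in any system described herein.

```python
def resilver(source, mirror):
    """Copy every block of `source` onto `mirror`, lowest block first.

    Returns the sequence of watermark positions visited, i.e. the index
    of the block being copied at each step (hypothetical helper).
    """
    visited = []
    for watermark in range(len(source)):
        # Blocks with index < watermark are already copied (below the watermark);
        # blocks with index > watermark are still to be copied (above it).
        visited.append(watermark)
        mirror[watermark] = source[watermark]
    return visited


source = [b"a", b"b", b"c"]
mirror = [None] * len(source)
positions = resilver(source, mirror)
```

When the loop ends, the mirror holds the same blocks as the source, and the watermark has passed over each block exactly once, in ascending order.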
While the mirror is being re-silvered, to achieve fault tolerance, a system 1, 1′ or 3 keeps the source element 11 being re-silvered available to serve data requests. The system may receive data requests and satisfy those requests from the source element through the host responsible for that source element or, if the read is directed to a block below the watermark, through the synchronized portion of the mirror element.
The system may receive a data write request. If the data to be written is below the watermark, the system writes the data to the source element and to the new mirror element as well. However, where the data to be written is above the watermark, the system writes the data only to the source element. At some later point, when the watermark moves over this new data, the system then copies the written data to the new mirror. A write request targeted at the block currently at the watermark is held until the watermark rises to another data block.
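The handling of read and write requests relative to the watermark may be sketched as follows. This is a simplified, single-threaded model under stated assumptions: volumes are in-memory lists, the class ResilveringPair and its methods are hypothetical names, and a write that would have to wait at the watermark is reported to the caller (by returning False) rather than actually suspended.

```python
class ResilveringPair:
    """Toy model of a source volume and a mirror being re-silvered."""

    def __init__(self, source_blocks):
        self.source = list(source_blocks)
        self.mirror = [None] * len(self.source)
        self.watermark = 0  # index of the block currently being copied

    def copy_step(self):
        """Copy the block at the watermark to the mirror, then raise the watermark."""
        if self.watermark < len(self.source):
            self.mirror[self.watermark] = self.source[self.watermark]
            self.watermark += 1

    def read(self, index, prefer_mirror=False):
        """Serve a read; the mirror is eligible only below the watermark."""
        if prefer_mirror and index < self.watermark:
            return self.mirror[index]
        return self.source[index]

    def write(self, index, data):
        """Apply a write; return False if it would have to wait at the watermark."""
        if index == self.watermark:
            return False  # caller retries once the watermark has moved on
        self.source[index] = data
        if index < self.watermark:
            # Already copied: write the mirror too so both copies stay identical.
            self.mirror[index] = data
        # Above the watermark: the later copy step carries the data to the mirror.
        return True


pair = ResilveringPair([b"a", b"b", b"c", b"d"])
pair.copy_step()                 # block 0 copied; watermark is now at block 1
pair.write(0, b"A")              # below the watermark: source and mirror updated
pair.write(3, b"D")              # above the watermark: source only
held = pair.write(1, b"B")       # at the watermark: not applied in this model
```

In this model a write above the watermark needs no separate bookkeeping, because the sequential copy later carries the updated block to the mirror when the watermark reaches it.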
Where, however, a more sophisticated storage system replaces the storage system described above, this process of re-silvering the mirror proves inefficient. For example, consider a storage system wherein the physical extents for a mirror volume are distributed across multiple hosts for efficient serving of data requests. If a single one of the multiple hosts re-silvers the entire mirror, efficiencies gained by the distribution are lost by the consolidation of the re-silvering responsibility in the one host.
Accordingly, there is a need for a storage system that distributes mirrors and does not lose distribution efficiencies during the re-silvering of a mirror.
If the host responsible for the source fails during the re-silvering, the systems of the art cannot complete the re-silvering. Because neither the source nor the mirror is available, the system loses access to the data. Fault tolerance has failed.
There is also a need for a storage system that retains the efficiencies of distribution during the re-silvering of a mirror and still provides uninterrupted access to the data being re-silvered, even in the face of the failure of a volume participating in the re-silvering of the mirror.
Still further, there is a need to efficiently perform initial synchronization of a distributed mirror.
These and other goals of the invention will be readily apparent to one of skill in the art on reading the background above and the description below.