1. Field of the Invention
This invention relates to data storage in general and, more particularly, to file system based redundant storage consistency recovery.
2. Description of the Related Art
Modern distributed shared storage environments may include multiple storage objects connected via one or more interconnection networks. The interconnection networks provide the infrastructure to connect the various elements of a distributed shared storage environment. Within the storage environment, file system abstractions may be built on top of multiple storage objects. These storage objects may be physical disks or storage aggregations, like logical volumes that distribute data across multiple storage devices. As the number of logical volumes and file system abstractions grows, the complexity of the entire storage environment grows dramatically.
Storage systems frequently use data redundancy mechanisms to ensure data integrity, consistency, and availability. Other uses for data redundancy may include backing up data, distributed load sharing, disaster recovery, or point-in-time analysis and reporting. When keeping redundant data in mirrored volumes, a storage system may duplicate data written to one mirror to all other mirrors. In other words, a storage system may duplicate data written to one copy of a data block stored in a volume to all other copies of that data block stored in that volume. Frequently this copying is done synchronously when the data I/O is performed. Sometimes, however, this mirroring may be performed asynchronously. When keeping redundant data in Redundant Arrays of Independent Disks (RAID) volumes, data may be striped across several devices (columns), and rather than store a complete additional copy of the data, one or more parity values may be calculated for sub-ranges of that data and stored with the data. On failure of any one device (or more than one device in some RAID implementations), parity may be used to reconstruct the data stored on the failed device. Mirroring is a low-order version of RAID (RAID 1).
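As an illustration of the parity-based reconstruction described above, the following is a minimal sketch assuming a single XOR parity value per stripe, as in common single-parity RAID layouts. The function names `xor_blocks` and `reconstruct`, and the four-byte sample blocks, are hypothetical and chosen only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing block of a stripe from the survivors and parity.

    Assumption: exactly one block of the stripe is missing and the parity
    was computed as the XOR of all data blocks in the stripe.
    """
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread across three data columns, plus its parity.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# The device holding the middle column fails; rebuild its block.
recovered = reconstruct([stripe[0], stripe[2]], parity)
```

In this sketch, `recovered` equals the lost block `b"BBBB"`, because XOR-ing the parity with every surviving block cancels their contributions and leaves only the missing data.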
Under some failure conditions, volumes including redundancy data may require consistency recovery (sometimes called synchronization or “resilvering” for mirrored volumes). For example, a host may crash during a write to a mirrored volume, or a component in the interconnect infrastructure for one of the mirrored devices may fail. This may result in data being written to some of the mirrors but not others, leaving the volume in an inconsistent state. That is, multiple reads of the same block from the volume may end up being routed to different mirrors and thus return different data, possibly causing serious data corruption. In such situations, a consistency recovery operation may need to be performed to resynchronize the data contents and state of mirrored storage devices. One well-known mirror synchronization method involves copying the entire contents of one data mirror of a volume to all other mirrors of that volume, such that all mirrors have the same data contents. This process can take a very long time in even modestly sized storage configurations. To reduce the impact of mirror consistency recovery, another well-known consistency recovery method involves maintaining a bitmap of in-progress I/Os, sometimes called “scoreboarding” or “dirty region mapping.” Every bit in this bitmap represents a region of one or more blocks of the volume. A bit in this map is set, or “dirtied”, when an I/O to the volume is issued and cleared after the I/O has completed for all mirrors. Recoverability and correctness require that the write, or “flush”, of a dirtied bitmap complete before the write to the data blocks can proceed. To reduce overhead on the data writes, cleaning of dirty bits can be delayed and performed asynchronously without impacting correctness. The size of the region mapped by each bit affects write I/O performance: fewer bitmap writes are required when each bit represents more data blocks.
However, the larger the number of blocks represented by a single bit in the map, the larger the number of blocks required to be copied during consistency recovery. Copying blocks that are mapped by a dirty bit in the scoreboard, but in fact were not being written, may significantly increase the time taken by the recovery.
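The trade-off between bitmap write overhead and recovery cost can be sketched as follows. This is a simplified, in-memory model and not any particular volume manager's implementation: the class `DirtyRegionMap`, its methods, the helper `simulate`, and the `flushes` counter (which stands in for a synchronous write of the bitmap to stable storage) are all hypothetical names introduced for illustration.

```python
class DirtyRegionMap:
    """Toy dirty region map: one flag per region of `blocks_per_region` blocks."""

    def __init__(self, volume_blocks, blocks_per_region):
        self.volume_blocks = volume_blocks
        self.blocks_per_region = blocks_per_region
        n_regions = -(-volume_blocks // blocks_per_region)  # ceiling division
        self.dirty = [False] * n_regions
        self.in_flight = [0] * n_regions
        self.flushes = 0  # counts simulated bitmap writes to stable storage

    def _regions(self, first_block, count):
        start = first_block // self.blocks_per_region
        end = (first_block + count - 1) // self.blocks_per_region
        return range(start, end + 1)

    def begin_write(self, first_block, count):
        """Dirty the affected regions; must finish before the data write starts."""
        newly_dirtied = False
        for region in self._regions(first_block, count):
            self.in_flight[region] += 1
            if not self.dirty[region]:
                self.dirty[region] = True
                newly_dirtied = True
        if newly_dirtied:
            self.flushes += 1  # the bitmap flush precedes the data write

    def end_write(self, first_block, count):
        """Record that the write has completed on all mirrors."""
        for region in self._regions(first_block, count):
            self.in_flight[region] -= 1

    def clean(self):
        """Lazily clear bits for regions that have no writes in flight."""
        for region, pending in enumerate(self.in_flight):
            if pending == 0:
                self.dirty[region] = False

    def blocks_to_recover(self):
        """Number of blocks that recovery would copy after a crash."""
        total = 0
        for region, is_dirty in enumerate(self.dirty):
            if is_dirty:
                start = region * self.blocks_per_region
                total += min(self.blocks_per_region, self.volume_blocks - start)
        return total


def simulate(blocks_per_region):
    """Issue five one-block writes, then 'crash' before any of them completes."""
    drm = DirtyRegionMap(volume_blocks=1024, blocks_per_region=blocks_per_region)
    for block in (0, 1, 2, 3, 512):
        drm.begin_write(block, 1)
    return drm.flushes, drm.blocks_to_recover()


fine = simulate(1)      # one bit per block
coarse = simulate(256)  # one bit per 256 blocks
```

Under these assumptions, the fine-grained map performs five bitmap flushes and leaves five blocks to copy, whereas the coarse map performs only two flushes but leaves 512 blocks to copy, even though only five blocks were actually being written.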