1. Field of the Invention
This invention relates to data replication systems in general and, more particularly, to a method and apparatus for consistency interval replication snapshots in a distributed storage environment.
2. Description of the Related Art
Modern distributed shared storage environments may include multiple storage objects connected via one or more interconnection networks. The interconnection networks provide the infrastructure to connect the various elements of a distributed shared storage environment. Within the storage environment, file system abstractions may be built on top of multiple storage objects. These storage objects may be physical disks or storage aggregations, such as logical volumes that distribute data across multiple storage devices. As the number of logical volumes and file system abstractions grows, the complexity of the entire storage environment grows dramatically.
Storage systems frequently use data redundancy mechanisms to ensure data integrity, consistency, and availability. Other uses for data redundancy may include backing up data, distributed load sharing, disaster recovery, or point-in-time analysis and reporting. One approach to data redundancy is to copy or replicate data from a primary storage system to a second or replicated storage system. In other words, a storage system may duplicate data written to the primary copy of a data block to redundant or replicated copies of that data block in other, secondary storage systems. In some designs, this copying is done synchronously when the data I/O is performed. In other designs, this replication may be performed asynchronously, with the second storage system's data state lagging the primary storage state by a time interval that can be anywhere from fractions of a second to many hours, depending on the design objectives and technologies used.
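By way of illustration only, the difference between synchronous and asynchronous replication described above may be sketched as follows. The class name `ReplicatedVolume`, its methods, and the use of in-memory dictionaries to stand in for storage systems are assumptions made for this sketch and are not drawn from any particular replication product.

```python
from collections import deque


class ReplicatedVolume:
    """Hypothetical in-memory model of a primary and one secondary copy."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}       # block number -> data on the primary
        self.secondary = {}     # block number -> data on the secondary
        self.pending = deque()  # writes not yet applied to the secondary

    def write(self, block, data):
        """Write to the primary and replicate according to the mode."""
        self.primary[block] = data
        if self.synchronous:
            # Synchronous: the secondary is updated as part of the write.
            self.secondary[block] = data
        else:
            # Asynchronous: queue the write; the secondary lags the primary.
            self.pending.append((block, data))

    def drain(self):
        """Apply queued writes to the secondary in their original order."""
        applied = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data
            applied += 1
        return applied
```

In this sketch, the length of `pending` at any moment corresponds to how far the secondary lags the primary; a real system might drain the queue on a timer or over a network link, which is outside the scope of the example.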
Under some failure conditions, volumes that contain redundant data may require consistency recovery. For example, a host may crash during a write to a volume, or a component in the interconnect infrastructure may fail. This may leave the volume in an inconsistent state. For instance, if the volume is mirrored to protect against data loss due to a single disk failure, and stores two or more complete copies of the data, a system crash during a write may leave data copies in different states and with different contents. In such situations, a consistency recovery operation may need to be performed to resynchronize the data contents and state of mirrored storage devices. One well-known synchronization method involves copying the entire contents of one data copy to another, such that all copies of data in a redundant volume have the same data contents. This process can take a very long time even in modestly sized storage configurations. To reduce the impact of consistency recovery, another well-known consistency recovery method involves maintaining a bitmap of in-progress I/Os, sometimes called “scoreboarding” or “dirty region mapping.” Every bit in this bitmap represents a region of one or more blocks of the volume. A bit in this map is set, or “dirtied,” when an I/O to the volume is issued and cleared after the I/O has completed.
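By way of illustration only, a dirty region map of the kind described above may be sketched as follows. The class name `DirtyRegionMap`, its method names, and the region size parameter are hypothetical. The sketch also assumes a per-region count of outstanding I/Os, so that a region shared by overlapping writes is not cleared until the last of them completes; the description above requires only a single bit per region, which here corresponds to the count being nonzero.

```python
class DirtyRegionMap:
    """Hypothetical map of volume regions with writes in progress."""

    def __init__(self, volume_blocks, blocks_per_region):
        self.blocks_per_region = blocks_per_region
        # Round up so a partial final region is still tracked.
        region_count = -(-volume_blocks // blocks_per_region)
        # Assumption: one counter per region; the "dirty bit" is count > 0.
        self.in_flight = [0] * region_count

    def _regions(self, start_block, block_count):
        """Return the range of region indexes touched by an I/O."""
        first = start_block // self.blocks_per_region
        last = (start_block + block_count - 1) // self.blocks_per_region
        return range(first, last + 1)

    def begin_write(self, start_block, block_count):
        """Dirty every region the write touches, before issuing the I/O."""
        for region in self._regions(start_block, block_count):
            self.in_flight[region] += 1

    def end_write(self, start_block, block_count):
        """Release the regions once the I/O has completed on all copies."""
        for region in self._regions(start_block, block_count):
            self.in_flight[region] -= 1

    def dirty_regions(self):
        """Regions that would need resynchronization after a crash."""
        return [r for r, count in enumerate(self.in_flight) if count > 0]
```

After a crash, only the regions returned by `dirty_regions()` would need to be copied between mirrors, rather than the entire volume. For that to work in practice the map would have to be persisted before each write is issued, a step this in-memory sketch omits.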