For a variety of reasons, data corruption can occur in data storage systems that support more than one concurrent I/O operation but do not synchronize or lock those operations. Generally, this is because each I/O operation looks up, and optionally modifies, metadata that may also be used by other I/O operations running in parallel. One type of data storage system that is particularly vulnerable to data corruption from unsynchronized parallel I/O operations is a system that utilizes snapshots. A snapshot is a read-only volume containing a point-in-time image of a data storage volume; snapshots can be created, mounted, deleted, and rolled back onto the data storage volume arbitrarily. Snapshots are utilized extensively in the data storage industry for security, backup, and archival purposes. Snapshots may also be utilized within data storage systems that use thin provisioning to allocate storage space on demand. In such systems, space is allocated in units of a provision, while snapshot writes occur in smaller, sub-provision units referred to herein as “chunks.”
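To make the provision/chunk relationship concrete, the following Python sketch maps a logical sector address to a provision, a chunk within that provision, and an offset within that chunk, and flags writes that cover only part of a chunk. The sizes (`PROVISION_SECTORS`, `CHUNK_SECTORS`) and the helper names `locate` and `is_sub_chunk_write` are hypothetical choices for illustration; the text does not specify actual provision or chunk sizes.

```python
# Hypothetical geometry for illustration only; real sizes are implementation-specific.
PROVISION_SECTORS = 2048   # sectors per provision (assumed)
CHUNK_SECTORS = 128        # sectors per chunk (assumed)
CHUNKS_PER_PROVISION = PROVISION_SECTORS // CHUNK_SECTORS


def locate(lba: int) -> tuple[int, int, int]:
    """Map a logical sector address to (provision, chunk within provision, offset within chunk)."""
    provision, within_provision = divmod(lba, PROVISION_SECTORS)
    chunk, offset = divmod(within_provision, CHUNK_SECTORS)
    return provision, chunk, offset


def is_sub_chunk_write(lba: int, sectors: int) -> bool:
    """True if a write that stays inside one chunk covers only part of that chunk.

    The rest of such a chunk must come from existing data when the chunk holds
    data from an earlier snapshot lifetime, which is what forces a
    read-modify-write cycle.
    """
    _, _, offset = locate(lba)
    if offset + sectors > CHUNK_SECTORS:
        raise ValueError("sketch assumes the write does not cross a chunk boundary")
    return offset != 0 or sectors < CHUNK_SECTORS
```

With these assumed sizes, sector 5000 falls in provision 2, chunk 7, at offset 8, so a 16-sector write there is a sub-chunk write, while a 128-sector write starting at sector 4096 covers chunk 0 of provision 2 exactly.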
In a data storage system with active snapshots, a particular chunk may receive two concurrent, non-overlapping sub-chunk writes. If the chunk has not received any I/O requests in the current snapshot lifetime but did receive I/O requests in an earlier lifetime, the first write must be performed as a read-modify-write cycle: the chunk's existing data is read, the new sub-chunk data is merged into it, and the full chunk is written. If no synchronization mechanism is present, both sub-chunk I/Os will start independent read-modify-write cycles, each unaware that another I/O operation is operating on the same chunk. As a result, both operations will be converted into full-chunk writes that are inconsistent with each other: each carries its own new data together with the stale, previous contents of the region modified by the other operation. Whichever chunk write completes last overwrites the other, discarding that operation's data and leading to data corruption.
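The lost update described above can be replayed deterministically. The Python sketch below models one chunk as an 8-byte buffer (a hypothetical toy size), steps through the bad interleaving of two read-modify-write cycles, and then shows one common remedy: holding a per-chunk lock across the entire cycle. The class `ToyChunk` and the functions shown are illustrative assumptions, and the per-chunk lock is a generic technique, not necessarily the synchronization mechanism of the present invention.

```python
import threading

OLD = b"oooooooo"  # previous-lifetime contents of one toy chunk (hypothetical size)


class ToyChunk:
    """One chunk whose current-lifetime copy is built by read-modify-write."""

    def __init__(self, previous: bytes) -> None:
        self.previous = bytes(previous)  # data from an earlier snapshot lifetime
        self.current = None              # not yet written in the current lifetime

    def read(self) -> bytearray:
        # Read step: start from current-lifetime data if present, else the old data.
        return bytearray(self.current if self.current is not None else self.previous)

    def write(self, data: bytearray) -> None:
        # Write step: always a full-chunk write.
        self.current = bytes(data)


def unsynchronized_interleaving() -> bytes:
    """Replay one bad interleaving of two sub-chunk writes, step by step."""
    chunk = ToyChunk(OLD)
    buf_a = chunk.read()   # write A reads the old chunk
    buf_b = chunk.read()   # write B reads the same old chunk
    buf_a[0:4] = b"AAAA"   # A merges its half
    buf_b[4:8] = b"BBBB"   # B merges the other half
    chunk.write(buf_a)
    chunk.write(buf_b)     # B's buffer still holds stale bytes 0-3, so A's data is lost
    return chunk.current


def sub_chunk_write(chunk: ToyChunk, lock: threading.Lock, offset: int, data: bytes) -> None:
    """Hold a per-chunk lock across the whole read-modify-write cycle."""
    with lock:
        buf = chunk.read()
        buf[offset:offset + len(data)] = data
        chunk.write(buf)


def synchronized_run() -> bytes:
    chunk = ToyChunk(OLD)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=sub_chunk_write, args=(chunk, lock, 0, b"AAAA")),
        threading.Thread(target=sub_chunk_write, args=(chunk, lock, 4, b"BBBB")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return chunk.current
```

The unsynchronized replay ends with `b"ooooBBBB"`, losing write A, whereas the locked version yields `b"AAAABBBB"` in either thread order, because the second cycle always reads the result of the first.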
I/O operations that are not synchronized may also corrupt the metadata of a system that utilizes snapshots. In particular, data storage systems that utilize snapshots typically use metadata to indicate the snapshot lifetime in which a chunk was written. Suppose a chunk receives a read operation and a write operation in parallel, targeting two non-overlapping sub-chunk regions; the write is the first write to the chunk in the current lifetime; and the chunk contains valid data from a previous snapshot lifetime. In this case, the read operation may be performed on the wrong provision. This is because, when the write is dispatched, a bit in the metadata is set before the write has completed. If the mapping cycle of the read operation takes place after the bit is set, the read will resolve to the new provision instead of the old one. However, because the read-modify-write cycle has not yet completed, the new provision does not yet hold valid data for the region being read, so the read yields the wrong data, resulting in apparent data corruption. If, alternatively, the metadata bit indicating that a new write has taken place is set only after the write has completed, other problems may occur. In this case, two write operations to different chunks in the same provision may each initiate a write of the metadata with its own bit set, without synchronizing the setting of the bits. Because each metadata write carries only the bit set by its own operation, the later metadata write may clear the bit set by the earlier one. This, too, yields data corruption.
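The metadata lost update in the last case can be sketched the same way. The Python model below assumes, for illustration only, that each provision's metadata is a single integer bitmap persisted as a whole, so the last metadata write wins. The names `ProvisionMetadata` and `mark_chunk_written` are hypothetical, and serializing the load, bit-set, and persist steps under a lock is shown as a generic remedy rather than the specific mechanism claimed here.

```python
import threading


class ProvisionMetadata:
    """Toy per-provision bitmap: bit i set means chunk i was written in the current lifetime."""

    def __init__(self) -> None:
        self.on_disk = 0

    def load(self) -> int:
        return self.on_disk

    def persist(self, bitmap: int) -> None:
        # The whole bitmap is written at once, so the last writer wins.
        self.on_disk = bitmap


def unsynchronized_updates() -> int:
    """Replay the bad interleaving: both writers build images from the same old state."""
    meta = ProvisionMetadata()
    image_a = meta.load() | (1 << 0)  # write to chunk 0 finished; prepare metadata
    image_b = meta.load() | (1 << 1)  # write to chunk 1 finished; prepare metadata
    meta.persist(image_a)
    meta.persist(image_b)             # overwrites image_a, clearing chunk 0's bit
    return meta.on_disk


def mark_chunk_written(meta: ProvisionMetadata, lock: threading.Lock, chunk: int) -> None:
    """Serialize load, bit-set, and persist for one provision's metadata."""
    with lock:
        meta.persist(meta.load() | (1 << chunk))


def synchronized_updates(chunks=(0, 1, 2, 3)) -> int:
    meta = ProvisionMetadata()
    lock = threading.Lock()
    threads = [threading.Thread(target=mark_chunk_written, args=(meta, lock, c)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meta.on_disk
```

In the unsynchronized replay the persisted bitmap is `0b10`, so chunk 0 appears never to have been written in the current lifetime and later reads of it would resolve to the old provision; with the lock, all four bits survive (`0b1111`).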
Background processes can also cause data corruption when I/O operations are not synchronized. Consider, for instance, a defragmentation thread running as a background process. If a background defragmentation read operation and a write operation to the same chunk are dispatched together and the defragmentation read completes first, the defragmented copy of the chunk will not include the new write and will therefore be out of date, causing data corruption.
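The Python sketch below illustrates the defragmentation race under an explicit assumption the text does not spell out: that defragmentation copies a chunk to a new physical slot and then remaps the logical chunk to that slot. `ChunkStore`, `relocate`, and the slot numbers are hypothetical. The locked variant treats the defragmentation read and relocation as a single step with respect to writes to the same chunk, which is one generic remedy rather than the patented method.

```python
import threading


class ChunkStore:
    """Toy model: logical chunk -> physical slot, plus slot contents."""

    def __init__(self) -> None:
        self.slots = {0: b"old"}  # physical slot -> data
        self.mapping = {"c": 0}   # logical chunk "c" lives in slot 0

    def read(self, chunk: str) -> bytes:
        return self.slots[self.mapping[chunk]]

    def write(self, chunk: str, data: bytes) -> None:
        self.slots[self.mapping[chunk]] = data

    def relocate(self, chunk: str, data: bytes, new_slot: int) -> None:
        # Assumed defragmentation step: place the copy in a new slot, then remap.
        self.slots[new_slot] = data
        self.mapping[chunk] = new_slot


def unsynchronized_defrag() -> bytes:
    """Replay the bad ordering: the defragmentation read completes before the write."""
    store = ChunkStore()
    copy = store.read("c")        # defragmentation read completes first
    store.write("c", b"new")      # the user write lands in the old slot
    store.relocate("c", copy, 1)  # the stale copy becomes the chunk's contents
    return store.read("c")


def defragment(store: ChunkStore, lock: threading.Lock, chunk: str, new_slot: int) -> None:
    with lock:  # read and relocate as one step with respect to writes to this chunk
        store.relocate(chunk, store.read(chunk), new_slot)


def locked_write(store: ChunkStore, lock: threading.Lock, chunk: str, data: bytes) -> None:
    with lock:
        store.write(chunk, data)


def synchronized_defrag() -> bytes:
    store = ChunkStore()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=defragment, args=(store, lock, "c", 1)),
        threading.Thread(target=locked_write, args=(store, lock, "c", b"new")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.read("c")
```

Without synchronization the chunk reads back as `b"old"` after defragmentation, silently discarding the write. With the lock, the result is `b"new"` in either order: if the write runs first, defragmentation copies the new data; if defragmentation runs first, the write goes to the relocated slot.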
It is with respect to these considerations and others that the present invention has been made.