As computers in general, and personal computing devices in particular, become ubiquitous, networks and network service providers are called upon to provide data storage for increasing amounts of data. To accommodate the growing need for data storage, data storage systems must evolve to address competing requirements of increased capacity, faster access time, and failsafe mechanisms such as data backup.
One response to this demand for storage has been the development of data storage devices that behave as if they were a single storage device but are in reality multiple storage devices that collectively operate together. From the perspective of a host, i.e., an entity that makes demands to read, write and/or allocate data storage, these devices appear as a single data storage entity. Examples of such devices include disk storage arrays, network storage arrays, and the like.
A collection of data storage devices acting in concert may be configured to operate in a way that provides failsafe measures. For example, multiple disks may be organized into redundant arrays of inexpensive disks (RAID) groups. RAID groups may provide mirroring or other forms of duplication wherein data written to one disk is also written to another disk as a backup copy. RAID groups may also distribute data across many disks so that if one of the disks fails, there is enough data left on the other disks to reconstruct the missing data. RAID groups may perform a combination of striping and mirroring.
In the simplest case, mirroring, the host appears from its own perspective to be writing to just one device. In reality, however, the same data may be written to multiple devices. In conventional data storage systems, there is usually a controlling entity, such as a device driver, that acts as the intermediary between the host or hosts and the data storage devices. The controlling entity and the multiple data storage devices acting in concert are hereinafter collectively referred to as a data storage subsystem, or DSS.
A common problem that arises in systems that perform mirroring or other forms of backup or duplication is the problem of synchronization. Mirroring systems typically have at least two data storage entities, e.g., two disks, two RAID groups, etc., one to store the data and the other to store the backup copy of the data, commonly referred to as the primary and secondary data storage entities, respectively, or “primary” and “secondary” for brevity. In order to provide a backup copy of data, any data that is stored in the primary must also be stored in the secondary. The primary and secondary are said to be in synchronization with each other (also referred to as “in sync”) when the contents of the primary and secondary are the same, i.e., they match each other. If the contents of the primary and secondary do not match each other, the primary and secondary are said to be out of synchronization with each other (also referred to as “out of sync”).
When a write request is sent from a host to a DSS, the DSS will attempt to write the data to both the primary and the secondary. If all of the data is successfully written to both the primary and the secondary, the host is informed that the write succeeded. Sometimes, however, some, but not all, of the data is written to a data storage device, a situation referred to as a “partial write” to that device. If both the primary and secondary were able to store only a portion of the data to be written, for example, the primary and secondary may still be in synchronization with each other despite the fact that a partial write occurred, so long as the primary and secondary both stored the same portion.
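The write behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical model in which each device is a dictionary mapping an address to one portion of data, and in which a partial write stores only a leading part of the request. The names `write_to_device` and `mirrored_write`, and the `accepted` parameters, are illustrative assumptions and do not describe any particular DSS.

```python
def write_to_device(store: dict, offset: int, data: str, accepted: int) -> int:
    """Store only the first `accepted` portions of `data` (a partial write
    when accepted < len(data)); return the number of portions stored."""
    portions = data[:accepted]
    for i, portion in enumerate(portions):
        store[offset + i] = portion
    return len(portions)


def mirrored_write(primary: dict, secondary: dict, offset: int, data: str,
                   primary_accepts: int, secondary_accepts: int):
    """Attempt the same write on both devices and report whether they match."""
    p = write_to_device(primary, offset, data, primary_accepts)
    s = write_to_device(secondary, offset, data, secondary_accepts)
    in_sync = primary == secondary  # same contents means "in sync"
    return p, s, in_sync


# Both devices store the same portion "ABC": a partial write, yet still in sync.
print(mirrored_write({}, {}, 0, "ABCD", 3, 3))  # (3, 3, True)
# Full write to the primary, partial write to the secondary: out of sync.
print(mirrored_write({}, {}, 0, "ABCD", 4, 3))  # (4, 3, False)
```

In this simplified model, whether a partial write occurred and whether the devices are in sync are separate questions, which is the distinction drawn in the paragraph above.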
However, if all of the data is successfully written to the primary, but not all of the data was written to the secondary, the contents of the secondary do not match the contents of the primary, and the secondary is thus said to be out of synchronization with the primary. Conventional data storage subsystems respond to this situation in various ways.
In some conventional data storage systems, the host will be informed that the write failed. The host must then attempt to perform the write again until the data is successfully written to both the primary and the secondary. This is highly inefficient, since some or all of the data was successfully written to at least one of the data storage entities.
In some conventional data storage systems, the host will be informed that the write was successful, and a synchronization mechanism, separate from the host write, will copy the data from the primary to the secondary. This is an improvement on the method described above, because the host does not need to perform the write again just because the write to the secondary was not successful.
However, when only some of the data is successfully written to the primary (i.e., a partial write to the primary), these conventional storage systems may still exhibit other inefficiencies. For example, if only a portion of the data to be written was successfully stored on the primary, conventional data storage systems may indicate to the host that the write failed, requiring the host to retry until a full, rather than partial, write is performed on the primary. Even where a conventional data storage system allows a partial write to the primary, for example by informing the host that the write was partially successful, conventional data storage systems treat any data that happened to be successfully written to the secondary as invalid data. The synchronization mechanism will copy to the secondary whatever portion of the primary was successfully (albeit partially) written, regardless of whether the data had already been written successfully to the secondary.
In one example, data to be stored to the data storage subsystem is represented by the string “ABCD”, where each letter corresponds to a portion of the whole amount of data. If only a portion of the data, portion “ABC” for example, was successfully written to the primary, but all of the data was written to the secondary, conventional data storage subsystems may treat the data “ABCD” that was written to the secondary as invalid. A synchronization mechanism or process would then copy “ABC” from the primary to the secondary, despite the fact that the secondary already contained “ABC”, and more (i.e., “D” also).
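The cost of this conventional behavior can be estimated with a small sketch. It assumes, purely for illustration, that the portions written to each device can be represented as strings and compared position by position; the function name `conventional_resync_cost` is hypothetical.

```python
def conventional_resync_cost(primary_written: str, secondary_written: str) -> tuple[int, int]:
    """Return (copied, redundant) for a conventional resynchronization.

    copied:    portions copied from primary to secondary (everything the
               primary wrote, since the secondary is treated as invalid)
    redundant: copied portions the secondary already held correctly
    """
    copied = len(primary_written)
    redundant = sum(1 for p, s in zip(primary_written, secondary_written) if p == s)
    return copied, redundant


copied, redundant = conventional_resync_cost("ABC", "ABCD")
print(copied, redundant)  # 3 3
```

For the “ABCD” example, all three portions copied from the primary were already present on the secondary, so under these assumptions every copy performed by the synchronization process was unnecessary.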
There are disadvantages associated with the conventional methods described above. First, the synchronization mechanism may be performing significant amounts of unnecessary copying (i.e., the “ABC” in the example above) from the primary to the secondary. This uses system resources, occupies data paths, and can cause delays in other storage functions while the synchronization process labors to synchronize the primary and secondary. These additional burdens placed on the system are collectively referred to as “overhead”.
Second, this overhead becomes increasingly burdensome as the amount of data stored on data storage subsystems grows. As users of computers and personal computing devices begin to expect access to songs, movies, and other multimedia formats, storage requirements will become very large. Extending the example above, hosts that formerly made write requests to store data “ABCD” may now desire to store data “ABCDEFGHIJKLMNOPQRSTUVWXYZ”; the larger the amount of data to be stored, the more likely it is that the data will be partially, rather than completely, stored on the primary. For example, if the primary was able to store A-Y, but the secondary was able to store A-Z, conventional DSSs will mark the entire corresponding portion of the secondary as invalid and schedule the synchronizing process to copy A-Z from the primary to the secondary, even though the secondary already contains the correct data A-Y. Note that the secondary now contains a Z that the primary does not, so the old Z must be copied from the primary to overwrite the new Z that was stored on the secondary. From this example it can be seen that enormous inefficiencies are associated with the conventional implementations of data storage subsystems. As additional secondaries are added to the DSS, the likelihood that additional synchronization write cycles will be required also increases, further increasing the inefficiencies.
Third, with the increased popularity of “sparse” data storage, where data storage space is virtually allocated to a host, application, or other entity but where actual storage space is not used until it is needed, an operation known as a “zero-fill write” (e.g., a SCSI WRITE SAME with zero specified) has become common. Using a zero-fill write, large sections of data storage may be initialized, e.g., by filling the formerly unallocated or previously used space with zeroes. Unlike a normal write request, where the write request is accompanied by the data to be written, a zero-fill write request does not include data to be written: the data to be written is implicit in the request. Zero-fill writes are often used to initialize enormous portions of data storage. Because large portions of the primary are written with zeroes, equally large portions of one or more secondaries must also be synchronized. The inefficiencies described above may be multiplied by orders of magnitude due to the sheer size of a zero-fill write.
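The difference between a normal write request and a zero-fill write request can be sketched as follows. The classes `WriteRequest` and `ZeroFillRequest` and the function `apply_request` are hypothetical names chosen for illustration; they model only the point made above, that a zero-fill request carries a length rather than a payload, and they do not represent the SCSI command format.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    offset: int
    data: bytes   # payload accompanies the request


@dataclass
class ZeroFillRequest:
    offset: int
    length: int   # no payload: the zeroes are implicit in the request


def apply_request(store: bytearray, req) -> None:
    """Apply either kind of request to an in-memory stand-in for a device."""
    if isinstance(req, ZeroFillRequest):
        store[req.offset:req.offset + req.length] = bytes(req.length)
    else:
        store[req.offset:req.offset + len(req.data)] = req.data


store = bytearray(b"\xff" * 8)
apply_request(store, WriteRequest(offset=0, data=b"AB"))
apply_request(store, ZeroFillRequest(offset=2, length=4))
print(store)  # bytearray(b'AB\x00\x00\x00\x00\xff\xff')
```

Because the request itself is small regardless of `length`, a single zero-fill request in this model can change an arbitrarily large region, all of which would then need to be synchronized to each secondary.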
Accordingly, in light of these disadvantages associated with synchronization of data storage devices after the occurrence of a partially completed write, there exists a need for systems, methods, and computer readable media for improving synchronization performance after partially completed writes.