Typical computer systems include a file system for storing and accessing files. In addition to storing system files (operating system files, device driver files, etc.), the file system stores and provides access to user data files. If any of these files (system files and/or user files) contain critical data, then it becomes advantageous to employ a data backup scheme to ensure that critical data are not lost if a file storage device fails. One commonly employed data backup scheme is mirroring. Mirroring involves maintaining two or more copies of a file, where each copy of the file is located on a separate file storage device (e.g., a local hard disk, a networked hard disk, a network file server, etc.).
When one or more file storage devices fail for any length of time, the mirrors may become unsynchronized. A mirroring scheme, however, depends on the mirrors remaining synchronized (i.e., on the contents of each mirror being the same). If a mirror becomes unsynchronized, the simplest recovery scheme is to copy all of the data from a synchronized mirror to the unsynchronized mirror. However, copying all data from one file storage device to another may take a long time and may significantly reduce the performance of the file storage devices during the resynchronization process.
Alternatively, dirty region logging (DRL) may be used to facilitate resynchronization. DRL involves dividing each mirror into a number of “regions.” Depending on the implementation, a region may be as small as a single disk sector or larger than 256 kilobytes (KB). Prior to modifying the content of a region (for example, when there is a write operation on data within the region), a DRL entry for the region is created in the DRL. In most cases, the DRL entry merely identifies the region where the modification will be attempted. If the region is modified successfully, then the DRL entry is cleared. If the region is not modified successfully, then the DRL entry remains in the DRL. Thus, during a resynchronization process, the DRL may be used to identify which specific regions require resynchronization, so that only those regions are copied rather than the entire file storage device.
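The log-before-write, clear-on-success cycle described above can be illustrated with a minimal sketch. It is a toy in-memory model under stated assumptions, not an actual volume manager: the class name MirroredVolume, its methods, and the simulated failure set are all hypothetical names introduced for illustration, and a real DRL would be kept on persistent storage.

```python
class MirroredVolume:
    """Toy model (hypothetical): two in-memory mirrors split into fixed-size regions."""

    def __init__(self, size, region_size):
        self.region_size = region_size
        self.mirrors = [bytearray(size), bytearray(size)]
        self.drl = set()     # indices of regions with a pending or failed write
        self.failed = set()  # indices of mirrors simulated as failed

    def _regions(self, offset, length):
        # Indices of every region overlapped by the byte range.
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        return set(range(first, last + 1))

    def write(self, offset, data):
        regions = self._regions(offset, len(data))
        # Regions that were already dirty must stay dirty even if this write succeeds.
        newly_logged = regions - self.drl
        # Create the DRL entries before modifying any mirror.
        self.drl |= regions
        ok = True
        for index, mirror in enumerate(self.mirrors):
            if index in self.failed:
                ok = False  # this mirror missed the write
                continue
            mirror[offset:offset + len(data)] = data
        if ok:
            # Every mirror was updated, so clear the entries this write created.
            self.drl -= newly_logged
        return ok

    def resync(self, source=0, target=1):
        # Copy only the regions named in the DRL, not the whole device.
        copied = 0
        for region in sorted(self.drl):
            start = region * self.region_size
            end = start + self.region_size
            self.mirrors[target][start:end] = self.mirrors[source][start:end]
            copied += 1
        self.drl.clear()
        return copied
```

In this sketch, a write issued while one mirror is marked failed leaves its region in the log, and a later resync call copies just that region from the healthy mirror.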
Dirty region logging may be more time-efficient than resynchronizing an entire file storage device. However, it also incurs system overhead with each modification to a region, since the DRL must be updated prior to each modification. This overhead increases with smaller region sizes, since a given amount of written data then spans more regions and requires more DRL entries. Conversely, if the regions are large, there may be significant overhead involved in resynchronizing an entire region, even though only a single disk sector in that region may have been modified.
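The trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name drl_tradeoff is hypothetical, the 512-byte sector size is an assumption, and writes are assumed to be aligned to region boundaries.

```python
SECTOR_BYTES = 512  # assumed sector size, for illustration only


def drl_tradeoff(write_bytes, region_bytes, sector_bytes=SECTOR_BYTES):
    """Return (DRL entries needed for one aligned write,
    sectors copied at resync when a single sector in a region is dirty)."""
    entries = -(-write_bytes // region_bytes)        # ceiling division
    sectors_per_region = region_bytes // sector_bytes
    return entries, sectors_per_region


if __name__ == "__main__":
    one_mib = 1 << 20
    for region in (512, 64 * 1024, 256 * 1024):
        entries, copied = drl_tradeoff(one_mib, region)
        print(f"region={region:>7} B  DRL entries per 1 MiB write={entries:>5}  "
              f"sectors copied per dirty sector={copied:>4}")
```

Under these assumptions, a 1 MiB write needs 2048 DRL entries with sector-sized regions but only 4 with 256 KB regions, while a single dirty sector forces 512 sectors to be copied at resynchronization in the 256 KB case.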