During the operation of a mass storage system, it is desirable to periodically gather information about how the data is stored on the system and, from time to time, to make a backup copy of the stored data. Gathering such information can be beneficial for a number of reasons, including for recovery in the event of a non-recoverable failure.
Backing up a mass storage system is typically done by reading the data stored on the mass storage system and writing it to a magnetic tape to create an archive copy of the stored data.
However, generating such archival copies can be burdensome. Many prior art backup methods require that the system be removed from ongoing (online) operations to assure the integrity and consistency of the backup copy. This is because normal backup techniques either copy the blocks from the mass storage system sequentially to a linear-access tape, or walk through the file system on the mass storage system, starting with the first block of the first file in the first directory and proceeding in order to the last block of the last file of the last directory. In either case, the backup process is unaware of updates being performed as data is being written to tape.
Thus, permitting continued online operations while a backup is performed can generate inconsistencies if the data is modified as the backup operation proceeds. Removing the storage system from ongoing storage operations eliminates the risk of such inconsistencies arising. However, backup operations can be time consuming, which makes removing the system from operations undesirable.
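The inconsistency described above can be illustrated with a minimal sketch. It assumes a toy disk modeled as a Python list of blocks, and a hypothetical `after_copy` callback that stands in for an online update arriving while the backup is in progress. None of these names come from any real backup tool.

```python
def sequential_backup(disk, after_copy=None):
    """Copy blocks to 'tape' in order, first block to last.

    after_copy is a hypothetical hook, called after each block is copied,
    that simulates an online update the backup process is unaware of.
    """
    tape = []
    for index in range(len(disk)):
        tape.append(disk[index])
        if after_copy is not None:
            after_copy(index)
    return tape


def demo_inconsistent_backup():
    """Move a record from block 3 to block 0 while the backup is running."""
    disk = ["", "x", "y", "record"]
    before = list(disk)

    def concurrent_update(index):
        # The update lands just after block 0 has already been copied.
        if index == 0:
            disk[0] = disk[3]
            disk[3] = ""

    tape = sequential_backup(disk, concurrent_update)
    after = list(disk)
    return before, after, tape


if __name__ == "__main__":
    before, after, tape = demo_inconsistent_backup()
    print("before:", before)
    print("after: ", after)
    print("tape:  ", tape)
```

In this sketch the tape copy matches neither the state before the update nor the state after it: block 0 was copied before the record arrived there, and block 3 was copied after the record had left, so the record is absent from the backup altogether.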
One approach to addressing this problem has been to create a mirror, or identical copy, of one disk's data. When a backup operation is required, mirroring is suspended and the mirror disk may be used as a static image for the backup. When the static image is no longer necessary (for example, when the tape backup has been completed), the two disks are resynchronized by copying to the mirror disk any changes made during the time mirroring was not active, and mirroring is resumed.
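The mirror-based approach can be sketched as follows. This is an illustrative model only: the class name `MirroredDisk`, its methods, and the use of a set of changed block numbers to drive resynchronization are assumptions made for the example, not details taken from any particular prior art system.

```python
class MirroredDisk:
    """Toy model of a primary disk with a mirror that can be split off."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirroring = True
        self.dirty = set()  # blocks changed while mirroring was not active

    def write(self, block, data):
        self.primary[block] = data
        if self.mirroring:
            self.mirror[block] = data   # normal operation: both copies updated
        else:
            self.dirty.add(block)       # remember the change for resync

    def split(self):
        """Suspend mirroring; the mirror becomes a static image."""
        self.mirroring = False
        return self.mirror

    def resync(self):
        """Copy the changes made during the split, then resume mirroring."""
        for block in sorted(self.dirty):
            self.mirror[block] = self.primary[block]
        self.dirty.clear()
        self.mirroring = True


def backup_to_tape(image):
    """Stand-in for a tape backup: copy the static image block by block."""
    return list(image)


if __name__ == "__main__":
    disk = MirroredDisk(4)
    disk.write(0, b"alpha")
    image = disk.split()
    disk.write(1, b"beta")          # online write during the backup
    tape = backup_to_tape(image)    # unaffected by the write above
    disk.resync()
    print(tape, disk.mirror)
```

Because writes made during the split reach only the primary disk, the backup reads a frozen image, and the resynchronization step needs to copy only the blocks recorded as changed.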
Although mirroring works well, it requires that the data stored on the system be captured accurately. Today, however, new distributed storage systems are being developed that avoid the use of a centralized storage control system. These distributed systems capture the benefits of the more flexible and scalable distributed server architectures. Although very exciting, these storage systems present challenges that prior art storage systems do not. One such challenge is the ability to generate reliable and trustworthy archive copies of a data volume that has been distributed across a plurality of independently operating servers.
Accordingly, there is a need in the art for a distributed storage system that can provide reliable snapshots of the data volumes that are being maintained across the different servers in the system.