The following description is provided to assist the understanding of the reader. None of the information provided is admitted to be prior art.
In data storage architectures, a client's data may be stored in a volume. A unit of data, for example a file (or object), comprises one or more storage units (e.g., bytes) and can be stored on and retrieved from a storage medium such as disk or RAM in a variety of fashions. For example, disk drives in storage systems are divided into logical blocks that are addressed using logical block addresses (LBAs). As another example, an entire file can be stored in a contiguous range of addresses on the storage medium and be accessed given the offset and length of the file. Most modern file systems store files by dividing them into blocks or extents of a fixed size, storing each block in a contiguous section of the storage medium, and then maintaining a list or tree of the blocks that correspond to each file. Some storage systems, such as write-anywhere file layout (WAFL), logical volume manager (LVM), or new technology file system (NTFS), allow multiple objects to refer to the same blocks, typically through a tree structure, to allow for efficient storage of previous versions or “snapshots” of the file system. In some cases, data for a single file or object may be distributed between multiple storage devices, either by a mechanism like RAID, which combines several smaller storage media into one larger virtual device, or through a distributed storage system such as Lustre, General Parallel File System, or GlusterFS.
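By way of illustration only, the block-based layout and block sharing described above can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the mechanism of WAFL, LVM, NTFS, or any other named system. The class `BlockStore`, its methods, and the 4-byte `BLOCK_SIZE` are hypothetical names and values chosen for readability. A flat list of block addresses stands in for the list or tree of blocks kept per file.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small, for illustration only


class BlockStore:
    """Hypothetical in-memory store that maps files to lists of fixed-size blocks."""

    def __init__(self):
        self.blocks = {}    # block address -> bytes held in that block
        self.files = {}     # file (or snapshot) name -> ordered list of block addresses
        self.next_addr = 0  # next unused block address

    def _alloc(self, data):
        # Place data in a fresh block and return its address.
        addr = self.next_addr
        self.blocks[addr] = data
        self.next_addr += 1
        return addr

    def write_file(self, name, data):
        # Divide the file into fixed-size blocks and record the block list.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        self.files[name] = [self._alloc(chunk) for chunk in chunks]

    def read_file(self, name):
        # Reassemble the file by following its block list in order.
        return b"".join(self.blocks[addr] for addr in self.files[name])

    def snapshot(self, name, snapshot_name):
        # A snapshot copies only the block list, so both names share the same blocks.
        self.files[snapshot_name] = list(self.files[name])

    def overwrite_block(self, name, index, data):
        # Write changed data to a new block; the old block stays for any snapshot.
        self.files[name][index] = self._alloc(data)


store = BlockStore()
store.write_file("report", b"abcdefghij")
store.snapshot("report", "report@snap1")
store.overwrite_block("report", 1, b"EFGH")
print(store.read_file("report"))        # current version
print(store.read_file("report@snap1"))  # previous version, still readable
```

In this sketch, the snapshot and the live file continue to share every block that has not been overwritten, so preserving the previous version costs one additional block rather than a full copy of the file.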
At some point, it is desirable to back up data of the storage system. Traditional backup methods typically utilize backup software that operates independently of the data storage system and manages the backup process. Backup methods exist to back up only the differences since the last full backup (e.g., a differential backup) or to back up only the changes since the last backup of any kind (e.g., an incremental backup). However, due to the inefficiency of backup software, many administrators are shifting away from traditional backup processes and moving towards data replication methods. With replication comes the issue of replicating a mistake, for example, a wrongly deleted file. High bandwidth is required for both replication and backup solutions, and neither method is particularly well suited to scale efficiently for long-term archiving.
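By way of illustration only, the distinction between differential and incremental backups noted above can be sketched as follows. The sketch assumes that each file carries a last-modified time and that the times of the last full backup and of the most recent backup of any kind are known. The function names and the integer timestamps are hypothetical and do not correspond to any particular backup product.

```python
def changed_since(modified_times, since):
    """Return, in sorted order, the names of files modified after the given time."""
    return sorted(name for name, mtime in modified_times.items() if mtime > since)


def differential_backup(modified_times, last_full_backup):
    # Differential: everything changed since the last FULL backup.
    return changed_since(modified_times, last_full_backup)


def incremental_backup(modified_times, last_backup):
    # Incremental: only what changed since the most recent backup of any kind.
    return changed_since(modified_times, last_backup)


# Hypothetical last-modified times, in arbitrary integer time units.
modified_times = {"a.txt": 5, "b.txt": 12, "c.txt": 20}
last_full_backup = 10  # time of the last full backup
last_backup = 15       # time of the most recent backup of any kind

print(differential_backup(modified_times, last_full_backup))  # ['b.txt', 'c.txt']
print(incremental_backup(modified_times, last_backup))        # ['c.txt']
```

Under these assumptions, a differential backup grows with every change made since the last full backup, whereas an incremental backup stays small but requires every intermediate backup to be available for a restore.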