The need to store digital files, documents, pictures, images and other data continues to increase rapidly. To meet this need, various data storage systems have been devised for the rapid and secure storage of large amounts of data. Such systems may include one or a plurality of storage devices that are used in a coordinated fashion. Also available are systems that distribute data across multiple storage devices so that the data is not irretrievably lost if one storage device (or, in some cases, more than one) fails. Systems that coordinate the operation of a number of individual storage devices can also provide improved data access and/or storage times. Examples of systems that provide such advantages include the various RAID (redundant array of independent disks) levels that have been developed. Whether implemented using one or a plurality of storage devices, the storage provided by a data storage system can be treated as one or more storage volumes.
To keep desired data available, it is often advantageous to maintain different versions of a data storage volume. Indeed, data storage systems are available that provide at least limited data archiving through backup facilities and/or snapshot facilities. Snapshot facilities greatly reduce the storage space required to archive large amounts of data, because a snapshot need only preserve the data that changes on the storage volume after the snapshot is taken rather than a full copy of the volume.
Snapshots are a versatile feature that is useful for data recovery operations, such as backup and recovery of storage elements. However, traditional snapshots are read-only: their contents cannot be modified. This limits their use, particularly with operating systems and applications that have no notion of a read-only data store (e.g., a read-only file system) and that expect to write metadata at any time the file system is accessible. When a storage element held in a snapshot contains such a file system and is exported to a client or host, the host may attempt to write data to the read-only image. This is a fundamental issue in the design of a reliable backup system. In general, once a backup image is made via a mechanism such as a snapshot, that image should be maintained as a point-in-time representation of the storage volume.
A controller typically modifies snapshot data only through what is known as a copy-on-write (COW) operation. The COW operation detects that a change to a storage volume is about to occur and then determines whether the targeted blocks of that storage volume have changed since the snapshot was taken. If the blocks have not changed since the snapshot was taken, the controller copies the original contents of those blocks to the snapshot data before changing the storage volume. The COW operation thereby ensures that the data present on the storage volume at the point in time the snapshot was taken resides either on the storage volume or in the snapshot. The controller therefore changes the snapshot only when doing so is required to preserve data that was on the storage volume when the snapshot was taken but is about to be overwritten on the storage volume.
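The copy-on-write behavior described above can be illustrated with a small in-memory model. This is a hypothetical sketch, not a controller implementation: the `Volume` and `Snapshot` classes and their methods are illustrative names, a volume is assumed to be a simple list of blocks, and a block counts as "changed since the snapshot" once its original contents appear in the snapshot's `saved` map. Host writes to the snapshot are refused to keep the point-in-time image intact.

```python
"""Hypothetical copy-on-write (COW) snapshot model at block granularity."""


class Volume:
    """A storage volume modeled as a fixed list of blocks (assumption)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def take_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # COW: before the volume block is overwritten, copy its current
        # contents into every snapshot that has not yet preserved it.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    """Point-in-time view: each block lives either in `saved` or on the volume."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block index -> contents at the time of the snapshot

    def read(self, index):
        # Blocks unchanged since the snapshot are still read from the volume.
        if index in self.saved:
            return self.saved[index]
        return self.volume.blocks[index]

    def write(self, index, data):
        # Direct host writes would destroy the point-in-time image.
        raise PermissionError("snapshot is read-only")
```

For example, after `vol = Volume(["A", "B", "C"])`, `snap = vol.take_snapshot()` and two writes to block 1, `snap.read(1)` still returns `"B"` while `snap.saved` holds only that one block; every other block is served from the volume itself, which is why snapshots consume little space.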
By contrast, direct modification of a snapshot image (e.g., by a client or host rather than by a controller performing a COW operation) could have serious consequences: the snapshot data would no longer be a point-in-time copy, and a consistent image of the storage volume might no longer be available for subsequent recovery operations. Accordingly, most snapshot facilities do not allow a host to write data directly to a snapshot, because doing so would change the point-in-time representation of that snapshot. Most snapshots are therefore limited to read-only operations.
A relatively recent advance in backup facilities is the ability to "clone" an existing snapshot and back up the clone instead of the active file system. With this approach, the file server can remain online during the backup. A clone of a snapshot is generally intended to represent the same point in time as the snapshot from which it originated. Accordingly, clones are generally subject to the same read-only restrictions as their parent snapshots; clones that are not so restricted cannot guarantee that the snapshot or its clone actually represents the point in time at which the snapshot was taken. Another drawback of current cloning systems is that creating a clone may take a significant amount of time, because most cloning systems create a complete block-by-block copy of the snapshot for the clone. This complicates the creation of even a single clone and all but precludes the creation of multiple clones of the same storage volume. As a result, clones tend to be used one at a time for short-term operations and are then deleted.
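To see why a block-by-block clone is slow to create, consider the following hypothetical sketch. It assumes a COW snapshot represented as the live volume (a list of blocks) plus a `saved` map of preserved original blocks; the function names are illustrative only. A full-copy clone must resolve and copy every block of the point-in-time image, so its cost grows with the size of the volume rather than with the amount of data that changed.

```python
from typing import Dict, List, Tuple


def read_snapshot_block(volume: List[str], saved: Dict[int, str], index: int) -> str:
    """Resolve one point-in-time block: the preserved copy if present, else the volume."""
    if index in saved:
        return saved[index]
    return volume[index]


def clone_full_copy(volume: List[str], saved: Dict[int, str]) -> Tuple[List[str], int]:
    """Create a clone by copying every snapshot block (hypothetical full-copy scheme).

    Returns the clone's blocks and the number of blocks copied.
    """
    clone = [read_snapshot_block(volume, saved, i) for i in range(len(volume))]
    return clone, len(clone)
```

With `volume = ["A", "X", "C", "D"]` and `saved = {1: "B"}`, the snapshot preserves only one block, yet `clone_full_copy(volume, saved)` copies all four blocks to produce `["A", "B", "C", "D"]`.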