The need to store digital files, documents, pictures, images and other data continues to increase rapidly. In connection with the electronic storage of data, various data storage systems have been devised for the rapid and secure storage of large amounts of data. Such systems may include one or a plurality of storage devices that are used in a coordinated fashion. Systems in which data can be distributed across multiple storage devices such that data will not be irretrievably lost if one of the storage devices (or in some cases, more than one storage device) fails are also available. Systems that coordinate operation of a number of individual storage devices can also provide improved data access and/or storage times. Examples of systems that can provide such advantages can be found in the various RAID (redundant array of independent disks) levels that have been developed. Whether implemented using one or a plurality of storage devices, the storage provided by a data storage system can be treated as one or more storage volumes.
In order to facilitate the availability of desired data, it is often advantageous to maintain different versions of a data storage volume. Indeed, data storage systems are available that can provide at least limited data archiving through backup facilities and/or snapshot facilities. The use of snapshot facilities greatly reduces the amount of storage space required for archiving large amounts of data.
Snapshots provide a versatile feature that is useful for data recovery operations, such as backup and recovery of storage elements. However, traditional snapshots are accessible only for reading and their contents cannot be modified, which limits their use, particularly for operating systems and applications that do not have a notion of a read-only data store (e.g., a read-only file system) and that expect to write metadata at any time that the file system is accessible. When a storage element that is held in a snapshot is exported to a client or host and contains the data for such a problematic file system, an issue arises in that the host may attempt to write data to the read-only image. This is a fundamental issue in the design of a reliable system for backups. In general, once a backup image is made via a mechanism like a sparse snapshot, that image should be maintained as a point-in-time representation of the storage volume. A controller typically modifies snapshot data by what is known as a copy-on-write (COW) operation. The COW operation determines when a change to a storage volume is going to occur and then determines whether the targeted blocks of that storage volume have changed since a snapshot was taken. If the blocks have not changed since the snapshot was taken, then the controller copies the original contents of those blocks and writes them to the snapshot data before changing the storage volume. The COW operation ensures that the data from the storage volume at the point in time at which a snapshot was taken resides either on the storage volume or on the snapshot. The controller therefore changes the snapshot only when doing so is required to preserve data that was on the storage volume at the time the snapshot was taken but that is about to be overwritten on the storage volume.
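The COW operation described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not an implementation of any particular controller: the names Volume, SparseSnapshot, Controller, and preserved are hypothetical, blocks are modeled as list entries, and each snapshot is assumed to keep its own preserved copies.

```python
class Volume:
    """Toy storage volume: a list of block contents (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class SparseSnapshot:
    """Holds only the blocks that changed on the volume after the snapshot."""

    def __init__(self, volume):
        self.volume = volume
        self.preserved = {}  # block index -> contents at snapshot time

    def read(self, index):
        # The point-in-time data resides either in the snapshot's
        # preserved area or, if never overwritten, on the volume itself.
        if index in self.preserved:
            return self.preserved[index]
        return self.volume.blocks[index]


class Controller:
    """Performs copy-on-write before each change to the volume."""

    def __init__(self, volume):
        self.volume = volume
        self.snapshots = []

    def take_snapshot(self):
        snap = SparseSnapshot(self.volume)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        for snap in self.snapshots:
            # Copy the original block only if it has not changed
            # since this snapshot was taken.
            if index not in snap.preserved:
                snap.preserved[index] = self.volume.blocks[index]
        self.volume.blocks[index] = data


volume = Volume(["a0", "b0", "c0"])
controller = Controller(volume)
snap = controller.take_snapshot()
controller.write(1, "b1")  # first change: original "b0" is preserved
controller.write(1, "b2")  # later change: nothing further is copied
```

In this sketch the snapshot continues to return the original contents of block 1 after both writes, and only one block is ever copied, which reflects why a sparse snapshot needs far less space than a full copy of the volume.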
Direct modification of a snapshot image (e.g., direct modification by a client or host rather than by a controller performing a COW operation) could have serious consequences. For example, the data of the snapshot would no longer be a point-in-time copy, and a consistent image of the storage volume may no longer be available for subsequent recovery operations. Accordingly, most snapshot facilities do not allow a host to write data directly to a snapshot, since doing so would modify the original data of the snapshot. Thus, many snapshot applications must be used for backup purposes only, where the snapshot cannot be mounted for write access or modification purposes. Furthermore, in some operating systems, a snapshot cannot be mounted as read-only.
The reason for this limited functionality is that traditional sparse snapshot designs employ a single data area to contain all data that is preserved in the snapshot or written to the snapshot by a host system. Although this sparse snapshot configuration helps reduce the amount of memory used by a snapshot, it severely limits the functionality available to snapshot applications. It would be useful to have a snapshot that provides enhanced snapshot functionality while making efficient use of data storage space.