The present invention relates generally to computer data storage, and more particularly, to a snapshot copy facility for a data storage system.
Snapshot copies of a dataset such as a file or storage volume have been used for a variety of data processing and storage management functions such as storage backup, transaction processing, and software debugging.
A known way of making a snapshot copy is to respond to a snapshot copy request by invoking a task that copies data from a production dataset to a snapshot copy dataset. A host processor, however, cannot write new data to a storage location in the production dataset until the original contents of the storage location have been copied to the snapshot copy dataset.
Another way of making a snapshot copy of a dataset is to allocate storage to modified versions of physical storage units, and to retain the original versions of the physical storage units as a snapshot copy. Whenever the host writes new data to a storage location in a production dataset, the original data is read from the storage location containing the most current version, modified, and written to a different storage location. This is known in the art as a “log structured file” approach. See, for example, Douglis et al., “Log Structured File Systems,” COMPCON 89 Proceedings, Feb. 27-Mar. 3, 1989, IEEE Computer Society, p. 124-129, incorporated herein by reference, and Rosenblum et al., “The Design and Implementation of a Log-Structured File System,” ACM Transactions on Computer Systems, Vol. 10, No. 1, February 1992, p. 26-52, incorporated herein by reference.
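By way of illustration only, the approach of writing modified versions to different storage locations might be sketched as follows. The class name RedirectOnWriteStore, its methods, and the use of in-memory lists in place of physical storage are all hypothetical and are not taken from the cited references.

```python
class RedirectOnWriteStore:
    """Hypothetical in-memory model: modified versions go to new locations."""

    def __init__(self, blocks):
        self.storage = list(blocks)                  # physical locations, never overwritten
        self.current_map = list(range(len(blocks)))  # logical block -> current physical location
        self.snapshot_map = None                     # mapping retained for the snapshot copy

    def create_snapshot(self):
        # Retaining the mapping retains the original versions as the snapshot.
        self.snapshot_map = list(self.current_map)

    def update(self, logical, modify):
        # Read the most current version, modify it, and write it elsewhere.
        current = self.storage[self.current_map[logical]]
        self.storage.append(modify(current))
        self.current_map[logical] = len(self.storage) - 1

    def read(self, logical):
        return self.storage[self.current_map[logical]]

    def read_snapshot(self, logical):
        return self.storage[self.snapshot_map[logical]]
```

In this sketch the snapshot costs only a copy of the mapping, and the original versions remain readable because no storage location is ever overwritten.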
Yet another way of making a snapshot copy is for a data storage system to respond to a host request to write to a storage location of the production dataset by checking whether or not the storage location has been modified since the time when the snapshot copy was created. Upon finding that the storage location of the production dataset has not been modified, the data storage system copies the data from the storage location of the production dataset to an allocated storage location of the snapshot copy. After copying data from the storage location of the production dataset to the allocated storage location of the snapshot copy, the write operation is performed upon the storage location of the production dataset. For example, as described in Keedem U.S. Pat. No. 6,076,148 issued Jun. 13, 2000, assigned to EMC Corporation, and incorporated herein by reference, the data storage system allocates to the snapshot copy a bit map to indicate storage locations in the production dataset that have been modified. In this fashion, a host write operation upon a storage location being backed up need not be delayed until original data in the storage location is written to secondary storage.
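The copy-on-first-write sequence described above might be sketched, under simplifying assumptions, as follows. The class CopyOnFirstWriteVolume and its attribute names are hypothetical; a Python list of booleans stands in for the bit map, and a dictionary stands in for the storage allocated to the snapshot copy.

```python
class CopyOnFirstWriteVolume:
    """Hypothetical in-memory model of a production dataset with one snapshot."""

    def __init__(self, blocks):
        self.production = list(blocks)         # current production data
        self.modified = [False] * len(blocks)  # bit map: modified since the snapshot?
        self.snapshot_save = {}                # block index -> original contents

    def write(self, index, data):
        if not self.modified[index]:
            # First write since the snapshot: save the original contents first.
            self.snapshot_save[index] = self.production[index]
            self.modified[index] = True
        # Only then is the write performed upon the production dataset.
        self.production[index] = data

    def read_snapshot(self, index):
        # Saved original if the block has changed, else the unchanged production block.
        return self.snapshot_save.get(index, self.production[index])
```

Only the first write to a given location after the snapshot incurs a copy; later writes to the same location find the bit set and proceed directly.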
Backup and restore services are a conventional way of reducing the impact of data loss from network storage. To be effective, however, the data should be backed up frequently, and the data should be restored rapidly from backup after a storage system failure. As the amount of storage on the network increases, it is more difficult to maintain the frequency of the data backups, and to restore the data rapidly after a storage system failure.
In the data storage industry, an open standard network backup protocol has been defined to provide centrally managed, enterprise-wide data protection for the user in a heterogeneous environment. The standard is called the Network Data Management Protocol (NDMP). NDMP facilitates the partitioning of the backup problem between backup software vendors, server vendors, and network-attached storage vendors in such a way as to minimize the amount of host software for backup. The current state of development of NDMP can be found at the Internet site for the NDMP organization. Details of NDMP are set out in the Internet Draft Document by R. Stager and D. Hitz entitled “Network Data Management Protocol,” document version 2.1.7 (last update Oct. 12, 1999), incorporated herein by reference.
In accordance with one aspect of the invention, there is provided a data storage system for providing access to a production dataset and at least one snapshot dataset. The snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot dataset. The data storage system is programmed for maintaining an indication of invalid blocks of the storage that are allocated to the production dataset. The data storage system is also programmed for performing a write access upon a specified block of the storage allocated to the production dataset by checking whether or not the specified block is indicated to be invalid, and if the specified block is not indicated to be invalid, copying the specified block to the snapshot dataset and then writing to the specified block, and if the specified block is indicated to be invalid, writing to the specified block without copying the specified block to the snapshot dataset.
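As a non-limiting sketch of the write access just described, the check of the invalid-block indication might look as follows. The function write_block and its parameters are hypothetical. The sketch assumes that the indication reflects the state of the production dataset when the snapshot dataset was created, and that only the first saved original of a block is kept; neither detail is stated in the passage above.

```python
def write_block(production, invalid, snapshot_saved, index, data):
    """Write data to production[index], copying the original only if it is valid.

    production     -- list of block contents (stand-in for production storage)
    invalid        -- list of booleans; True marks a block as invalid
    snapshot_saved -- dict of block index -> original contents saved for the snapshot
    """
    if not invalid[index]:
        # The block holds valid data: copy it to the snapshot before overwriting.
        # setdefault keeps only the first saved original (an assumption of this sketch).
        snapshot_saved.setdefault(index, production[index])
    # An invalid block is written without any copy, since the snapshot
    # never needs to reproduce its contents.
    production[index] = data
```

The saving comes from the second branch: writes to blocks that held no valid data when the snapshot was created skip the copy entirely.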
In accordance with another aspect, the invention provides a data storage system for providing access to a production dataset and a plurality of snapshot datasets. Each snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot datasets. The data storage system is programmed for maintaining an indication of invalid blocks of the storage that are allocated to the production dataset, and for maintaining, for each snapshot dataset, a snapshot copy of the indication of invalid blocks of the storage that were allocated to the production dataset at the point of time when the snapshot dataset was created. The data storage system is also programmed for performing a write access upon a specified block of the storage allocated to the production dataset by checking whether or not the specified block is not indicated to be invalid in any of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, and if the specified block is not indicated to be invalid in any of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, copying the specified block to at least the most recent snapshot dataset and then writing to the specified block, and if the specified block is indicated to be invalid in the production dataset and in all of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, writing to the specified block without copying the specified block to at least the most recent snapshot dataset.
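For purposes of illustration, the write access with a plurality of snapshot datasets might be modeled as follows. The class MultiSnapshotVolume is hypothetical. The sketch assumes that a block is copied at most once to the most recent snapshot, and that an older snapshot lacking its own saved copy is read by falling through to newer snapshots and then to the production dataset; these are assumptions of the sketch rather than details stated above.

```python
class MultiSnapshotVolume:
    """Hypothetical model of a production dataset with several snapshots."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks
        self.invalid = [True] * num_blocks  # production indication: True = invalid
        self.snapshots = []                 # oldest snapshot first

    def create_snapshot(self):
        # Each snapshot keeps its own copy of the invalid-block indication.
        self.snapshots.append({"invalid": list(self.invalid), "saved": {}})

    def write(self, index, data):
        if self.snapshots:
            # A copy is needed if any snapshot saw this block as valid.
            needed = any(not s["invalid"][index] for s in self.snapshots)
            newest = self.snapshots[-1]
            if needed and index not in newest["saved"]:
                newest["saved"][index] = self.blocks[index]
        self.blocks[index] = data
        self.invalid[index] = False         # the block now holds valid data

    def read_snapshot(self, n, index):
        # Search snapshot n, then newer snapshots, then the production dataset.
        for s in self.snapshots[n:]:
            if index in s["saved"]:
                return s["saved"][index]
        return self.blocks[index]
```

A block that was invalid in every snapshot copy of the indication is overwritten with no copy, while a block that was valid in at least one is saved to the most recent snapshot first.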
In accordance with still another aspect, the invention provides a data storage system for providing access to a production dataset and a plurality of snapshot datasets. Each snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot datasets. The data storage system is programmed for maintaining a meta bit map indicating invalid blocks of the storage that are allocated to the production dataset, and for maintaining, for each snapshot dataset, a snapshot copy of the meta bit map indicating invalid blocks of the storage that were allocated to the production dataset at the point of time when the snapshot dataset was created. The data storage system is further programmed for using the snapshot copies of the meta bit map for deciding whether or not to copy blocks from the storage of the production dataset to storage of the snapshot datasets for saving the blocks to support the snapshot datasets.
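One possible representation of the meta bit map and its snapshot copies is sketched below. The class MetaBitMap, the helper should_save_block, and the convention that a set bit marks an invalid block are all assumptions made for illustration.

```python
class MetaBitMap:
    """One bit per block; a set bit marks the block invalid (assumed convention)."""

    def __init__(self, num_blocks, bits=None):
        self.num_blocks = num_blocks
        nbytes = (num_blocks + 7) // 8
        # By default every block starts out invalid; bytearray(bits) makes a copy.
        self.bits = bytearray(bits) if bits is not None else bytearray(b"\xff" * nbytes)

    def is_invalid(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_valid(self, block):
        # Clear the block's bit, keeping the byte within 0..255.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def snapshot_copy(self):
        # Independent copy, taken when a snapshot dataset is created.
        return MetaBitMap(self.num_blocks, self.bits)


def should_save_block(snapshot_maps, block):
    """Copy a block before overwriting only if some snapshot map shows it valid."""
    return any(not m.is_invalid(block) for m in snapshot_maps)
```

Because each snapshot copy is independent of the production meta bit map, later changes to the production map do not alter the decision made on behalf of an existing snapshot.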
In accordance with yet another aspect, the invention provides a method of operating a data storage system for providing access to a production dataset and at least one snapshot dataset. The snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot dataset. The method includes maintaining an indication of invalid blocks of the storage that are allocated to the production dataset. The method further includes performing a write access upon a specified block of the storage allocated to the production dataset by checking whether or not the specified block is indicated to be invalid, and if the specified block is not indicated to be invalid, copying the specified block to the snapshot dataset and then writing to the specified block, and if the specified block is indicated to be invalid, writing to the specified block without copying the specified block to the snapshot dataset.
In accordance with yet still another aspect, the invention provides a method of operating a data storage system for providing access to a production dataset and a plurality of snapshot datasets. Each snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot datasets. The method includes maintaining an indication of invalid blocks of the storage that are allocated to the production dataset, and maintaining, for each snapshot dataset, a snapshot copy of the indication of invalid blocks of the storage that were allocated to the production dataset at the point of time when the snapshot dataset was created. The method further includes performing a write access upon a specified block of the storage allocated to the production dataset by checking whether or not the specified block is not indicated to be invalid in any of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, and if the specified block is not indicated to be invalid in any of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, copying the specified block to at least the most recent snapshot dataset and then writing to the specified block, and if the specified block is indicated to be invalid in the production dataset and in all of the snapshot copies of the indication of invalid blocks that were allocated to the production dataset at the point in time when each snapshot dataset was created, writing to the specified block without copying the specified block to at least the most recent snapshot dataset.
In accordance with a final aspect, the invention provides a method of operating a data storage system for providing access to a production dataset and a plurality of snapshot datasets. Each snapshot dataset is the state of the production dataset at a point in time when the snapshot dataset was created. The data storage system includes storage for storing data of the production dataset and the snapshot datasets. The method includes maintaining a meta bit map indicating invalid blocks of the storage that are allocated to the production dataset, and maintaining, for each snapshot dataset, a snapshot copy of the meta bit map indicating invalid blocks of the storage that were allocated to the production dataset at the point of time when the snapshot dataset was created. The method further includes using the snapshot copies of the meta bit map for deciding whether or not to copy blocks from the storage of the production dataset to storage of the snapshot datasets for saving the blocks to support the snapshot datasets.