1. Field of the Invention
The invention relates generally to managing snapshots in a storage system and more specifically relates to managing metadata associated with snapshots for logical volumes to improve performance of storage systems.
2. Discussion of Related Art
Storage systems typically include one or more storage controllers, each coupled with one or more storage devices operable to persistently store data. The storage controllers are generally responsible for receiving and processing Input/Output (IO) requests from one or more attached host systems to read information from, or write information to, the storage system. The storage controllers may also process other IO requests, such as requests for managing internal data associated with storage devices, calculating redundancy information for data on storage devices, performing rebuild operations on redundant volumes on the storage devices, and the like. Typical storage devices may include any number of apparatus for persistent data storage, such as hard disks, flash-based drives, optical storage devices, non-volatile memory systems, and other such apparatus operable to persistently store data.
Typically, a host system attached to the storage system accesses data persistently stored on the storage devices using one or more logical volumes mapped to the storage devices by the storage controllers. Mapping logical volumes to storage devices allows a host to access a logical volume irrespective of its actual physical mapping to any specific storage devices. For example, a host system may access a logical volume as a linear sequence of block addresses. When a storage controller receives IO requests from the host system, it maps the IO requests to physical storage according to any of a number of configurations, such as stripes of data across specific storage devices. This type of IO mapping by the storage controller allows a wide variety of redundancy and/or performance-enhancing configurations for the logical volume, which are often implemented transparently to the host system.
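As a rough illustration of the striped mapping mentioned above, the following Python sketch maps a logical block address to a device index and a block offset on that device. It assumes a simple RAID-0-style round-robin layout with no redundancy; the function name `map_striped`, the `PhysicalAddress` type, and the `stripe_blocks` parameter are hypothetical and do not describe any particular storage controller.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    device: int  # index of the storage device holding the block
    block: int   # block offset on that device


def map_striped(lba: int, num_devices: int, stripe_blocks: int) -> PhysicalAddress:
    """Map a logical block address under a hypothetical round-robin striping layout.

    The logical volume is divided into stripe units of `stripe_blocks` blocks,
    and consecutive stripe units are placed on consecutive devices.
    """
    if lba < 0 or num_devices <= 0 or stripe_blocks <= 0:
        raise ValueError("invalid arguments")
    stripe_unit = lba // stripe_blocks      # which stripe unit the block falls in
    offset_in_unit = lba % stripe_blocks    # position of the block inside that unit
    device = stripe_unit % num_devices      # round-robin choice of device
    row = stripe_unit // num_devices        # stripe units already placed on that device
    return PhysicalAddress(device, row * stripe_blocks + offset_in_unit)
```

With four devices and eight-block stripe units, logical block twenty falls in the third stripe unit and lands on device 2 at block offset 4. Redundant layouts such as parity striping would add mapping steps that this sketch omits.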
As a host reads and writes data to a logical volume, various options exist for allowing the host system to capture the state of the data within the logical volume at various points in time. Because a logical volume does not map directly to specific physical storage in the storage system, the storage system is free to perform various activities on data read from, or written to, the logical volume so that the host can access earlier versions of the data at some future point in time. For example, a host system may wish to capture the state of the logical volume before installing patches to an operating system running on the logical volume. In essence, previous states of data on the logical volume may be captured and made available at some later time.
One brute-force method of capturing the previous state of the logical volume would be for the storage system to allocate enough storage space on the storage devices to hold a complete copy of the logical volume and then copy the volume onto the newly allocated space. This is typically inefficient because only a small portion of the data on the logical volume may change from one point of capture to the next. Additionally, because logical volumes may exceed many hundreds of gigabytes in size, repeated full copies would quickly consume all available physical storage.
A more elegant solution currently in practice is to create a sequence of snapshots of the logical volume. Typically, a snapshot includes a temporary volume of storage allocated by the storage system on the storage devices, and additionally includes an IO mapping table, or relocation table, which indicates where data is stored within the temporary volume. When a snapshot is created by the storage system, subsequent changes to the logical volume are stored on portions of that snapshot's temporary volume. Thus, as each snapshot is created, further changes to the logical volume are recorded within the relocation table and temporary volume of the newest snapshot, preserving the state of earlier snapshots and of the original root volume of the logical volume (i.e., the base logical volume before any snapshots are applied to it).
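A minimal Python model of the snapshot structure just described might look as follows. It is a sketch under simplifying assumptions: each snapshot's relocation table is a dictionary from logical block address to a slot in that snapshot's temporary volume, the temporary volume is an in-memory list, and the names `Snapshot`, `LogicalVolume`, `create_snapshot`, and `write` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    # Relocation table: logical block address -> slot in this snapshot's temporary volume.
    relocation: dict[int, int] = field(default_factory=dict)
    # Temporary volume, modeled as a list of block payloads.
    temp_volume: list[bytes] = field(default_factory=list)


@dataclass
class LogicalVolume:
    root: dict[int, bytes]  # root volume: the base contents before any snapshot
    snapshots: list[Snapshot] = field(default_factory=list)

    def create_snapshot(self) -> Snapshot:
        # The newest snapshot becomes the current one; earlier ones are left untouched.
        snap = Snapshot()
        self.snapshots.append(snap)
        return snap

    def write(self, lba: int, data: bytes) -> None:
        if not self.snapshots:
            self.root[lba] = data  # no snapshot exists yet: update the root volume in place
            return
        current = self.snapshots[-1]
        if lba in current.relocation:
            # Block already relocated into the current snapshot: overwrite its slot.
            current.temp_volume[current.relocation[lba]] = data
        else:
            # First write to this block since the snapshot was created: allocate and record a slot.
            current.relocation[lba] = len(current.temp_volume)
            current.temp_volume.append(data)


vol = LogicalVolume(root={20: b"v0"})
s1 = vol.create_snapshot()
vol.write(20, b"v1")
s2 = vol.create_snapshot()
vol.write(20, b"v2")
```

In this model, writes made after `create_snapshot` land only in the newest snapshot, so the root volume and each older snapshot keep the data they held when the next snapshot was created.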
Relocation tables allow a snapshot to store only the data in the logical volume that has changed since the snapshot was created, until a new snapshot is created. For example, if a snapshot is created and a host subsequently issues a write IO request for address block twenty of the logical volume, the storage system may store the data on the temporary volume of the current snapshot. In this case, the storage system would update the relocation table associated with the current snapshot to indicate that address block twenty now resides at a specific portion of the current snapshot's temporary volume, so that subsequent reads and writes of that block are directed there. In some cases, a snapshot may be “thinly provisioned.” In this case, a snapshot may initially be allocated a small amount of physical storage (e.g., 200 megabytes). After this initial allocation, storage may be expanded as necessary to hold any new data written to the snapshot.
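The write path in this example, together with thin provisioning, can be sketched in Python as follows. The sketch counts space in blocks rather than megabytes, and the `ThinSnapshot` class, its `initial_blocks` and `grow_blocks` parameters, and the fixed-increment growth policy are hypothetical choices made only for illustration.

```python
class ThinSnapshot:
    """Hypothetical thinly provisioned snapshot whose capacity grows on demand."""

    def __init__(self, initial_blocks: int, grow_blocks: int) -> None:
        self.capacity = initial_blocks       # blocks of physical space currently allocated
        self.grow_blocks = grow_blocks       # size of each expansion step
        self.used = 0                        # slots consumed in the temporary volume
        self.relocation: dict[int, int] = {}  # logical block -> temporary-volume slot

    def write(self, lba: int) -> int:
        """Record a write to `lba` and return the temporary-volume slot that holds it."""
        if lba in self.relocation:
            return self.relocation[lba]      # block already relocated here: reuse its slot
        if self.used == self.capacity:
            self.capacity += self.grow_blocks  # expand the allocation only when it is full
        slot = self.used
        self.relocation[lba] = slot          # update the relocation table
        self.used += 1
        return slot


snap = ThinSnapshot(initial_blocks=2, grow_blocks=4)
snap.write(20)
snap.write(21)
snap.write(20)
snap.write(22)
```

Here the second write to block twenty reuses its existing slot, and the allocation grows from two blocks to six only when block twenty-two needs a slot beyond the initial capacity.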
While snapshots are used to capture changes to the logical volume, a view of a snapshot (and the corresponding root volume) is used to create a new logical volume that provides read and write access to the data as of the point when the view is created. For example, if a number of temporally sequential snapshots are created from a root volume, a user may wish to create a new logical volume associated with a particular snapshot and make subsequent changes without altering the original logical volume. Like branches on a tree, views allow a user to “branch off” the main sequence of snapshots for subsequent modification. Each view includes the underlying data, such as the sequence of snapshots and the original root volume. A view, however, is different from a typical snapshot in that it allows the user to create a virtual volume that presents the underlying data for read and write access. Along the “branch”, the view is independent of the original sequence of snapshots.
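Branching can be illustrated by treating each snapshot and each view as a node that records its own changed blocks and points to the older state it builds on. In this hypothetical Python sketch (the `Node` class and the `create_view` helper are illustrative names), a read walks from a node toward the root volume, and a view is simply a new node whose parent is the chosen snapshot, so writes to the view never touch the main sequence.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A snapshot or a view: its own changed blocks plus a link to the older state."""
    name: str
    parent: Optional[Node] = None
    changes: dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.changes[lba] = data  # changes are recorded only on this node

    def read(self, lba: int, root: dict[int, bytes]) -> Optional[bytes]:
        # Walk from this node toward the root volume; the first match is the newest data.
        node: Optional[Node] = self
        while node is not None:
            if lba in node.changes:
                return node.changes[lba]
            node = node.parent
        return root.get(lba)


def create_view(base: Node, name: str) -> Node:
    # A view shares all data up to `base` but keeps its own writes separate.
    return Node(name=name, parent=base)


root = {20: b"root"}
s1 = Node("s1", changes={20: b"s1"})
s2 = Node("s2", parent=s1, changes={20: b"s2"})
view = create_view(s1, "view-of-s1")
view.write(20, b"view")
```

After the write, the view returns its own data for block twenty, while snapshots s2 and s1 still return theirs; this is the independence along the branch that views provide.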
Problems may arise when large numbers of snapshots are created for a logical volume. As numerous snapshots are created, the storage system may have to search many relocation tables for the specific data requested. For example, address block twenty of the logical volume may reside on snapshot fifty, while the current snapshot of the logical volume may be snapshot two hundred. If address block twenty was not modified in any of snapshots fifty-one through two hundred, an IO read request for address block twenty of the logical volume would actually be served from the temporary volume associated with snapshot fifty. Thus, when the storage system receives an IO read request for address block twenty of the logical volume, it may have to read and process each relocation table in series, from snapshot two hundred down to snapshot fifty, to locate the requested data. Because relocation tables are often many megabytes in size, it would be impractical to store many thousands of snapshot relocation tables in the memory of the storage controller for fast processing. Instead, the relocation tables are typically stored along with the temporary volumes on physical storage. Thus, reading many hundreds or thousands of relocation tables from physical storage may impose a significant performance penalty on the storage system when performing IO requests for a logical volume that includes snapshots.
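The cost of this serial search can be made concrete with a small Python sketch. It assumes, hypothetically, that relocation tables are held in a list ordered from oldest to newest and that each table consulted counts as one read from physical storage; `locate_block` is an illustrative name, not an existing interface.

```python
from __future__ import annotations

from typing import Optional


def locate_block(relocation_tables: list[dict[int, int]], lba: int) -> tuple[Optional[int], int]:
    """Search relocation tables from newest to oldest for `lba`.

    relocation_tables[i] is the table of snapshot number i + 1.
    Returns (snapshot number holding the block, or None for the root volume,
             number of relocation tables read).
    """
    tables_read = 0
    for index in range(len(relocation_tables) - 1, -1, -1):
        tables_read += 1  # each table consulted may require a read from physical storage
        if lba in relocation_tables[index]:
            return index + 1, tables_read
    return None, tables_read  # block never relocated: it is served from the root volume


# Two hundred snapshots; block twenty was last written while snapshot fifty was current.
tables: list[dict[int, int]] = [{} for _ in range(200)]
tables[49][20] = 0
print(locate_block(tables, 20))  # (50, 151)
```

Snapshots two hundred down to fifty inclusive make 151 tables, and a block that was never relocated forces a read of all 200 tables before falling back to the root volume.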
Thus, it remains an ongoing challenge to manage snapshot data in a logical volume so as to improve the performance of a storage system.