Storage devices are employed to store data that are accessed by computer systems. Examples of basic storage devices include volatile and non-volatile memory, floppy drives, hard disk drives, tape drives, optical drives, etc. A storage device may be locally attached to an input/output (I/O) channel of a computer. For example, a hard disk drive may be connected to a computer's disk controller. A storage device may also be accessible over a network. Examples of such a storage device include network attached storage (NAS) and storage area network (SAN) devices. A storage device may be a single stand-alone component or may comprise a system of storage devices, as in the case of Redundant Array of Inexpensive Disks (RAID) groups.
For mission-critical applications requiring high availability of stored data, various methods for enhancing data reliability are typically employed. One such method is to provide a “mirror” for each storage device. In a mirror arrangement, data are written to at least two storage devices. Thus, data may be read from either of the two storage devices so long as the two devices are operational and contain the same data.
In general, copying data from a first location (e.g., including one or more data volumes) to a second may be done for a myriad of reasons, including replication and backup/versioning. In a replication operation, a data set may be copied from the first location to the second to ensure that the second is a mirror of the first and that each stores a copy of the data set, such that if a failure renders the data set inaccessible from the first location, the second is available for access. In a backup/versioning operation, a “copy on write” method can be employed such that, when the data set is changed after a given point in time, the original data that was stored in the first data volume at that point in time is copied to a save volume (a data volume acting as, for example, a backup location) before being overwritten in the first volume. In this way, the data set can be “rolled back” to the point in time.
One illustrative method for forming a point in time copy of a data set is referred to as a snapshot and is described in detail in U.S. Pat. No. 6,792,518 to Armangau et al., which is incorporated herein by reference in its entirety.
A snapshot does not replicate a full copy of the data set (referred to as a production data set). Rather, the snapshot only stores differences between a current version of the production data set and the version of the data set at the point in time when the snapshot was taken. In the implementation described in the '518 patent, the snapshot maintains several data structures, including a block map. When a snapshot is created at time T=0, these data structures, including the block map, may be empty, and they are populated when the data set is written to after the creation of the snapshot. For example, when contents of a first data block in the production data set are about to be changed as a result of a data write operation conducted after time T=0 (e.g., time T=0.5), the original contents of the data block are copied to a save volume such that a copy of a state of the data block at the time the snapshot was created (i.e., the contents of the data block at time T=0) is maintained. An entry is then placed into the block map linking the data block in the save volume to its corresponding position in the point in time data set that the snapshot represents. This can be repeated over time, for each change made to the production data set after the snapshot was created, such that the block map contains an entry for each changed data block.
The block map of the snapshot can then be used at a later time (e.g., time T=10) to determine the state of the production data set at the time the snapshot was created (time T=0) even if it has changed since T=0. To do so, a read operation to the snapshot for a selected data block will access the block map to determine if the block map contains an entry for that selected data block. If so, it can be determined that the selected data block changed after the snapshot was created and that the data stored in the production data set is not the data that was stored in the selected data block at time T=0. The information stored in the entry in the block map will then be accessed to determine the location of the corresponding data in the save volume, and the data that was stored in the selected data block in the first data volume at time T=0 will be read from the save volume. If, however, there is no entry in the block map for the selected data block, then it can be determined that the data did not change after the creation of the snapshot, and that the data stored in the production data set is the data that was stored at time T=0. Accordingly, the data can be read from the production data set.
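The copy-on-write and read behavior described above can be illustrated with a minimal sketch. It is not the implementation of the '518 patent; the class name `Snapshot`, its methods, and the use of Python lists and a dictionary to stand in for the production data set, the save volume, and the block map are all assumptions made for illustration.

```python
class Snapshot:
    """Hypothetical point-in-time view of a production data set (copy on write)."""

    def __init__(self, production):
        self.production = production  # live data set: a list of block contents
        self.save_volume = []         # original contents of overwritten blocks
        self.block_map = {}           # production block index -> save volume index

    def write(self, index, data):
        """Write to the production data set, saving the original block first."""
        if index not in self.block_map:
            # First overwrite since the snapshot: preserve the T=0 contents.
            self.save_volume.append(self.production[index])
            self.block_map[index] = len(self.save_volume) - 1
        self.production[index] = data

    def read(self, index):
        """Return the block as it was when the snapshot was created."""
        if index in self.block_map:
            # Block changed after the snapshot: read from the save volume.
            return self.save_volume[self.block_map[index]]
        # Block unchanged: the production data set still holds the T=0 data.
        return self.production[index]


production = ["a0", "b0", "c0"]
snap = Snapshot(production)  # snapshot created at T=0; block map is empty
snap.write(1, "b1")          # e.g., T=0.5: original "b0" is copied to the save volume
snap.write(1, "b2")          # later write: "b0" is already saved, nothing is copied
t0_block = snap.read(1)      # T=0 contents of block 1
```

In this sketch only the first overwrite of a block copies data to the save volume, so the save volume holds one entry per changed block regardless of how many times that block is subsequently written.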
Multiple snapshots can also be created at different times, and can work together in a serial fashion so that only the most recently created snapshot directly tracks changes to the production data set. For example, if a data block was overwritten after time T=0 but also after time T=1, when a second snapshot was created, the snapshot at time T=0 may not reflect that the data block was changed, but the snapshot created at time T=1 will. The snapshot created at time T=1 may have its own block map containing addresses of data blocks on the save volume storing the contents of data blocks overwritten after time T=1. In response to a read operation to the snapshot at time T=0, carried out at a time subsequent to T=1, it may be determined from the snapshot at time T=1 that the selected data block in the production volume was overwritten subsequent to T=0, so that the data block that existed at T=0 can be retrieved from the save volume using the block map for the snapshot at T=1.
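One way to picture the serial arrangement is the following sketch, in which a read to an older snapshot consults its own block map and then the block maps of each later snapshot in order. The class `SnapshotChain` and its methods are hypothetical names introduced here, and the lookup order is an assumption consistent with the description above rather than a documented algorithm.

```python
class SnapshotChain:
    """Hypothetical series of snapshots; only the newest tracks new writes."""

    def __init__(self, production):
        self.production = production  # live data set: a list of block contents
        self.save_volume = []         # shared store of preserved block contents
        self.block_maps = []          # one block map per snapshot, oldest first

    def create_snapshot(self):
        """Create a snapshot with an empty block map and return its identifier."""
        self.block_maps.append({})
        return len(self.block_maps) - 1

    def write(self, index, data):
        """Write to production; the most recent snapshot saves the old contents."""
        if self.block_maps:
            newest = self.block_maps[-1]
            if index not in newest:
                self.save_volume.append(self.production[index])
                newest[index] = len(self.save_volume) - 1
        self.production[index] = data

    def read_snapshot(self, snap_id, index):
        """Return the block as it was when snapshot snap_id was created."""
        # Search this snapshot's block map, then each later snapshot's in order.
        for block_map in self.block_maps[snap_id:]:
            if index in block_map:
                return self.save_volume[block_map[index]]
        # No snapshot recorded a change, so production still holds the data.
        return self.production[index]


chain = SnapshotChain(["a0", "b0"])
s0 = chain.create_snapshot()  # T=0
chain.write(0, "a1")          # tracked by the T=0 snapshot
s1 = chain.create_snapshot()  # T=1
chain.write(0, "a2")          # tracked by the T=1 snapshot
chain.write(1, "b2")          # block 1 first changes after T=1
```

Here block 1 has no entry in the T=0 block map, so a read to the T=0 snapshot finds its original contents through the T=1 block map.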
As should be appreciated from the foregoing, snapshots can be used to determine previous states of a data set at past times without needing to make a full copy of the data set at those past times. Instead, only the “deltas” or differences are stored in snapshots.
In general, a data replication system can provide a duplicate copy or replica of changing data on a storage device. Write commands issued to a primary storage device are duplicated and issued to the data replication system, which records the written data in its own storage medium. Sophisticated data replication systems store not only a current duplicate copy of the primary device but also allow additional past-time images of the primary device to be accessed. This may be done through “journaling,” where the write commands themselves are archived, rather than simply a copy of the data.
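As a rough illustration of journaling, the sketch below archives each write command with a timestamp and rebuilds a past-time image by replaying the archived commands up to the requested time. The class `JournalingReplica`, its methods, and the integer timestamps are assumptions for illustration only; real replication systems differ in how they store and replay journals.

```python
class JournalingReplica:
    """Hypothetical replica that archives write commands instead of only data."""

    def __init__(self, num_blocks, fill=""):
        self.base = [fill] * num_blocks  # image before any journaled write
        self.journal = []                # (timestamp, block index, data) entries

    def record_write(self, timestamp, index, data):
        """Archive a write command duplicated from the primary storage device."""
        self.journal.append((timestamp, index, data))

    def image_at(self, timestamp):
        """Rebuild the image of the primary device as of the given time."""
        image = list(self.base)
        for ts, index, data in self.journal:
            if ts <= timestamp:
                image[index] = data
        return image


replica = JournalingReplica(num_blocks=2)
replica.record_write(1, 0, "x")
replica.record_write(2, 1, "y")
replica.record_write(3, 0, "z")
past_image = replica.image_at(2)     # image after the first two writes
current_image = replica.image_at(3)  # current duplicate copy
```

The sketch assumes write commands are recorded in the order they were issued, so that replaying the journal from the start reproduces the primary device's state.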
Sometimes, however, communication to the data replication system is lost. This may be for a variety of reasons. For example, a physical connection with the device hosting the data replication system may be broken, or communication software may malfunction. When this happens, a data replication system will be out of synchronization with the primary storage device. Some reconciliation process is necessary to restore synchronization between the data replication system and the primary storage device.
Data storage operations for writing the copy in the mirror can be handled in either a synchronous manner or an asynchronous manner. In conventional synchronous mirroring, the primary storage device ensures that the host data has been successfully written to the primary storage device and to all secondary storage devices in the mirror before sending an acknowledgment to the host. This can result in relatively high latency, but ensures that the primary storage device and all secondary storage devices are updated before informing the host that the write operation is complete. In asynchronous mirroring, the primary storage device sends an acknowledgment message to the host before ensuring that the host data has been successfully written to all secondary storage devices in the mirror. This results in relatively low latency, but does not ensure that all secondary storage devices are updated before informing the host that the write operation is complete.
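The difference between the two modes can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the class `MirroredDevice`, the `flush` method that stands in for background propagation, and the `"ack"` return value are hypothetical and are not taken from any particular product.

```python
from collections import deque


class MirroredDevice:
    """Hypothetical primary storage device with mirrored secondaries."""

    def __init__(self, num_secondaries, synchronous):
        self.primary = {}
        self.secondaries = [{} for _ in range(num_secondaries)]
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet applied to the secondaries

    def write(self, index, data):
        """Handle a host write and return an acknowledgment."""
        self.primary[index] = data
        if self.synchronous:
            # Synchronous: update every secondary before acknowledging.
            for secondary in self.secondaries:
                secondary[index] = data
        else:
            # Asynchronous: acknowledge now, propagate later.
            self.pending.append((index, data))
        return "ack"

    def flush(self):
        """Apply queued writes to all secondaries (asynchronous mode only)."""
        while self.pending:
            index, data = self.pending.popleft()
            for secondary in self.secondaries:
                secondary[index] = data


sync_dev = MirroredDevice(num_secondaries=2, synchronous=True)
sync_ack = sync_dev.write(0, "d")    # secondaries are updated before the ack

async_dev = MirroredDevice(num_secondaries=2, synchronous=False)
async_ack = async_dev.write(0, "d")  # ack returned while secondaries are stale
stale = [dict(s) for s in async_dev.secondaries]
async_dev.flush()                    # secondaries catch up afterwards
```

In the asynchronous case, the window between the acknowledgment and the call to `flush` is the period in which a failure could leave the secondaries without the acknowledged data.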
A consistency group is a collection of related volumes that need to be kept in a consistent state. A consistency transaction set is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. Consistency groups maintain data consistency across volumes.
Conventionally, storage systems provide logical volumes (LUs or LUNs) to host computers by statically assigning storage areas of disk drives. A storage capacity of the disk drives assigned to the LUs may be larger than a storage capacity to be actually used by the host computer, which is called overprovisioning. This is because the storage capacity to be used by the host computer cannot be predicted accurately. Also, operation costs required for changing LU capacities are high.
As a way of addressing overprovisioning, there is a known concept called thin provisioning. The storage systems provide volumes realized by thin provisioning (thin provisioning volume: TPLU) to the host computers. Thus, the host computers recognize the TPLUs provided by the storage systems as volumes having a storage capacity larger than that of the disk drives actually assigned to the respective TPLUs.
Upon reception of a write request to a TPLU from the host computer, the storage system dynamically assigns an unwritten storage area of a storage pool to the TPLU requested for data write.
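The allocate-on-write behavior of a TPLU can be sketched as below. The classes `StoragePool` and `ThinVolume`, the page-sized allocation unit, and the choice to return `None` for never-written blocks are assumptions for illustration, not details taken from any specific storage system.

```python
class StoragePool:
    """Hypothetical pool of physical pages shared by thin-provisioned volumes."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # unwritten storage areas
        self.pages = {}                           # page number -> stored data

    def allocate(self):
        """Hand out one unwritten page, or fail if the pool is exhausted."""
        if not self.free_pages:
            raise RuntimeError("storage pool exhausted")
        return self.free_pages.pop(0)


class ThinVolume:
    """Hypothetical TPLU: advertises virtual_size blocks, allocates on write."""

    def __init__(self, pool, virtual_size):
        self.pool = pool
        self.virtual_size = virtual_size  # capacity the host computer sees
        self.mapping = {}                 # virtual block -> pool page

    def write(self, block, data):
        if not 0 <= block < self.virtual_size:
            raise IndexError("block outside the volume's virtual capacity")
        if block not in self.mapping:
            # First write to this block: assign a page from the pool.
            self.mapping[block] = self.pool.allocate()
        self.pool.pages[self.mapping[block]] = data

    def read(self, block):
        if block not in self.mapping:
            return None  # never written, so no physical storage is assigned
        return self.pool.pages[self.mapping[block]]


pool = StoragePool(num_pages=4)
volume = ThinVolume(pool, virtual_size=1000)  # host sees 1000 blocks
volume.write(500, "x")  # allocates one page from the pool
volume.write(500, "y")  # reuses the page already assigned to block 500
```

Although the host sees a 1000-block volume in this example, only one of the pool's four pages is consumed after the two writes.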