Users of data storage systems, such as storage area networks, or SANs, often desire to perform a storage operation, such as taking a “snapshot”, or point-in-time copy, of a collection of data. For example, the operator of a database that tracks credit card transactions may want to take a snapshot of the database every night at midnight. The snapshot, or backup copy, may be stored as recovery data in case of system failure, for example, or it may be used as data with which to test a new version of a program and thus avoid potential corruption of production data. The location of the original data is herein referred to as the “copy source” or “source logical unit (LU)”, and the location of the backup data is herein referred to as the “copy destination” or “destination LU”, where a logical unit is a logical partition of a physical data storage device. An LU may be located on one or more physically separate storage entities, such as hard disk drives, redundant array of inexpensive disks (RAID) arrays, and the like. The collection of data to be copied is herein referred to as the “dataset”. Portions of the dataset are herein referred to as “chunks”.
As used herein, the term “write request” refers to the signal or message that triggers one or more write operations. The write request may be stored, in a queue for example, for later processing. Processing a write request generally means performing or attempting to perform the write operation; in some circumstances the write request may be processed multiple times (e.g., resubmitted to a queue), and/or additional write requests may be issued as a result. For example, a request to write a large data object to a storage entity may require multiple processing passes, where each pass writes only a portion of the data object to the storage entity; alternatively, the single request may be broken up into multiple smaller write requests. Since each write request represents a write operation, however, the terms “write” and “write request” are functionally synonymous, and thus are used interchangeably herein.
As used herein, the term “session” refers to the association of a copy source to a copy destination. Session information may be stored in a data structure that identifies the copy source, the copy destination, and other information, such as whether the session is active or inactive. A source LU may have multiple sessions associated with it, and multiple sessions may refer to the same source LU, destination LU, or both.
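By way of illustration only, session information of the kind described above might be sketched as follows. The class name `Session`, its field names, the helper `sessions_for_source`, and the LU identifiers are all hypothetical and are assumed here solely for the example; no particular data layout is implied.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical record associating a copy source with a copy destination."""
    source_lu: str        # identifier of the copy source (assumed to be a string)
    destination_lu: str   # identifier of the copy destination
    active: bool = True   # whether the session is active or inactive


def sessions_for_source(sessions, source_lu):
    """Return every session that refers to the given source LU."""
    return [s for s in sessions if s.source_lu == source_lu]


# One source LU may have several sessions associated with it.
sessions = [
    Session("LU0", "LU7"),
    Session("LU0", "LU8", active=False),
    Session("LU1", "LU9"),
]
lu0_sessions = sessions_for_source(sessions, "LU0")
```

In this sketch, two sessions refer to the same source LU, one active and one inactive, which mirrors the statement that multiple sessions may refer to the same source LU.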
As used herein, the term “consistent operation” refers to an operation that partitions an I/O stream across multiple source LUs such that host I/O dependencies are maintained. That is, if I/O Request n is dependent on I/O Request n-1, then I/O Request n will only be captured by the operation if I/O Request n-1 was also captured by the operation.
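The dependency condition in this definition can be expressed as a simple check. The following is a minimal sketch under stated assumptions: requests are identified by integers, each request depends on at most one earlier request, and the names `is_consistent`, `captured`, and `depends_on` are hypothetical.

```python
def is_consistent(captured, depends_on):
    """Check that every captured request's dependency was also captured.

    captured   -- set of request ids captured by the operation
    depends_on -- dict mapping a request id to the id it depends on,
                  or to None if it has no dependency (assumed layout)
    """
    return all(
        dep in captured
        for req, dep in depends_on.items()
        if req in captured and dep is not None
    )


# Request 2 depends on request 1, and request 3 depends on request 2.
depends_on = {1: None, 2: 1, 3: 2}

ok = is_consistent({1, 2}, depends_on)       # 2 captured together with 1
broken = is_consistent({1, 3}, depends_on)   # 3 captured without 2
```

Capturing requests 1 and 2 satisfies the condition, whereas capturing request 3 without request 2 violates it.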
For example, for a snapshot to be consistent, it must be taken across all the LUs in the dataset before any write to any of those LUs is acknowledged. One way to guarantee a consistent operation is to take a snapshot only while the dataset is in a quiescent state. In practice, this means that before a snapshot or other consistent operation is performed, new writes to the dataset are temporarily suspended, and writes that are currently pending (also referred to as “outstanding writes”) are processed. Once all pending writes have been processed, the snapshot or other copy operation is performed, after which new writes to the dataset are again allowed. Suspending new writes to the dataset or storage entity is herein referred to as “arresting” the writes, and allowing new writes to resume is herein referred to as “releasing” the writes. Processing pending writes until all pending writes have been performed is herein referred to as “draining” the writes.
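The conventional sequence of arresting, draining, performing the storage operation, and releasing may be illustrated by the following single-threaded sketch. It is a simplified model, not an implementation of any particular storage system: the class `Dataset`, its two queues, and its method names are hypothetical, and the "snapshot" is modeled as a plain copy of an in-memory dictionary.

```python
from collections import deque


class Dataset:
    """Hypothetical in-memory model of a dataset that accepts writes."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # outstanding writes, accepted but not yet performed
        self.held = deque()     # new writes queued while writes are arrested
        self.arrested = False

    def submit(self, key, value):
        """Accept a write request; queue it separately if writes are arrested."""
        target = self.held if self.arrested else self.pending
        target.append((key, value))

    def arrest(self):
        """Suspend new writes."""
        self.arrested = True

    def drain(self):
        """Process pending writes until none remain."""
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

    def release(self):
        """Allow new writes again; the held backlog becomes pending work."""
        self.arrested = False
        self.pending.extend(self.held)
        self.held.clear()


ds = Dataset()
ds.submit("a", 1)          # outstanding write before the operation begins
ds.arrest()                # arrest
ds.submit("b", 2)          # arrives during the drain, so it is held
ds.drain()                 # drain: only the outstanding write is performed
snapshot = dict(ds.data)   # storage operation on the quiescent dataset
ds.release()               # release
backlog = len(ds.pending)  # writes that accumulated while arrested
ds.drain()                 # the backlog is processed afterward
```

In this model, the write submitted while writes are arrested does not appear in the snapshot and is processed only after release.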
This technique of “arrest, drain, perform a storage operation, and release” has disadvantages, however. During the write drain, incoming write requests must be queued. Once the write drain is complete, the storage operation has been performed, and the writes have been released, the system must process not only the write requests that continue to come in, but also the backlog of writes that were queued between the time that the writes were arrested and the time that the writes were released. On a busy production system, a large backlog can cause large spikes in write response time, and can greatly increase the time required to perform I/O operations. This problem may also occur during any operation that similarly requires that the dataset, or the storage entity on which the dataset resides, first be in a quiescent state.
Accordingly, in light of these disadvantages associated with conventional drain operations, there exists a need for a more efficient drain operation. Specifically, there exists a need for methods, systems, and computer program products for performing an input/output (I/O) operation that includes a virtual drain.