The proliferation of computers and computing systems has resulted in a continually growing need for efficient and reliable storage of data. Host computing devices make use of data storage devices of many types and complexities to satisfy the growing data storage needs. The types of data storage devices commonly used range from individual flash memory devices and hard drives to storage servers and clusters of storage servers. A storage server is a specialized computer that provides storage services, relating to the organization and storage of data, to one or more clients. The data is typically stored on writable persistent storage media, such as non-volatile memories and disks. A storage server is configured to operate according to a client/server model of information delivery and may enable many clients or applications to access the data served by the system. A storage server can employ a storage architecture that serves the data with both random and streaming access patterns at either a file level, as in network attached storage (NAS) environments, or at the block level, as in a storage area network (SAN).
The management of data on data storage devices often includes copying data from one storage location to another. In addition, data is sometimes copied from one location in a storage device or system to another location within that same storage device or system. A traditional method for a host to perform this type of copy operation is for the host to read the data from the source storage device into the memory of the host and then transfer the data to the destination storage location, under the control of the host. This may be accomplished by the host performing a series of buffered read/write processes on smaller chunks of the data that is being copied.
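The host-centered copy described above can be illustrated with a minimal sketch. It assumes in-memory byte streams stand in for the source and destination storage locations; the function name `host_buffered_copy` and the 64 KiB chunk size are hypothetical choices made only for illustration.

```python
import io


def host_buffered_copy(source, destination, chunk_size=64 * 1024):
    """Copy by pulling every chunk through host memory, then pushing it out."""
    bytes_copied = 0
    while True:
        chunk = source.read(chunk_size)   # data travels source -> host memory
        if not chunk:
            break                         # end of source data reached
        destination.write(chunk)          # data travels host memory -> destination
        bytes_copied += len(chunk)
    return bytes_copied


# In-memory streams stand in for the two storage locations (an assumption).
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copied = host_buffered_copy(src, dst)
```

Every byte passes through the host twice in this sketch, once on the read and once on the write, which is the cost that the copy offload processes discussed in this section aim to avoid.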
While the host-centered copy process described above may work well if the host has available bandwidth and/or the quantity of data is relatively small, the burdens associated with copying data in this manner can become significant in some cases. In addition to consuming central processing unit (CPU) resources of the host, if the data is transferred over a network, network bandwidth is unnecessarily consumed because the data is first transmitted to the host and then from the host to the destination. Copy offload processes allow these types of data transfers to occur in a more efficient manner by transferring data directly between storage devices, for example one disk drive to another. In some cases, these types of copy offload processes are referred to as offloaded data transfer (ODX).
An ODX process is defined for Microsoft Windows. In the Windows ODX operation, a host transmits a request to a source storage device or system identifying data to be copied. The host receives a token representing the data of interest from the storage device. The token does not contain the data but acts as a unique identifier and/or locator for the data. The host then uses an offload write command, including the token, to request data movement from the source to a destination storage location. Windows ODX is designed to work with storage devices that implement the Small Computer System Interface (SCSI) standard. Specifically, Windows ODX features are supported in devices that implement SCSI Primary Commands - 4 (SPC-4) and SCSI Block Commands - 3 (SBC-3). These command sets are defined by the T10 committee. T10 is a technical committee of the International Committee on Information Technology Standards (INCITS) and is responsible for SCSI architectural standards and storage interfaces. Copy offload processes may also be used in other operating environments and in conjunction with other operating protocols such as Common Internet File System (CIFS).
In the processes described above, the token is used to initiate the copying of the data from one location to another. The token may also be transferred or exchanged among various hosts. When the token is provided to the destination storage device, a copy of the associated data is transferred directly to the destination device. The transfer process occurs through communication between the source and the destination devices without the data having to flow through the host and without the host managing the data transfer process.
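The token-based flow can be sketched as follows. This is a simplified in-memory model, not the SCSI or Windows ODX interface: the `StorageDevice` class and its `create_token` and `offload_write` methods are hypothetical names, and a random hexadecimal string is assumed as the token format.

```python
import secrets


class StorageDevice:
    """Hypothetical in-memory stand-in for a storage device or system."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block address -> bytes
        self.tokens = {}   # token -> list of block addresses it represents

    def create_token(self, addresses):
        """Return an opaque token that identifies the blocks; it carries no data."""
        token = secrets.token_hex(16)
        self.tokens[token] = list(addresses)
        return token

    def offload_write(self, token, source, dest_start):
        """Pull the blocks named by the token directly from the source device."""
        for offset, address in enumerate(source.tokens[token]):
            self.blocks[dest_start + offset] = source.blocks[address]


source = StorageDevice("source")
destination = StorageDevice("destination")
source.blocks = {0: b"alpha", 1: b"beta", 2: b"gamma"}

# The host only ever handles the token, never the data itself.
token = source.create_token([0, 1, 2])
destination.offload_write(token, source, dest_start=100)
```

The host's role in this sketch is limited to the last two statements: it obtains the token and hands it to the destination, while the block contents move from one device object to the other without passing through a host buffer.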
In addition to reducing the use of host computing resources, and potentially network bandwidth, the use of a token allows the copy or data transfer process to be separated from other operations of the host. Once the host interacts with the source storage device to create the token, the actual copying or transfer of the data can occur at a later point in time. While this is beneficial in some respects, it presents additional challenges for the storage device or system. In many cases, the data associated with the token may be subject to change at any time. If the data changes in the interim, the token may no longer be valid because it no longer represents the data that existed at the time the token was created. There are several possible solutions to this problem.
One possible solution is to make a copy of the associated data in the source storage device when the token is created. If the original data changes in the interim, the copy can remain unchanged. This approach has the drawback of requiring additional storage space and resources to create and maintain a complete copy of the data associated with every token that is created.
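A minimal sketch of this copy-at-creation approach, assuming a hypothetical in-memory device (`EagerCopyDevice`) whose blocks are immutable byte strings, might look like this:

```python
import secrets


class EagerCopyDevice:
    """Hypothetical device that copies token data when the token is created."""

    def __init__(self):
        self.blocks = {}        # block address -> bytes (live data)
        self.token_copies = {}  # token -> {address: bytes} captured at creation

    def write(self, address, data):
        self.blocks[address] = data

    def create_token(self, addresses):
        token = secrets.token_hex(16)
        # A complete copy is taken up front, whether or not the data ever changes.
        self.token_copies[token] = {a: self.blocks[a] for a in addresses}
        return token

    def read_token_data(self, token):
        return self.token_copies[token]


eager = EagerCopyDevice()
eager.write(0, b"old")
eager_token = eager.create_token([0])
eager.write(0, b"new")                        # the original changes in the interim
eager_snapshot = eager.read_token_data(eager_token)
```

The token still resolves to the data as it existed at creation, but `token_copies` grows by a full copy for every token, which is the storage cost noted above.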
Another possible solution is to invalidate a token whenever the associated data changes. This approach is sometimes called write tracking and requires continuous monitoring of the data for any potential write activities. Although this approach introduces only a moderate amount of additional computational overhead, it eliminates some of the usefulness of the token approach because the lifetime of a token is unknown and the token may be invalidated before it is used.
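Write tracking can be sketched in the same style. The class `WriteTrackingDevice` and its method names are hypothetical, and the sketch assumes that any write to a covered block address counts as a change to the associated data.

```python
import secrets


class WriteTrackingDevice:
    """Hypothetical device that invalidates tokens when their data is written."""

    def __init__(self):
        self.blocks = {}   # block address -> bytes
        self.tokens = {}   # valid token -> set of block addresses it covers

    def create_token(self, addresses):
        token = secrets.token_hex(16)
        self.tokens[token] = set(addresses)
        return token

    def write(self, address, data):
        # Every write is checked against all outstanding tokens.
        stale = [t for t, covered in self.tokens.items() if address in covered]
        for t in stale:
            del self.tokens[t]   # the token no longer represents the original data
        self.blocks[address] = data

    def is_valid(self, token):
        return token in self.tokens


tracked = WriteTrackingDevice()
for addr, payload in enumerate([b"a", b"b", b"c"]):
    tracked.write(addr, payload)

token_ab = tracked.create_token([0, 1])
token_c = tracked.create_token([2])
tracked.write(1, b"B")   # touches a block covered by token_ab only
```

After the final write, `token_ab` is invalid while `token_c` remains usable. No data is copied, but a host holding `token_ab` cannot know in advance that the token would be invalidated before use.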
Another possible solution to the problem is to continually monitor the data associated with a token. In some cases, this approach is referred to as copy-on-write. A copy of the data is not created at the time the token is created, but all writes to the device or system are monitored to determine if a write will affect or change any data for which a token has been created. If such a change is detected, a copy of the data is made before the write is applied, such that a copy is maintained that is representative of the state of the data at the time the token was created. This approach has the drawback of requiring the additional overhead of monitoring all writes to the device or system during the life of the token(s) and may also require the use of additional storage space to make copies of the data if changes are detected.
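The copy-on-write approach can be sketched as follows. The class `CopyOnWriteDevice` and its method names are hypothetical. The sketch also assumes per-block granularity, so that only the block about to be overwritten is preserved; the text above does not specify how much data is copied when a change is detected.

```python
import secrets


class CopyOnWriteDevice:
    """Hypothetical device that preserves token data only when it is overwritten."""

    def __init__(self):
        self.blocks = {}     # block address -> bytes (live data)
        self.tokens = {}     # token -> set of block addresses it covers
        self.preserved = {}  # token -> {address: bytes saved just before a write}

    def create_token(self, addresses):
        token = secrets.token_hex(16)
        self.tokens[token] = set(addresses)
        self.preserved[token] = {}   # nothing is copied at creation time
        return token

    def write(self, address, data):
        # Every write is monitored; the original block is saved only once per token.
        for token, covered in self.tokens.items():
            if address in covered and address not in self.preserved[token]:
                self.preserved[token][address] = self.blocks[address]
        self.blocks[address] = data

    def read_token_data(self, token):
        saved = self.preserved[token]
        # Prefer the preserved original; fall back to the unchanged live block.
        return {a: saved.get(a, self.blocks[a]) for a in sorted(self.tokens[token])}


cow = CopyOnWriteDevice()
cow.write(0, b"a")
cow.write(1, b"b")
cow_token = cow.create_token([0, 1])
cow.write(1, b"B")    # first overwrite: the original b"b" is preserved
cow.write(1, b"BB")   # later overwrite: the preserved copy is left untouched
cow_view = cow.read_token_data(cow_token)
```

In this sketch the extra storage is limited to the single block that changed, yet the loop in `write` runs on every write for as long as any token exists, which reflects the monitoring overhead described above.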