Recent developments in storage solutions have led enterprises to make increasing use of Storage Area Networks (SANs) for storage consolidation, reliability, availability, and flexibility. Factors driving these developments include the growing amount of on-line data, data protection requirements such as efficient and reliable data backup, and rapidly increasing disk bit densities.
FIG. 1 illustrates a simplified example of an enterprise computing system 100. Servers 110 and 120 are at the heart of computing system 100. As members of enterprise computing system 100, servers 110 and 120 are often referred to as “hosts” or “nodes,” and can execute any number of different types of programs including, for example, operating systems, file systems, volume managers, and application programs such as database systems. FIG. 6 (described below) illustrates some of the features common to servers 110 and 120 as well as client computer systems 130. Servers 110 and 120 can exchange data over network 140, typically a local area network (LAN), e.g., an enterprise-wide intranet, or a wide area network (WAN) such as the Internet. Additionally, network 140 provides a communication path for various client computer systems 130 to communicate with servers 110 and 120.
Other elements of enterprise computing system 100 include storage area network (SAN) 150, SAN switch 160, and storage devices such as tape drive 170, storage array 180, and optical drive 190. As shown in FIG. 1, both servers 110 and 120 are coupled to SAN 150. SAN 150 is conventionally a high-speed network that allows direct connections to be established between storage devices 170, 180, and 190 and servers 110 and 120. Thus, SAN 150 is shared between the servers and allows storage devices to be shared among them, providing greater availability and reliability of storage.
SAN switch 160, tape drive 170, storage array 180, and optical drive 190 are examples of shared resources. The most common shared resource in an enterprise computing environment is some form of shared data resource, such as one or more disk drives. Although a disk device (and various related devices such as storage array 180) is perhaps the most common example of both a shared resource and a shared data resource, a variety of other types of devices will be well known to those having ordinary skill in the art. Moreover, servers 110 and 120 can be connected to SAN 150 through SAN switch 160. Additionally, the shared resources can be directly connected to, or part of, the servers, and thus enterprise computing system 100 need not include a SAN. Alternatively, servers 110 and 120 can be connected to multiple SANs. Additionally, SAN switch 160 can be replaced with a SAN router or a SAN hub.
Protecting the integrity of data as it is moved from one part of a computing system to another is an important aspect of any computer system. Data movement can result from a variety of operations, including normal application software operation, data backup operations, data restore operations, and data relocation resulting from system design changes or hardware failures. In many computing systems, data movement is handled by programs executing on servers such as servers 110 and 120. For data movement operations such as data backup and data restore, using server resources to handle the data movement means that fewer server resources are available for more typical tasks, such as running application software and handling operating system overhead. Accordingly, efforts have been made to move some I/O processing off of system servers to an off-host agent. Such agents are often referred to as third-party copy (3PC) devices or data movers.
Third-party copy operations transfer data directly between storage devices in a SAN or other environment using a third-party copy device, copy manager, or data mover 200 such as that illustrated in FIG. 2. Data mover 200 can be a separate device, as shown; part of a SAN switch, router, bridge, or other SAN network component (not shown); or located within a storage element such as storage array 180 in FIG. 1. As is typical of SAN environments, servers 110 and 120 are conventionally connected to data mover 200 by a channel protocol bus, such as SCSI or Fibre Channel, that is connected directly to the storage devices or storage device controllers (e.g., RAID controllers). In any of these configurations, the data mover operates on behalf of some other piece of software, e.g., a backup or restore application, to accomplish the third-party copy operation.
In one example, the third-party copy device implements the SCSI-3 extended copy command. SCSI-3 commands are described in SCSI Primary Commands-3 (SPC-3), Working Draft, Revision 03, T10, a Technical Committee of the Accredited Standards Committee of the National Committee for Information Technology Standards (NCITS), 10 Jan. 2002, which is hereby incorporated by reference herein in its entirety. The extended copy command provides a SCSI command to copy data from one set of devices to another. These devices can be disks, tapes, or other types of storage devices. The command can be used on devices connected via SCSI cables or Fibre Channel connections. The data mover is the device that receives and performs the extended copy command. It is typically an intelligent device somewhere in the storage infrastructure that understands the extended copy command. It can be another server, but is more likely to be a smart storage device, such as an intelligent tape device, disk device, SAN switch, or storage router. The host server typically performs some extra processing first, to gather all the file or volume information that must be passed along inside the extended copy command. Additionally, if either the source or the destination of the extended copy is a removable media device, the host typically first issues other SCSI commands to get the removable device into the proper position (loading or positioning the tape). Next, the host issues the extended copy command to the data mover, instructing it to move data from one storage device directly to another. Once the extended copy command has been issued, the host need issue no further instructions to move the data: the devices themselves perform the entire data movement operation over the SCSI bus or Fibre Channel connection.
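The division of labor described above (the host gathers extent information and prepares removable media, then issues a single request that the data mover carries out alone) can be modeled in a few lines. This is a minimal, hypothetical in-memory sketch: `Segment`, `ExtendedCopyRequest`, `Device`, and `DataMover` are illustrative names, the request is a plain Python object rather than the binary parameter list defined in SPC-3, and the tape is treated as block-addressable for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One hypothetical copy segment: a run of blocks from source to destination."""
    src_device: str
    src_lba: int
    dst_device: str
    dst_lba: int
    num_blocks: int


@dataclass
class ExtendedCopyRequest:
    """Everything the data mover needs, gathered by the host in advance."""
    segments: List[Segment]


class Device:
    """A block-addressable storage device; removable media must be loaded first."""

    def __init__(self, name: str, num_blocks: int, removable: bool = False):
        self.name = name
        self.blocks: List[bytes] = [b""] * num_blocks
        self.ready = not removable  # fixed disks are always ready

    def load(self) -> None:
        """Stand-in for the host's load/position commands to a tape device."""
        self.ready = True


class DataMover:
    """Executes a whole copy request with no further host involvement."""

    def __init__(self, devices: Dict[str, Device]):
        self.devices = devices

    def extended_copy(self, request: ExtendedCopyRequest) -> int:
        copied = 0
        for seg in request.segments:
            src = self.devices[seg.src_device]
            dst = self.devices[seg.dst_device]
            if not (src.ready and dst.ready):
                raise RuntimeError("device not ready (media not loaded)")
            for i in range(seg.num_blocks):
                dst.blocks[seg.dst_lba + i] = src.blocks[seg.src_lba + i]
                copied += 1
        return copied


# Host side: prepare devices, build the request once, then hand it off.
tape = Device("tape0", 8, removable=True)
disk = Device("disk0", 16)
tape.blocks = [f"blk{i}".encode() for i in range(8)]
tape.load()  # host positions removable media before issuing the copy
mover = DataMover({"tape0": tape, "disk0": disk})
request = ExtendedCopyRequest([
    Segment("tape0", 0, "disk0", 4, 3),
    Segment("tape0", 5, "disk0", 10, 2),
])
copied = mover.extended_copy(request)  # the mover alone moves all 5 blocks
```

After `extended_copy` returns, the host has issued exactly one call; every per-block transfer happened inside the data mover, which is the property that frees host resources.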
As illustrated in FIG. 2, storage devices 210 and 220 are coupled to SAN 150. In this example, storage devices 210 and 220 are shown as a data source and a data destination, respectively (e.g., illustrating a restore operation from a tape drive to a hard disk), but such devices can typically operate as either data sources or data destinations. Alternatively, source storage devices can be coupled to SAN 150 directly through data mover 200. In still another example, data mover 200 can be included as part of a proprietary storage device, such as a storage array. Thus, data movers 200 can be implemented as independent devices, as devices in traditional SAN components, or even as software executing on a SAN component, e.g., software executing on a storage device controller.
In general, data is read from and written to storage devices using either block-level or file-level access. File-level access requires some knowledge of the underlying file system and/or volume management system used to organize data on the storage devices. This information is typically available only at the host level, so I/O operations that use file-level access must be performed, or at least managed, by software executing on a host computer. Block-level access uses physical storage device addresses to access data and thus need not be “assisted” by an entity having file system and/or volume knowledge. Third-party copy operations typically use block-level access because of the speed and efficiency gained by avoiding heavy use of host resources.
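The host's role in bridging file-level and block-level access can be illustrated with a small translation routine. In this hypothetical sketch, the file layout is a list of `(file_block, device, lba, length)` runs that only the host's file system knows; `map_file_range` converts a range of file blocks into the physical `(device, starting block, length)` extents that a data mover can use without any file system knowledge. The names and the layout format are assumptions for illustration.

```python
from typing import List, Tuple

# (file_block, device, lba, length): a hypothetical file layout known only to the host.
FileMap = List[Tuple[int, str, int, int]]
# (device, starting block address, number of blocks): what a data mover consumes.
BlockExtent = Tuple[str, int, int]


def map_file_range(file_map: FileMap, start: int, count: int) -> List[BlockExtent]:
    """Translate file blocks [start, start + count) into physical block extents."""
    end = start + count
    result: List[BlockExtent] = []
    for file_block, device, lba, length in sorted(file_map):
        lo = max(start, file_block)          # overlap of request and this run
        hi = min(end, file_block + length)
        if lo < hi:
            result.append((device, lba + (lo - file_block), hi - lo))
    if sum(n for _, _, n in result) != count:
        raise ValueError("file range is not fully mapped")
    return result


# A file whose first 4 blocks live on disk0 and whose next 6 live on disk1.
layout = [(0, "disk0", 100, 4), (4, "disk1", 20, 6)]
extents = map_file_range(layout, 2, 5)  # spans both runs
```

The resulting extent list carries only device addresses, which is why it can be handed to a block-level data mover; producing it, however, required file system metadata that lives on the host.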
Returning to the example illustrated in FIG. 2, third-party copy data transfers are initiated when an application operating on one of the servers provides the data mover 200 with the addresses of the source and destination devices and blocks. For example, a data restore application executing on server 110 can request that certain data on a tape in data source 210 be restored to a disk drive in data destination 220. Such address information is typically in the form of an extent list having one or more extents. An extent is typically a contiguous set of storage blocks allocated for a file portion, a file, or multiple files. Extents are typically represented by a device address indication, a starting block address on that device, and a length (number of contiguous blocks). However, extents can be defined in a variety of different ways, e.g., a starting address and an ending address, no device information explicitly included, etc. Thus, an extent is generally any information used to locate a desired portion of a storage resource.
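The common extent representation (device, starting block, length) and the alternative starting/ending-address form can be captured in a small data structure. The sketch below is a hypothetical illustration: the `Extent` class, its field names, and the `coalesce` helper (which merges abutting or overlapping extents on the same device, one plausible operation on an extent list) are assumptions, not a format defined by the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extent:
    device: Optional[str]  # None when the device is implied by context
    start: int             # starting block address
    length: int            # number of contiguous blocks

    @property
    def end(self) -> int:
        """Last block address covered (inclusive)."""
        return self.start + self.length - 1

    @classmethod
    def from_bounds(cls, device: Optional[str], start: int, end: int) -> "Extent":
        """Build an extent from an inclusive starting/ending address pair."""
        if end < start:
            raise ValueError("end precedes start")
        return cls(device, start, end - start + 1)


def coalesce(extents: List[Extent]) -> List[Extent]:
    """Merge extents on the same device that abut or overlap."""
    merged: List[Extent] = []
    for ext in sorted(extents, key=lambda e: (e.device or "", e.start)):
        last = merged[-1] if merged else None
        if last and last.device == ext.device and ext.start <= last.end + 1:
            merged[-1] = Extent.from_bounds(last.device, last.start, max(last.end, ext.end))
        else:
            merged.append(ext)
    return merged


extent_list = [
    Extent("disk0", 10, 5),
    Extent("disk0", 15, 3),
    Extent("disk1", 0, 4),
    Extent("disk0", 30, 2),
]
merged = coalesce(extent_list)
```

Here `merged` holds three extents: blocks 10 through 17 and 30 through 31 on `disk0`, and blocks 0 through 3 on `disk1`.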
For the purposes of this example, data destination 220 is a block (disk) device on which a file system or database resides, and data source 210 can be any block or stream device (a serial device such as a tape drive). Once initiated, a third-party copy operation generally proceeds independently of any file system, volume management, or application program activity on the system servers. Because a server can reorganize or write to data residing on data destination 220 asynchronously with respect to the third-party copy operation, moving data into a live file system or database on the data destination carries considerable risk. In particular, error conditions can arise if the data destination device is reorganized or modified after the extent list for a third-party copy request has been generated and sent to data mover 200.
These potential error conditions can be referred to as “sector slipping” events, and they manifest themselves as two error states on the data destination device. The first sector slipping error state involves the movement of data or allocated space from the destination extents to another physical location (e.g., during volume reorganization). As illustrated in FIG. 3A, disk 1 300 is organized as volume A and includes destination blocks 310 corresponding to destination extents that are to be written by a third-party copy operation. Some time after the list of data extents has been provided to the data mover, but before the third-party copy operation has completed, an error is detected on disk 1 300, causing a volume manager to move all data for volume A from disk 1 300 to disk 2 320. Because the third-party copy operation has not yet completed when destination blocks 310 are moved, the copies of destination blocks 310 on disk 2 may not reflect all the data intended to be copied by the third-party copy operation. Furthermore, the data mover has no way of knowing that the reorganization is taking place, so it continues to move blocks into the original locations of destination blocks 310 on disk 1.
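The first error state can be reproduced with a toy model in which the destination extents are resolved to physical locations once, before the copy starts. In this hypothetical sketch, `Disk`, `Volume`, and `run_relocation_scenario` are illustrative names, and the volume is assumed to occupy one contiguous run of blocks. The data mover writes half the blocks, the volume manager relocates volume A, and the mover then finishes writing to the stale locations on disk 1.

```python
from typing import List, Tuple


class Disk:
    def __init__(self, name: str, size: int):
        self.name = name
        self.blocks: List[str] = ["free"] * size


class Volume:
    """Hypothetical volume occupying a contiguous run of blocks on one disk."""

    def __init__(self, disk: Disk, base: int, size: int):
        self.disk, self.base, self.size = disk, base, size

    def physical(self, vblock: int) -> Tuple[Disk, int]:
        """Resolve a volume block to its current (disk, block) location."""
        return self.disk, self.base + vblock

    def read(self, vblock: int) -> str:
        disk, lba = self.physical(vblock)
        return disk.blocks[lba]

    def relocate(self, new_disk: Disk, new_base: int) -> None:
        """Volume manager copies every block elsewhere and updates the mapping."""
        for v in range(self.size):
            new_disk.blocks[new_base + v] = self.disk.blocks[self.base + v]
        self.disk, self.base = new_disk, new_base


def run_relocation_scenario() -> Tuple[List[str], List[str]]:
    disk1, disk2 = Disk("disk1", 8), Disk("disk2", 8)
    vol_a = Volume(disk1, 0, 4)
    for v in range(4):
        disk1.blocks[v] = "old"
    # Destination extents are resolved once and handed to the data mover.
    extents = [vol_a.physical(v) for v in range(4)]
    for disk, lba in extents[:2]:   # mover writes the first half...
        disk.blocks[lba] = "new"
    vol_a.relocate(disk2, 2)        # ...volume manager moves volume A to disk 2...
    for disk, lba in extents[2:]:   # ...mover keeps writing to disk 1
        disk.blocks[lba] = "new"
    return [vol_a.read(v) for v in range(4)], disk1.blocks[:4]


view, orphaned = run_relocation_scenario()
```

Reading volume A afterwards yields `["new", "new", "old", "old"]`: the second half of the copy landed in blocks on disk 1 that no longer belong to the volume.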
Another error state is illustrated in FIG. 3B. Disk 1 350 is partitioned into two volumes, volume A and volume B. Volume A includes destination blocks 360 corresponding to destination extents that are to be written by a third-party copy operation. Volume B includes application data 370 that is, in general, unrelated to the data associated with destination blocks 360. Some time after the list of data extents has been provided to the data mover, but before the third-party copy operation has completed, the storage space on disk 1 is reallocated so that volume A is moved to disk 2 380 and volume B is reorganized on disk 1. In this example, the reorganization of volume B moves application data 370 into an area of disk 1 that includes destination blocks 360. Thus, as the data mover writes to destination blocks 360, it may erroneously overwrite valid application data.
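The second error state can be shown with an even simpler model that tracks only the contents of each physical block. This is a hypothetical sketch: `run_overwrite_scenario`, the block labels, and the layout (volume A's destination blocks at 0 through 3 of disk 1, volume B's application data at 4 through 7) are assumptions chosen to mirror FIG. 3B.

```python
from typing import List, Tuple


def run_overwrite_scenario() -> Tuple[List[str], List[str]]:
    # Disk 1: blocks 0-3 hold volume A's destination blocks, 4-7 hold volume B data.
    disk1 = ["A-dest"] * 4 + ["B-app"] * 4
    disk2 = ["free"] * 8
    stale_start, stale_len = 0, 4  # extent handed to the data mover earlier

    # Reallocation: volume A moves to disk 2, and volume B is reorganized
    # into the space volume A vacated on disk 1.
    disk2[0:4] = disk1[0:4]
    disk1[0:4] = disk1[4:8]
    disk1[4:8] = ["free"] * 4

    # The data mover, unaware of either change, writes to its stale extent.
    for lba in range(stale_start, stale_start + stale_len):
        disk1[lba] = "copy-data"
    return disk1, disk2


disk1, disk2 = run_overwrite_scenario()
```

After the run, no block on disk 1 holds volume B's application data, and volume A on disk 2 never receives the copied data, so both the copy and the unrelated volume are damaged.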
Accordingly, it is desirable to provide safe and accurate data movement in third-party copy operations.