I. Technical Field
The present invention generally relates to the field of moving data.
II. Background Information
Many computer systems include one or more host computers, and one or more storage systems that store data used by the host computers. An example of such a computer system including a host computer 1 and storage systems 3, 4 is shown in FIG. 1.
The storage systems 3, 4 include a plurality of disk drives (5a, 5b or 6a, 6b) and a plurality of disk controllers (7a, 7b or 8a, 8b) that respectively control access to the disk drives. A plurality of storage bus directors (9, 10) control communication with host computer 1 over communication buses (17, 18). Each storage system 3, 4 further includes a cache 11, 12 to provide improved storage system performance. In particular, when the host computer 1 executes a read from one of the storage systems 3, 4, the storage system may service the read from its cache 11, 12 (when the data is stored in the cache) rather than from one of the disk drives 5a, 5b or 6a, 6b, to execute the read more efficiently. Similarly, when the host computer 1 executes a write to one of the storage systems 3, 4, the corresponding storage bus director 9, 10 can execute the write to the cache 11, 12. Thereafter, the data can be de-staged asynchronously to the appropriate one of the disk drives 5a, 5b, 6a, 6b in a manner transparent to the host computer 1. Finally, storage systems 3, 4 include internal buses 13, 14 over which storage bus directors 9, 10, disk controllers 7a, 7b, 8a, 8b and caches 11, 12 communicate.
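By way of illustration only, the cached read and write behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of any particular storage system: the class name CachedStorageSystem, its methods, and the use of dictionaries to stand in for the disk drives and the cache are all hypothetical, and de-staging is invoked explicitly here rather than asynchronously.

```python
class CachedStorageSystem:
    """Toy model of a storage system with a write-back cache.

    All names are hypothetical; dictionaries stand in for the
    disk drives and the cache.
    """

    def __init__(self, num_blocks):
        # Stand-in for the disk drives: block number -> stored data.
        self.disk = {block: b"" for block in range(num_blocks)}
        # Stand-in for the cache: block number -> cached data.
        self.cache = {}
        # Blocks written to the cache but not yet de-staged to disk.
        self.dirty = set()

    def read(self, block):
        # Service the read from the cache when the data is there.
        if block in self.cache:
            return self.cache[block]
        # Otherwise read from disk and keep a copy in the cache.
        data = self.disk[block]
        self.cache[block] = data
        return data

    def write(self, block, data):
        if block not in self.disk:
            raise KeyError(f"no such block: {block}")
        # The write completes against the cache only.
        self.cache[block] = data
        self.dirty.add(block)

    def destage(self):
        # Copy dirty blocks to disk. A real system would do this
        # asynchronously and transparently to the host.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

In this sketch, a write followed by a read of the same block is serviced entirely from the cache, and the data reaches the simulated disk only when destage is called.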
The host computer 1 includes a processor 16 and one or more host bus adapters 15 that each control communication between the processor 16 and one of the storage systems 3, 4 via a corresponding one of the communication buses 17, 18. It should be appreciated that rather than a single processor 16, host computer 1 can include multiple processors. Each bus 17, 18 can be any of a number of different types of communication links, with the host bus adapter 15 and storage bus directors 9, 10 being adapted to communicate using an appropriate protocol via the communication buses 17, 18 coupled therebetween. For example, each of the communication buses 17, 18 can be implemented as a SCSI bus with the storage bus directors 9, 10 and adapters 15 each being a SCSI driver. Alternatively, communication between the host computer 1 and the storage systems 3, 4 can be performed over a Fibre Channel fabric.
Typically, the storage systems 3, 4 make storage resources available to the host computer for assignment to entities therein, such as a file system, a database manager, or a logical volume manager. If the storage systems are so-called “dumb” storage systems, the storage resources that are made available to the host computer will correspond in a one-to-one relationship to physical storage devices within the storage systems. However, when the storage systems are intelligent storage systems, they will present logical units of storage to the host computer 1 that need not necessarily correspond in a one-to-one relationship to any physical storage devices within the storage system. Instead, the intelligent storage systems may map each logical unit of storage presented to the host across one or more physical storage devices.
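By way of illustration only, one way an intelligent storage system might map a logical unit across more than one physical storage device is sketched below. The sketch assumes a simple concatenation of extents; the class name LogicalUnit, the extent tuple layout, and the device labels are hypothetical and are not drawn from any particular product, which may instead use striping or other layouts.

```python
from bisect import bisect_right


class LogicalUnit:
    """Toy logical unit built by concatenating physical extents.

    Each extent is a hypothetical (device_name, start_block, length)
    tuple describing a contiguous region of one physical device.
    """

    def __init__(self, extents):
        self.extents = list(extents)
        # starts[i] is the first logical block covered by extent i.
        self.starts = []
        total = 0
        for _device, _start, length in self.extents:
            self.starts.append(total)
            total += length
        self.size = total

    def resolve(self, logical_block):
        """Return (device_name, physical_block) for a logical block."""
        if not 0 <= logical_block < self.size:
            raise IndexError(f"logical block out of range: {logical_block}")
        # Find the last extent whose first logical block is <= the target.
        i = bisect_right(self.starts, logical_block) - 1
        device, start, _length = self.extents[i]
        return device, start + (logical_block - self.starts[i])
```

For example, a logical unit built from a 10-block extent on one device and a 20-block extent on another presents 30 contiguous logical blocks to the host, while each logical block resolves to a location on one of the two underlying devices.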
Administrators of computer systems like that depicted in FIG. 1 may want to migrate sets of logically related data, such as a database or file system, from one storage resource to another. One common reason is that a data set might grow at such a rate that it will soon exceed the capacity of a storage system. Other common reasons include the administrator's desire to move the data set to a storage system with faster response time, to lay the data set out differently on the resource to facilitate faster access, to reconfigure disk striping for fault tolerance and/or performance purposes, or to optimize the geographic location where the data set is physically stored.
Data migrations are often complicated and problematic exercises. Administrators usually must take offline any applications executing on the host that use the source storage device. Depending on the size of the data set, applications can be offline for lengthy periods, leading to a loss of productivity and to opportunity costs associated with not having the data set available for important business functions. Migrations typically are manual, labor-intensive efforts, and are therefore error-prone and costly.
Conventional data migration efforts typically involve the following four separate steps, requiring manual intervention between each: source discovery, target provisioning, data synchronization (i.e., movement), and reconfiguration to switch to target access.
The source discovery step identifies the physical locations (e.g., the storage system and logical unit) at which the data set is stored. This step is typically performed manually. An exemplary conventional method includes the use of spreadsheets to compile an inventory of file systems, database table spaces, and other data stored on individual storage volumes.
The target provisioning step identifies and configures the storage resources (typically logical units of storage presented by another storage system) to which the data set will be moved. Conventionally, this step requires extensive manual intervention by, for example, database administrators and system administrators. This step may include making new logical units visible to one or more host computers, mapping file systems and/or table spaces on target logical units, configuring switches, configuring volumes for redundancy, and planning for more efficient physical data access. This step is typically very time-consuming and labor-intensive, and thus expensive.
The synchronization step involves moving or copying the data set from the source locations to the target locations. Various techniques have been used to perform this step, including employing a utility application running on one or more host computers to read the data set from the source locations and write the data set to the target locations. Alternatively, a mirroring facility, such as the SYMMETRIX Remote Data Facility (SRDF) available from EMC Corporation, Hopkinton, Mass., may be used to create mirrors between source and target volumes on different storage systems and to then synchronize them so that the storage systems themselves perform the copy. Other data copy tools available from EMC include OPEN REPLICATOR for SYMMETRIX data storage systems and SANCOPY for CLARIION data storage systems. Synchronization is often the most time-consuming of the four steps, and usually requires that the applications accessing the data be taken offline (i.e., refused access to the data) while the step is performed.
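By way of illustration only, a host-based utility of the kind described above, which reads the data set from a source location and writes it to a target location, can be sketched as follows. The function name copy_data_set and the chunk_size parameter are hypothetical, and the source and target are assumed to be ordinary binary file-like objects rather than actual storage volumes. The sketch does not address concurrent writes by applications, which is one reason such copies are conventionally performed with applications offline.

```python
import io


def copy_data_set(source, target, chunk_size=64 * 1024):
    """Copy all bytes from source to target in fixed-size chunks.

    source and target are assumed to be binary file-like objects.
    Returns the number of bytes copied.
    """
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read signals the end of the source.
            break
        target.write(chunk)
        copied += len(chunk)
    return copied


if __name__ == "__main__":
    # In-memory buffers stand in for the source and target volumes.
    src = io.BytesIO(b"example data set " * 1000)
    dst = io.BytesIO()
    total = copy_data_set(src, dst)
    print(total, dst.getvalue() == src.getvalue())
```

Copying in chunks keeps memory use bounded regardless of the size of the data set, at the cost of one read and one write call per chunk.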
After the data set has been moved or copied, the switch to target step typically involves reconfiguring the computer system so that applications using the data set recognize the target locations as the new storage location for the data set. Again, this conventionally requires human intervention and may require rebooting of the host computer(s) that access the data set.
EMC Corp. has recognized the desirability of being able to migrate data non-disruptively. U.S. Pat. No. 7,093,088, which is hereby incorporated by reference, for example, discloses some methods and systems that enable less disruptive migration. The inventors of the present invention, however, believed that alternative methods and systems would be more effective and more flexible in enabling less disruptive migration and virtualization of data storage systems.