Advances in technology have increased the capacities of DASD devices (popularly known as “disk drives” or “hard drives”) and continue to do so. Users often wish to utilize larger devices by “migrating” or moving data to them from existing smaller devices.
A DASD device may contain one or more “volumes” as further discussed below. Use of larger DASD volumes is desirable for a number of reasons, a critical one being to avoid exceeding the maximum number of devices that can be attached to a processor. To take advantage of larger volumes, users may need to populate them by merging existing data from smaller volumes. The term “merge” refers to combining data from multiple source volumes onto a smaller number of target or destination volumes. Thus we will use the term “merge/migrate” operation to mean moving data from at least one source volume to at least one destination volume, including merging at least some of the moved data.
Migration of data from smaller to larger storage devices can be accomplished using utility programs supplied by IBM or other vendors, but because of the way most mainframe programs access data sets stored on DASD devices, the applications using the involved data sets must be quiesced for the duration of the migration process. The term “quiesced” here means that the application program cannot access the data sets being migrated; the data sets are closed. This creates a hardship for applications whose accessibility requirements do not allow for significant down time. Down time is defined as the time between closing and re-opening of application data sets. For example, many enterprises maintain Internet access portals that require access to data storage systems. Accordingly, the data storage systems must be available 24/7 or as nearly so as is practicable; the present invention helps to address this challenge.
Mainframe files or “data sets,” for example under MVS operating systems, are allocated space on DASD devices in one or more contiguous groups of tracks. (A typical DASD device has 15 tracks per cylinder.) Each contiguous group is called an “extent.” The number of permissible extents for a data set varies by type of data set, level of the OS, and other factors, but generally some fixed limitation is imposed. A data set can span more than one volume, although an individual extent must be stored in a single volume. This presents two common problems: a) a data set cannot expand because the maximum number of extents has been reached (even though DASD space is available), and b) due to space available at the time an extent was needed, a data set may be spread over more volumes than the user would like.
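The relationship between tracks, cylinders, extents and the two problems just described can be illustrated with a minimal sketch. It assumes the 15 tracks per cylinder geometry cited above; the `Extent` record, the helper names, and the extent limit of 16 are hypothetical and chosen only for illustration, since the actual limit varies by data set type and OS level.

```python
from dataclasses import dataclass

TRACKS_PER_CYLINDER = 15  # typical geometry cited in the text


@dataclass(frozen=True)
class Extent:
    """One contiguous group of tracks on a single volume (hypothetical model)."""
    volume: str       # volume holding this extent
    start_track: int  # absolute track number of the first track
    tracks: int       # number of contiguous tracks


def to_cylinder_head(track: int) -> tuple[int, int]:
    """Convert an absolute track number to a (cylinder, head) address."""
    return divmod(track, TRACKS_PER_CYLINDER)


def can_extend(extents: list[Extent], max_extents: int) -> bool:
    """Problem (a): no new extent is possible once the limit is reached,
    regardless of how much free DASD space remains."""
    return len(extents) < max_extents


def volumes_spanned(extents: list[Extent]) -> list[str]:
    """Problem (b): list the volumes a data set has been spread across."""
    return sorted({e.volume for e in extents})


MAX_EXTENTS = 16  # assumed limit, for illustration only
data_set = [Extent("VOL001", 150, 30)] * 15 + [Extent("VOL002", 0, 30)]

print(to_cylinder_head(47))               # (3, 2)
print(can_extend(data_set, MAX_EXTENTS))  # False
print(volumes_spanned(data_set))          # ['VOL001', 'VOL002']
```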
Copying and/or re-allocating a data set to address these problems is complicated by several factors. Programs running under the MVS operating system expect all records of a file to remain in the DASD locations found at the time the file is “opened.” If a copy is in progress, the common input/output mechanisms used to read or write records cannot deal with an in-flight copy, i.e., they cannot access a record in either the source or the target location depending on which records have been moved at a given point in time. Consequently, applications typically must be stopped while the data they use is being moved. Moving data takes time, and the problem is exacerbated by two trends: a huge growth in the amount of data stored, and an increase in the number of hours per day of desired availability, often 24/7.
Moreover, buffering causes an integrity problem when moving data. A record may be “logically” written by a program but due to buffering techniques may not actually be written to DASD until some circumstance causes the buffer to be flushed to the DASD device. Consequently, the source in a physical copy may not represent logically what the source should represent at any given point in time. The problem is normally solved by making sure any files are closed before a copy is initiated. Closing a file forces writes to DASD of any outstanding buffers.
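The buffering problem can be made concrete with a small simulation. This is a sketch only: the `BufferedFile` class and `physical_copy` function are hypothetical stand-ins for an access method's buffers and a track-level copy, not a model of any actual MVS interface.

```python
class BufferedFile:
    """Toy model of a file whose writes are buffered before reaching DASD."""

    def __init__(self):
        self.disk = {}    # records physically present on DASD
        self.buffer = {}  # records logically written but not yet flushed

    def write(self, key, value):
        # A "logical" write lands in the buffer only.
        self.buffer[key] = value

    def read(self, key):
        # The owning program sees its own buffered writes.
        return self.buffer.get(key, self.disk.get(key))

    def close(self):
        # Closing forces all outstanding buffers to DASD.
        self.disk.update(self.buffer)
        self.buffer.clear()


def physical_copy(f: BufferedFile) -> dict:
    """Copy only what is physically on DASD, as a track-level copy would."""
    return dict(f.disk)


f = BufferedFile()
f.write(1, "updated record")
copy_while_open = physical_copy(f)   # misses the buffered record
f.close()
copy_after_close = physical_copy(f)  # now consistent with the logical view

print(copy_while_open)   # {}
print(copy_after_close)  # {1: 'updated record'}
```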
Finally, switching access to the target of a move (the new data location) is complicated by the fact that programs running under MVS operating systems copy into memory the physical locations of a file's extents once at “open” time. Even if the point-in-time consistency and buffering problems are solved, changing the in-memory information about where the extents of a file are located without requiring programs to close and reopen would be extremely difficult.
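The effect of capturing extent locations once at open time can be sketched as follows. The `volume_records` dictionary and the `OpenDataSet` class are hypothetical simplifications introduced for illustration; they do not represent actual MVS control blocks.

```python
# Hypothetical record of where each data set's extents live:
# name -> list of (volume, start_track, track_count).
volume_records = {"PAYROLL.DATA": [("VOL001", 150, 30)]}


class OpenDataSet:
    """Models a program's view of a data set after 'open'."""

    def __init__(self, name, records):
        self.name = name
        # Extent locations are copied into memory once, at open time.
        self.extents = list(records[name])

    def locate(self, relative_track):
        """Map a track offset within the data set to (volume, absolute track)."""
        for volume, start, count in self.extents:
            if relative_track < count:
                return volume, start + relative_track
            relative_track -= count
        raise IndexError("track offset beyond allocated extents")


handle = OpenDataSet("PAYROLL.DATA", volume_records)

# The data set is moved to a new volume while the program still has it open.
volume_records["PAYROLL.DATA"] = [("VOL900", 3000, 30)]

stale = handle.locate(5)                                        # old location
fresh = OpenDataSet("PAYROLL.DATA", volume_records).locate(5)   # after re-open

print(stale)  # ('VOL001', 155)
print(fresh)  # ('VOL900', 3005)
```

In this sketch only a close and re-open picks up the new location, which is the behavior that makes switching access to the target difficult.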
Because of these problems, a user who needs to merge data from smaller to larger volumes, or to copy a data set to combine extents, using conventional and available solutions, must stop all applications accessing the data involved for the duration of the copy. The applications must therefore be quiesced for the full copy time, a period often unacceptable given their up-time requirements. The majority of time consumed when using conventional means to merge data sets from multiple source volumes to fewer target volumes comprises: 1) physical copy time, 2) target data set allocation time, and 3) time to re-catalog target data sets.
A catalog in this context is a data set that keeps track of where other data sets are located. It is somewhat similar to a directory on a PC. A data set on a given volume can be accessed without being cataloged, but the user would have to know and specify exactly which volume it is on. The primary purpose of a catalog is to enable locating data sets without knowledge of the volume on which they are stored; a cataloged data set can be located and accessed by name.
Additionally, volume merge functionality must be implemented with an awareness of volume content because of the MVS data management rules. DASD devices used under MVS operating systems must contain a VTOC (Volume Table Of Contents), and optionally a VTOCIX (VTOC Index) and VVDS (VSAM Volume Data Set). (VSAM stands for virtual storage access method.) Only one of each of these meta-data files may exist per volume. Consequently, volume merge functionality must include merging data from multiple source meta-data files into single target meta-data files. This requirement eliminates any solution that has no awareness of the data being copied.
For a volume merge methodology to be practical, it must also adhere to the data set extent limitations mentioned above. Although variable by type of data set, data sets all have a limit to the number of extents (contiguous groups of tracks) permissible on a single volume. A merge/migrate solution therefore must recognize that extents for multi-volume data sets may need to be combined, again requiring an awareness of volume contents.
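The need to combine extents when a multi-volume data set is merged onto one target volume can be sketched as below. The function name `plan_target_extents` and the per-volume limit passed to it are assumptions made for illustration; the sketch simply collapses all source extents into one target extent when the limit would otherwise be exceeded, which is only one of several possible policies.

```python
def plan_target_extents(source_extent_sizes, max_extents_per_volume):
    """Plan extent sizes (in tracks) for a data set on a single target volume.

    source_extent_sizes: track counts of every source extent, across all
    source volumes the data set currently spans.
    """
    if len(source_extent_sizes) <= max_extents_per_volume:
        # The extents fit under the limit and can be carried over one for one.
        return list(source_extent_sizes)
    # Too many extents for one volume: combine them into a single extent.
    return [sum(source_extent_sizes)]


# A data set with 10 extents on each of two source volumes, 5 tracks apiece,
# merged onto one target volume with an assumed limit of 16 extents.
sizes = [5] * 20
print(plan_target_extents(sizes, 16))    # [100]
print(plan_target_extents([5, 10], 16))  # [5, 10]
```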
A solution must also satisfy the requirement that all data sets belonging to one or more applications be copied at a single point-in-time, or with data “consistency.”
In the prior art, volume-oriented (or “volume level”) mirroring, or “fast replicate” mechanisms are known that copy complete volumes with little or no impact on applications. These “brute force” utilities simply copy all tracks of the volume without consideration of volume contents. Thus they do not address a volume merge/migrate scenario. Data set level fast replicate mechanisms exist that satisfy the merge/migrate meta-data requirements and remove all or most of the physical copy time from an application, but they can require a longer than desirable application down time window due to relatively long target data set allocation and re-cataloging times.
Mirroring and Fast Replicate mechanisms that rely on hardware/microcode solutions restrict the migration of data to devices of a common manufacturer. A flexible solution should allow source and target volumes to reside on standard DASD devices regardless of their manufacturer.
Hardware Volume Oriented Mirroring Methodologies.
Volume level mirroring methodologies utilizing hardware/microcode features exist. Prior art includes IBM PPRC and XRC, HDS ShadowImage, and EMC TimeFinder and SRDF. These methodologies do not satisfy the requirements for a volume merge/migrate for two reasons: they require copying tracks to the same target cylinder/track address as the source address, and they are unaware of the data sets and meta-data files contained on the volumes being copied. These mechanisms will not support a merge/migrate scenario where movement must occur across DASD devices of different manufacturers.
“Soft” Volume Oriented Mirroring Methodologies.
The term “soft” is used to describe existing mirroring mechanisms that do not rely on hardware or microcode in the solution. Equivalent functionality is achieved with software running in MVS address spaces. These include Softek/Fujitsu's TDMF and Innovation's FDR/PAS. Soft volume oriented mirroring mechanisms share the same inadequacies as hardware volume oriented mirroring as discussed above (save the ability to mirror across different manufacturers' devices).
Cylinder/Track Translate Tables for Data Set Level Copies.
A cylinder/track translate table is the heart of any copy mechanism capable of copying data set extents to different target cylinder/track locations, or of copying when extent sizes differ between the source and target data sets. Prior art includes IBM's utility programs IEBGENER, IEBCOPY, and IEHMOVE. However, these track translate tables are usually established only once, at the outset, in keeping with the point-in-time (“PIT”) of conventional data set level copies being defined at the initiation of the copy process.
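A translate table of the kind described can be sketched as a sorted list of mappings from source track ranges to target track ranges. This is a minimal illustration, not the table format of any named utility: the `Mapping` record, the function names, and the use of absolute track numbers (rather than cylinder/head pairs) are assumptions. Extents are given as (start track, track count) pairs.

```python
from bisect import bisect_right
from typing import NamedTuple


class Mapping(NamedTuple):
    src_start: int  # first source track of the range
    tgt_start: int  # first target track of the range
    tracks: int     # length of the range in tracks


def build_table(source_extents, target_extents):
    """Pair source tracks with target tracks in order, splitting ranges
    wherever source and target extent boundaries do not line up."""
    table = []
    targets = iter(target_extents)
    t_start, t_left = 0, 0
    for s_start, s_left in source_extents:
        while s_left > 0:
            while t_left == 0:  # current target extent used up; take the next
                nxt = next(targets, None)
                if nxt is None:
                    raise ValueError("target extents are too small")
                t_start, t_left = nxt
            n = min(s_left, t_left)
            table.append(Mapping(s_start, t_start, n))
            s_start, t_start = s_start + n, t_start + n
            s_left, t_left = s_left - n, t_left - n
    return sorted(table)


def translate(table, src_track):
    """Return the target track for a source track, or raise KeyError."""
    starts = [m.src_start for m in table]
    i = bisect_right(starts, src_track) - 1
    if i >= 0 and src_track < table[i].src_start + table[i].tracks:
        return table[i].tgt_start + (src_track - table[i].src_start)
    raise KeyError(f"track {src_track} is not in any mapped extent")


# Two source extents combined into one larger target extent.
table = build_table([(150, 30), (600, 15)], [(3000, 45)])
print(translate(table, 160))  # 3010
print(translate(table, 605))  # 3035
```

Because the table here is built once from a fixed list of extents, it reflects the point-in-time limitation noted above: extents acquired or released after the table is built are not represented unless the table is rebuilt.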
Where data sets are being mirrored as opposed to complete volumes, the status of data sets can change between establishing the mirrors and splitting off the targets. The nature of status differences can be categorized two ways: a) an extent allocation change to a data set remaining in the list of data sets to be mirrored and b) data sets added to or removed from the list of data sets to be mirrored.
Extent changes to data sets included in the mirroring process:
a) Data sets that increase in size such that new source extents are acquired during the mirroring window
b) Portions of or entire extents that are released (returned as free space) during the mirroring window
Data sets added to or removed from the list of data sets to be mirrored:
a) New data sets created after the mirrors are established but before the close window
b) Data sets removed after the mirrors are established but before the close window. This can include deleted data sets, data sets removed from volumes due to migration, and renamed data sets. (Typically, renamed data sets are not actually recognized; the old data set name is considered removed and the new data set name is treated as an added data set.)
All of these potential scenarios complicate the merge/migrate requirements.
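The status differences listed above can be detected by comparing two snapshots of data set extent information, one taken when the mirrors are established and one taken at the close window. The following is a sketch under stated assumptions: the snapshot format (data set name mapped to a list of (volume, start track, track count) tuples) and the function name `classify_changes` are hypothetical.

```python
def classify_changes(at_establish, at_close):
    """Compare two snapshots of {data_set_name: [(volume, start, tracks), ...]}."""
    added = sorted(at_close.keys() - at_establish.keys())
    removed = sorted(at_establish.keys() - at_close.keys())
    gained, lost = {}, {}
    for name in at_establish.keys() & at_close.keys():
        before, after = set(at_establish[name]), set(at_close[name])
        if after - before:
            gained[name] = sorted(after - before)  # newly acquired extents
        if before - after:
            # Fully released extents. A partially released extent appears
            # here with its old size and in `gained` with its new size.
            lost[name] = sorted(before - after)
    return {"added": added, "removed": removed,
            "extents_gained": gained, "extents_lost": lost}


before = {
    "APP.MASTER": [("VOL001", 0, 10)],
    "APP.INDEX": [("VOL001", 10, 5)],
    "APP.OLDNAME": [("VOL002", 0, 5)],
}
after = {
    "APP.MASTER": [("VOL001", 0, 10), ("VOL001", 50, 10)],  # grew
    "APP.INDEX": [("VOL001", 10, 5)],                       # unchanged
    "APP.NEWNAME": [("VOL002", 0, 5)],                      # renamed
}

changes = classify_changes(before, after)
print(changes["added"])           # ['APP.NEWNAME']
print(changes["removed"])         # ['APP.OLDNAME']
print(changes["extents_gained"])  # {'APP.MASTER': [('VOL001', 50, 10)]}
```

As in the text, the rename is not recognized as such in this sketch: the old name is reported as removed and the new name as added.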