Computer systems are capable of storing large amounts of data in main memory, which typically comprises direct access storage devices (DASDs). Data is physically stored in designated data fields within the storage device. For example, information stored in a DASD is located on magnetic disks having concentric tracks, and each track is further divided into data fields. Each disk has at least one dedicated head per face for writing data to and reading data from the disk. Data fields are identified by storage device number, head, track number and physical address. Data storage and retrieval are controlled by the operating system program residing in the central processing unit (CPU).
System users access information in main memory through a user interface, which is part of the operating system program. The user interface categorizes data by named datasets or files. To retrieve a particular item of data, the system user specifies the dataset or file containing the desired data. The operating system then "looks up" the dataset name in a table of contents for main memory. The table contains a list of dataset names and the physical location of stored data in main memory associated with those dataset names. The data is retrieved from main memory and placed in virtual memory space. Conceptually, virtual memory is a workspace in the CPU's volatile storage, portions of which may be temporarily assigned to system users as they run applications.
As data is transferred to virtual memory, the physical addresses are mapped into logical memory addresses. The logical addresses are used by application programs to access and retrieve the data from virtual storage. Once the user applications are completed, the updated data is transferred back to main memory, and the logical addresses are mapped back into actual physical addresses.
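The lookup and address mapping described above can be pictured with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `PhysicalAddress` fields mirror the identifiers listed earlier (device, head, track, physical address), and the table of contents is modeled as a plain dictionary rather than any real operating system structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    """Hypothetical identifier for one data field on a storage device."""
    device: int
    head: int
    track: int
    field: int


# Hypothetical table of contents: dataset name -> physical locations of its data.
TABLE_OF_CONTENTS = {
    "PAYROLL.DATA": [
        PhysicalAddress(device=1, head=0, track=12, field=0),
        PhysicalAddress(device=1, head=0, track=12, field=1),
    ],
}


def map_to_logical(dataset_name, table_of_contents):
    """Look up a dataset and assign each physical location a logical address.

    The returned dictionary maps logical address -> physical address, so the
    same mapping can be used to write updated data back to its original place.
    """
    locations = table_of_contents[dataset_name]
    return {logical: physical for logical, physical in enumerate(locations)}


mapping = map_to_logical("PAYROLL.DATA", TABLE_OF_CONTENTS)
```

An application would work only with the keys of `mapping` (the logical addresses); the values are consulted again when the data is transferred back to the storage device.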
Retrieval of data from non-volatile storage devices such as DASDs consumes considerable time, since the CPU must initiate a request for data and wait for the device to locate and retrieve the information. In contrast, volatile memory such as RAM can be accessed quickly. Such volatile memory in the CPU is used to provide virtual memory to system users. Thus, applications running from virtual memory are more efficient than those accessing main memory directly. In addition, data in virtual memory is easily manipulated.
Operating systems implementing system managed storage require means for copying data from one main memory storage device to another. Typically, either a physical copying or logical copying utility is employed.
Physical Copying. A physical copy utility acquires the physical address of source data on a source storage device, and the physical address of the desired target location on a target device. The source and target storage devices may, for example, be direct access storage devices (DASDs). The utility reads tracks from the source DASD into virtual storage and then writes them at the desired location on the target DASD.
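As a rough illustration of the track-by-track behavior just described, the sketch below models each device as a dictionary from track number to raw track contents. The function name `physical_copy` and the dictionary model are assumptions made for this example; they do not represent any actual utility interface.

```python
def physical_copy(source_device, target_device, source_tracks, target_start):
    """Copy whole tracks from source to target without interpreting them.

    source_device / target_device: dict mapping track number -> bytes
    source_tracks: track numbers to read, in order
    target_start: first track number to write on the target
    """
    for offset, track_number in enumerate(source_tracks):
        # Read the raw track into (virtual) storage.
        buffer = source_device[track_number]
        # Write it to the target exactly as read: no reordering, no compression.
        target_device[target_start + offset] = buffer


source = {0: b"AAA  ", 1: b"  BBB"}
target = {}
physical_copy(source, target, source_tracks=[0, 1], target_start=10)
```

Because the track contents are treated as opaque bytes, any unused space inside a track is carried over to the target unchanged.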
A physical copy utility offers the advantage of speed since it essentially runs at the device speed. But this method has a number of limitations. One such limitation is that the source and target datasets must be locked for the duration of the copy. In other words, members of the dataset being copied will not be accessible to other users until copying is complete. Another limitation of this approach is the need for an empty target dataset, usually on the same device type as the source (e.g., two IBM 3390 DASDs). Space must therefore be available on the target DASD before copying can occur. Moreover, the method is not suitable for copying between different device types (e.g., from an IBM 3380 to an IBM 3390). Yet another limitation of physical copying is the requirement that data be written to the target location precisely as it is read, so that no compression (i.e., removal of unnecessary spaces or "gas" between relevant data) will occur. For these reasons, physical copying is used for relatively limited purposes such as dump/restore operations.
Logical Copying. A logical copy utility uses the same interfaces as an application program to read data from a source and to write that data to a target location. One example of an implemented logical copy utility is IEBCOPY with PDSE (Partitioned Data Set Extended) to PDSE copy capability, released in the International Business Machines Data Facility Product version 3.2.0. IEBCOPY runs in a Multiple Virtual Storage (MVS) operating system environment.
The IEBCOPY utility performs the following steps for each copy operation:
1) Open a source dataset
2) Create a target dataset
3) Read the records of the source dataset into virtual storage
4) Write the records from virtual storage to the target dataset
5) Name the target dataset
6) Close the source dataset
7) Close the target dataset

1) Open a source dataset and allocate space in the target device for a target dataset
2) For each member,
3) Once all members have been copied, close the source dataset and the target dataset.
The MVS operating system organizes data into datasets of different types. Two specific examples of dataset types are the sequential and partitioned datasets. As previously discussed, a sequential dataset is associated with certain data which is accessed by referencing the dataset name. A partitioned dataset is similarly associated with certain data, but this data is further organized into members, each having a unique member name. Thus data must be accessed by referencing both the dataset name and the member name.
Copying of sequential datasets via a logical copy utility is relatively straightforward. The source dataset is opened and all data associated with that dataset is read at once, and written to a target dataset in the same manner. Copying of partitioned datasets is more complicated. Namely, the sequence of steps outlined above for copying a dataset must be performed on each member of a partitioned dataset.
Therefore, the sequence of steps involved in copying a partitioned dataset using the IEBCOPY utility for PDSE to PDSE copies is as follows:
a) Open a source member
b) Create a target member
c) Read the records of the source member into virtual storage
d) Write the records from virtual storage to the target member
e) Name the target member
f) Close the source member
g) Close the target member
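The per-member sequence a) through g) can be sketched as a loop over the members of a partitioned dataset. This is only an in-memory model under stated assumptions: a dataset is represented as a dictionary from member name to a list of records, and `copy_partitioned` is a hypothetical name, not the IEBCOPY interface.

```python
def copy_partitioned(source_dataset, target_dataset):
    """Copy every member of source_dataset into target_dataset, one at a time.

    Both datasets are modeled as dict: member name -> list of records.
    Existing members of the target are left untouched.
    """
    for member_name in list(source_dataset):
        source_member = source_dataset[member_name]  # a) open the source member
        target_member = []                           # b) create a target member
        buffer = list(source_member)                 # c) read records into virtual storage
        target_member.extend(buffer)                 # d) write records to the target member
        target_dataset[member_name] = target_member  # e) name the target member
        # f) and g): closing the members has no effect in this in-memory model.


source = {"ALPHA": ["r1", "r2"], "BETA": ["r3"]}
target = {"OLD": ["x"]}
copy_partitioned(source, target)
```

Note that the full sequence is repeated for every member, which is why the cost grows with the number of members rather than with the amount of data alone.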
Logical copying of partitioned datasets is desirable because it provides the flexibility of an application program. For instance, users are not locked out of an entire dataset during a copy operation but may access any member of the source not currently being copied. In addition, data may be copied from a source device of one type to a target device of another type (e.g. IBM 3380 to IBM 3390). Another advantage of logical copying is the fact that "gas" existing in the source dataset will not be copied to the target. Moreover, members physically fragmented in the source dataset may be allocated to contiguous space in the target, provided that enough contiguous space exists. As a further advantage, the target dataset may be an existing dataset with existing members. These members remain accessible to users while new members are being copied in.
Logical copy utilities can be highly efficient for copying large sequential datasets. However, for partitioned data the method is inefficient and slow. Often, partitioned datasets have thousands of members, each member having only a small amount of associated data. In such a case, there is substantial processing overhead for a relatively small amount of total data being transferred. For example, the IEBCOPY facility requires two directory writes per member, and a minimum of two data I/Os to read all of the data from the input member on the source device and write it to the target. That is at least four I/Os per member, so for a dataset with 1000 members, at least 4000 I/Os must be performed.
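The arithmetic behind this lower bound can be checked with a short calculation. The constant and function names are illustrative only; the per-member figures are the ones stated above.

```python
DIRECTORY_WRITES_PER_MEMBER = 2  # stated above for the IEBCOPY facility
MIN_DATA_IOS_PER_MEMBER = 2      # one read from the source, one write to the target


def minimum_ios(member_count):
    """Lower bound on I/O operations for a member-by-member logical copy."""
    per_member = DIRECTORY_WRITES_PER_MEMBER + MIN_DATA_IOS_PER_MEMBER
    return member_count * per_member


print(minimum_ios(1000))  # prints 4000
```

The bound scales linearly with the member count regardless of how little data each member holds.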
What is needed is a software copy facility that has the flexibility of logical copying, yet offers high speed performance in copying partitioned datasets.