1. Field of the Invention
This invention generally relates to data storage and more particularly to a method and apparatus for improving a copy process for transferring data from one storage device to one or more other storage devices.
2. Description of Related Art
Copying data from a first set of storage locations, commonly called a “source” or “source device”, to a second set of storage locations, called a “destination” or “destination device”, is well known in the art. In some situations copying provides data redundancy. In other situations providing two or more copies enables a like number of independent applications or procedures to process the copied data independently.
In one conventional approach, as particularly well known in the use of personal computers and early mainframe systems, a host application “copy” command initiates a data copying procedure whereby the data is transferred on a file-by-file basis between different logical devices or between different directories or folders on the same logical device. Such copy commands specify a path to a source file to be copied and a path to an initial destination storage location.
In personal computers and early mainframe systems, processors copied the data from the source device to the destination device as a primary application. During this copy process no other application could access the data in the file at either of the source or destination devices. In some situations, the copy process could consume so much processing time that all other applications were effectively prevented from operating. In other situations it was necessary to assure that no other application interacted with the source or destination during the copy phase to avoid corrupting the data.
Newer data processing systems include larger and more sophisticated data storage facilities. In personal computer and early mainframe systems, a systems administrator would configure physical disk storage into a limited number of logical devices. For example, a personal computer with two physical disk storage devices might be formatted into “C” and “D” logical devices. Alternatively, the second physical device might itself be formatted into “D” and “E” logical devices.
Newer storage facilities allowed an administrator to configure the physical disk storage into hundreds and even thousands of logical devices, with each logical device being assigned on some basis to store related data or application programs. In addition, these systems are designed to interact with multiple applications that operate on multiple host processors. In these facilities a copy application performs copying independently of other host applications. That is, while data is being copied from a source device to a destination device, other applications can operate on data in devices other than the source and destination devices. Still, in these systems access by an application other than the copy application to the source or destination device was often blocked until the copy application had completed.
Concurrently with this increase in storage facility sophistication, data processing systems were evolving into two arbitrary classes, namely: “mainframe systems” and “open systems”. Generally speaking, “mainframe systems” refer to larger IBM and IBM-like data processing systems with a powerful set of internal input-output commands that use CKD (Count-Key-Data) data formatting. “Open systems” refer to other data processing systems that operate with different internal input-output commands, different internal protocols and FBA (Fixed Block Architecture) data formatting.
Initially large capacity data storage facilities were configured for operation with mainframe systems because at the time they were the only systems that had the power and storage capacity to handle large data sets. These systems also provided a data track level of control and had several ways to copy data from one location to another. Over time steps were taken to enhance the procedures for copying data from one logical device to another transparently to other applications.
For example, U.S. Pat. No. 6,101,497 to Ofek, assigned to the same assignee as this invention, discloses a process for obtaining a single copy of data by logical volume or device essentially independently of normal processing. This process identifies a first, or production, logical volume as a source and a specifically configured logical volume as a target or destination. The user initiates an “establish” process by which the target logical volume is brought into synchronism with the production logical volume. During this process the target logical volume is not available to any other process. When a snapshot of the logical volume is taken, a “split” occurs whereupon the data in the target logical volume is available as a copy for other applications. This data does not reflect any subsequent changes to the production volume.
This process is limited to the transfer of entire logical volumes. However, many circumstances arise when there is a need to transfer a data set or file existing in only a portion of a logical volume. U.S. Pat. No. 6,363,385 (2002) to Kedem et al., assigned to the same assignee as this invention, discloses a method and apparatus for making independent data copies in a mainframe data processing system in which copies are made of such data sets or files. Specifically, this patent discloses a method for copying a selected portion of a logical volume, called an “extent”, from a source device to a destination device in response to a copy command that can identify noncontiguous blocks of contiguous tracks in an extent that contain the data set or file. An extents track establishes an environment in which the data will be copied. A calling system receives an immediate response that the copy operation is complete even though no data has been copied. Application programs may then access the file in either of the source or destination devices. The copy program transfers the file on a track-by-track basis to the storage locations in the destination device in accordance with the information in the extents track. Procedures assure that any access by any application to a particular track in either of the source or destination devices prior to the transfer of that track is accommodated in a manner that maintains data integrity.
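The deferred, track-by-track behavior described above can be illustrated with a minimal sketch. It is not the implementation disclosed in U.S. Pat. No. 6,363,385. The class `ExtentCopySession`, its methods, and the use of dictionaries to stand in for devices are all hypothetical, and the handling of a write to the destination is an assumption made only for illustration. The sketch shows a copy that reports completion before any data moves, a background pass that copies one pending track at a time, and a priority transfer whenever an application touches a track that has not yet been copied.

```python
from dataclasses import dataclass, field


@dataclass
class ExtentCopySession:
    """Hypothetical model of a deferred, track-by-track extent copy."""
    source: dict        # track number -> data on the source device
    destination: dict   # track number -> data on the destination device
    extent: list        # track numbers identified by the copy command
    pending: set = field(init=False)

    def __post_init__(self):
        # The caller can be told the copy is "complete" here, although no
        # data has moved; each track in the extent is only marked pending.
        self.pending = set(self.extent)

    def _transfer(self, track):
        """Copy one track to the destination if it is still pending."""
        if track in self.pending:
            self.destination[track] = self.source[track]
            self.pending.discard(track)

    def write_source(self, track, data):
        # Priority transfer: preserve the point-in-time data on the
        # destination before the source track is overwritten.
        self._transfer(track)
        self.source[track] = data

    def read_destination(self, track):
        # Priority transfer: make sure the track has arrived before reading.
        self._transfer(track)
        return self.destination[track]

    def write_destination(self, track, data):
        # Assumption: transfer first so that a later background pass cannot
        # overwrite the newer destination data.
        self._transfer(track)
        self.destination[track] = data

    def background_step(self):
        """Copy one pending track; return False when nothing remains."""
        if not self.pending:
            return False
        self._transfer(min(self.pending))
        return True
```

In this sketch, a caller would construct a session for the tracks of interest, continue to use `write_source` and `read_destination` immediately, and invoke `background_step` repeatedly until it returns `False`.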
Recently, open systems have been enabled to interact with large-scale data storage facilities as advances in networking and hardware capabilities have been made. Open systems, however, do not have commands with the capability of handling data on a track-by-track or extent basis. Moreover, logical devices in open systems tend to be smaller than logical devices in mainframe systems. Consequently, in open systems copies are still made on a logical device basis. U.S. patent application Ser. No. 10/705,772 filed Nov. 10, 2003, assigned to the same assignee as this invention, describes how the process of U.S. Pat. No. 6,363,385 can be adapted to operate with open systems.
Both the mainframe system and open system implementations of this copying process are characterized by a three-phase process that includes: (1) a “create” phase, (2) an “active copy” or “active” phase and (3) a “termination” phase. In these implementations controllers in disk adapters associated with each physical disk storage device implement the various phases in response to commands from a host or system manager. For example, in U.S. Pat. No. 6,363,385 a requesting host application issues a FILE SMMF command with a syntax that constitutes a “CREATE” command for initiating the “create” phase to allocate resources within the data storage facility necessary to establish an environment for the subsequent copy process. The syntax in the CREATE command may also initiate the “active” phase automatically. Alternatively, the syntax of the CREATE command may cause the response to terminate at the end of the “create” phase, after the environment is established.
A FILE SMMF command with a different syntax constitutes an “ACTIVE COPY” command that, as previously indicated, may be included in the CREATE command or may be a separate command. The ACTIVE COPY command makes the destination logical device available to host applications and initiates the copying process.
A FILE SMMF command with still another syntax constitutes a “TERMINATION” command. The TERMINATION command is generated any time after all the data is transferred from the source device to the destination device. This command releases the resources allocated during the create phase so that the resources are available for other procedures.
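The sequence of the three phases described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions: the names `Phase` and `CopyOperation`, the `activate` flag, and the track counter are hypothetical and do not reproduce the FILE SMMF command syntax or any controller logic. The sketch shows only the ordering constraints stated above: resources are allocated in the create phase, the create request may optionally start the active phase, and termination is permitted only after all data has been transferred, at which point the resources are released.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    CREATED = "created"
    ACTIVE = "active"
    TERMINATED = "terminated"


class CopyOperation:
    """Hypothetical model of the create / active copy / termination phases."""

    def __init__(self, total_tracks):
        self.phase = Phase.IDLE
        self.total_tracks = total_tracks
        self.copied_tracks = 0
        self.resources_allocated = False

    def create(self, activate=False):
        # "Create" phase: allocate resources and establish the environment.
        if self.phase is not Phase.IDLE:
            raise RuntimeError("create is valid only before any other phase")
        self.resources_allocated = True
        self.phase = Phase.CREATED
        if activate:
            # The create request may also start the active phase.
            self.activate()

    def activate(self):
        # "Active" phase: the destination becomes available and copying starts.
        if self.phase is not Phase.CREATED:
            raise RuntimeError("activate requires a completed create phase")
        self.phase = Phase.ACTIVE

    def copy_track(self):
        # Stand-in for the transfer of one track during the active phase.
        if self.phase is not Phase.ACTIVE:
            raise RuntimeError("tracks are copied only in the active phase")
        if self.copied_tracks < self.total_tracks:
            self.copied_tracks += 1

    def terminate(self):
        # "Termination" phase: allowed only after all data is transferred.
        if self.phase is not Phase.ACTIVE:
            raise RuntimeError("terminate requires the active phase")
        if self.copied_tracks < self.total_tracks:
            raise RuntimeError("data transfer is not yet complete")
        self.resources_allocated = False
        self.phase = Phase.TERMINATED
```

Because the resources stay allocated from `create` until `terminate`, the sketch also makes visible why a long active phase ties up those resources for a correspondingly long interval.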
This approach was acceptable because at that time the sizes of the data transfers were limited. Consequently the process did not adversely impact the operation of the disk storage facility and the host applications that accessed the disk storage facility. That is, the copying operation was essentially transparent to host applications.
Over time customers for these data storage facilities increased their demands for performance in various areas. One such area has involved the generation of copies of tens or hundreds of logical devices at a given point in time. Customers use the copies for various purposes, of which backup and report generation are two. With the foregoing systems an ACTIVE COPY command would have to issue for each logical device or group of one or more extents in a logical device to be copied. If the overall objective was to obtain a point-in-time copy from each of hundreds of source logical devices, the initial load as copy programs were initiated for each identified source logical device could impose unacceptable delays in the interaction between host applications and the data in the logical devices being copied.
More specifically, in the prior active phase, any attempt by the host to write to the source logical device imposed a certain overhead on the data storage facility for making a priority transfer if the data at that location had not yet been copied. Likewise, a corresponding overhead was imposed each time an application performed either a read or write operation with the destination logical device. With greater use of this copy facility, the time required for the copy process to complete tended to increase, with the possibility of unacceptably long response times with respect to the primary purpose of the data storage facility, namely to provide immediate responses to read and write requests from other application programs. In addition, the interval during which internal resources of the data storage facility were allocated to this process also increased.
The possibility of these unacceptable delays and response times has given rise to a need to improve the copy operation to overcome these issues. That is, there is a need to assure that the copy operation does not unduly burden the data storage facility so that the time that resources in the data storage facility are allocated to the copy task is minimized. There is a concomitant requirement that the copy program be improved to assure that essential transparency is retained even as the sizes and complexities of such copy operations continue to increase.