1. Field of the Invention
A method, system, program, and data structures for maintaining electronic data at a point-in-time and, in particular, copying point-in-time data from a first storage location to a second storage location.
2. Description of the Related Art
Data storage systems often include a feature to allow users to make a copy of data at a particular point-in-time. A point-in-time copy is a copy of the data consistent as of a particular point-in-time, and would not include updates to the data that occur after the point-in-time. Point-in-time copies are created for data duplication, disaster recovery/business continuance, year 2000 testing, decision support/data mining and data warehousing, and application development and testing.
One data duplication technique for copying a data set at a particular point-in-time is the International Business Machines Corporation's ("IBM") Concurrent Copy feature. Concurrent Copy performs back-up operations while allowing application programs to run. Concurrent Copy ensures data consistency by monitoring input/output (I/O) requests to the tracks involved in the Concurrent Copy operation. If an I/O request is about to update a track that has not been duplicated, then the update is delayed until the system saves a copy of the original track image in a cache side file. The track maintained in the side file is then eventually moved over to the target copy location. Concurrent Copy is implemented in a storage controller system, where the storage controller provides one or more host systems access to a storage device, such as a Direct Access Storage Device (DASD), which comprises numerous interconnected hard disk drives. With Concurrent Copy, data is copied from the DASD or side file to the host system initiating the Concurrent Copy operation, and then to another storage device, such as tape back-up.
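The side-file behavior described above can be illustrated with a minimal sketch. It is a toy in-memory model and not IBM's implementation; the class name `ConcurrentCopySession`, its methods, and the use of dictionaries for tracks are all hypothetical and chosen only for illustration.

```python
class ConcurrentCopySession:
    """Toy model of a side-file copy session (all names are hypothetical)."""

    def __init__(self, tracks):
        self.tracks = tracks          # live source tracks: track number -> data
        self.pending = set(tracks)    # track numbers not yet duplicated
        self.side_file = {}           # original images saved ahead of an update
        self.backup = {}              # stands in for the target, e.g. tape back-up

    def write(self, track, data):
        # An update to a track that has not been duplicated first saves the
        # original track image in the side file, then the update proceeds.
        if track in self.pending and track not in self.side_file:
            self.side_file[track] = self.tracks[track]
        self.tracks[track] = data

    def copy_next(self):
        # Duplicate one pending track, preferring the saved original image.
        if not self.pending:
            return False
        track = min(self.pending)
        self.backup[track] = self.side_file.pop(track, self.tracks[track])
        self.pending.discard(track)
        return True


session = ConcurrentCopySession({0: "A", 1: "B"})
session.write(1, "B2")            # track 1 is updated before it is copied
while session.copy_next():
    pass
```

After the loop, `session.backup` holds the original images `{0: "A", 1: "B"}` even though the live track 1 now contains `"B2"`.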
Concurrent Copy is representative of a traditional duplication method in which the source data to copy is read from the disk into the host. The host then writes a duplicate physical copy back to the receiving disk. This method uses substantial processing cycles to perform the I/O operations for the copying and disk storage, and can take considerable time. In fact, the time and resources consumed are directly proportional to the amount of data being copied. The larger the data, the more resources and time are used. Further details of the Concurrent Copy operation are described in the IBM publication, "Implementing Concurrent Copy," IBM document no. GG24-3990-00, (IBM Copyright, December 1993), which publication is incorporated herein by reference in its entirety.
Another data duplication technique for storage controller systems is the IBM SNAPSHOT program. SnapShot is intended for use with the IBM RAMAC Virtual Array or ICEBERG disk storage systems. Such systems provide a virtual disk architecture, also referred to as a Log Structured Array (LSA) system, in which mappings provide virtual locations of the data. LSA tables map host tracks to disk array storage locations where the data is stored. When data is written to the system, it is compressed and compacted, assembled into fixed blocks, and written to the DASD. All write operations in a virtual disk architecture are always directed to a new place in the disk array. SnapShot operates by copying the LSA pointers to the data, and not copying the actual data. Thus, after a SnapShot copy is made, there are two sets of pointers to the same data. Further details of the SnapShot operation are described in the IBM publications "Implementing Snapshot," IBM document no. SG24-2241 (IBM Copyright, November 1997); and "Using RVA and SnapShot for Business Intelligence Applications with OS/390 and DB2," IBM document no. SG24-5333-00 (IBM Copyright, August 1998).
SnapShot is considered advantageous over traditional disk copy operations, such as Concurrent Copy methods. SnapShot uses substantially less disk space and I/O processing. Further, SnapShot requires substantially less time to make the copy than traditional disk copy operations because SnapShot just makes a copy of the logical pointers to the data being copied, and not a physical copy of the data. Eliminating the I/O operations needed to copy the actual data allows a SnapShot copy to be completed almost instantaneously. Once the pointers are copied, the SnapShot copy is complete.
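The pointer-copy idea can be sketched as follows. This is a simplified illustration under stated assumptions and not the SnapShot program itself: the class `LogStructuredArray`, its method names, and the append-only list standing in for the disk array are hypothetical, and compression and compaction are omitted.

```python
class LogStructuredArray:
    """Toy virtual disk: per-volume tables map host tracks to block indexes."""

    def __init__(self):
        self.blocks = []   # append-only stand-in for disk array locations
        self.maps = {}     # volume name -> {host track: index into blocks}

    def write(self, volume, track, data):
        # Every write is directed to a new place in the array; only the
        # writing volume's pointer is changed.
        self.blocks.append(data)
        self.maps.setdefault(volume, {})[track] = len(self.blocks) - 1

    def read(self, volume, track):
        return self.blocks[self.maps[volume][track]]

    def snapshot(self, source, target):
        # Copy the pointers only; no data block is read or written.
        self.maps[target] = dict(self.maps[source])


lsa = LogStructuredArray()
lsa.write("vol_a", 0, "payroll v1")
lsa.snapshot("vol_a", "vol_a_copy")   # two sets of pointers, one data block
lsa.write("vol_a", 0, "payroll v2")   # lands in a new block
```

Here `lsa.read("vol_a_copy", 0)` still returns `"payroll v1"`, because the later write went to a new block and changed only the pointer held by `vol_a`.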
Although SnapShot has proved to be an advantageous program for point-in-time copies, SnapShot only operates on a virtual disk architecture, such as an LSA architecture, where tables of pointers to the data are maintained and available for duplication.
There is thus a need in the art to provide improved point-in-time copy methods to systems that do not have an LSA type virtual array, as well as LSA type data systems that utilize virtual arrays.
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, program, and data structures for maintaining electronic data at a point-in-time. A first data structure indicates point-in-time data at one of a first storage location and a corresponding second storage location. A second data structure indicates point-in-time data at one of a first storage location and corresponding second storage location. A first relationship data structure indicates a relationship between the first storage location and corresponding second storage location and a second relationship data structure indicates a relationship between the first storage location and second storage location. A request to process the first storage location is processed by processing the first relationship data structure to determine the corresponding second storage location for the first storage location and processing the first data structure to determine whether data at the first storage location was transferred to the second storage location. A request to process the second storage location is processed by processing the second relationship data structure to determine the corresponding first storage location for the second storage location and processing the second data structure to determine whether the point-in-time data at the first storage location was transferred to the second storage location.
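One way to picture the two lookups described above is the sketch below. It rests on assumptions not stated in the text: the relationship structures are modeled as contiguous extents, the first and second data structures are modeled as per-track bit lists, and a set bit is taken to mean that the point-in-time data is still at the first (source) location. All class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical entry pairing a source extent with a target extent."""
    source_start: int
    target_start: int
    length: int

    def target_for(self, source_track: int) -> int:
        return self.target_start + (source_track - self.source_start)

    def source_for(self, target_track: int) -> int:
        return self.source_start + (target_track - self.target_start)


class PointInTimeTables:
    def __init__(self, rel: Relationship):
        # Separate source-side and target-side relationship structures,
        # holding the same extent information in this sketch.
        self.source_rel = rel
        self.target_rel = rel
        # Assumed encoding: True = point-in-time data not yet transferred.
        self.source_bits = [True] * rel.length
        self.target_bits = [True] * rel.length

    def lookup_from_source(self, source_track):
        """Return (corresponding target track, whether data was transferred)."""
        offset = source_track - self.source_rel.source_start
        return self.source_rel.target_for(source_track), not self.source_bits[offset]

    def lookup_from_target(self, target_track):
        """Return (corresponding source track, whether data was transferred)."""
        offset = target_track - self.target_rel.target_start
        return self.target_rel.source_for(target_track), not self.target_bits[offset]

    def mark_transferred(self, source_track):
        # Both data structures are updated once the data has moved.
        offset = source_track - self.source_rel.source_start
        self.source_bits[offset] = False
        self.target_bits[offset] = False


tables = PointInTimeTables(Relationship(source_start=100, target_start=500, length=4))
tables.mark_transferred(102)
```

With this setup, `tables.lookup_from_target(502)` returns `(102, True)`, while `tables.lookup_from_target(503)` returns `(103, False)` because that track has not been transferred.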
In further embodiments, the point-in-time data is copied from the first storage locations to the second storage locations. In such case, the first and second data structures are modified to indicate that point-in-time data copied from the first storage location to the second storage location is not located at the first storage location and is located at the second storage location.
In still further embodiments, the point-in-time data is not copied from the source to the target unless an update is made to a first location including the point-in-time data. In such case, the point-in-time data to update at the first storage location is copied to the corresponding second storage location. The first and second data structures are modified to indicate that the point-in-time data to update is at the second storage location. Further, in preferred embodiments, data is copied from the source to target by copying the data from the source location in cache to the target location in cache. Upon destage, the point-in-time data at the target location in cache is transferred to the target storage device.
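The copy-on-update and destage behavior described above might look like the following minimal sketch. It is illustrative only: the class `CopyOnUpdate`, the single `at_source` list standing in for both the first and second data structures, and the dictionary used as the target cache are simplifying assumptions rather than the structures of the preferred embodiments.

```python
class CopyOnUpdate:
    """Toy model: point-in-time data moves only when the source is updated."""

    def __init__(self, source_tracks):
        self.source = list(source_tracks)            # source storage device
        self.target = [None] * len(self.source)      # target storage device
        # Hypothetical flag list: True = point-in-time data still at source.
        self.at_source = [True] * len(self.source)
        self.target_cache = {}                       # target locations in cache

    def update_source(self, track, data):
        # Preserve the point-in-time image before the update overwrites it.
        if self.at_source[track]:
            self.target_cache[track] = self.source[track]
            self.at_source[track] = False
        self.source[track] = data

    def read_target(self, track):
        # Until the data has been transferred, the target reads from the source.
        if self.at_source[track]:
            return self.source[track]
        if track in self.target_cache:
            return self.target_cache[track]
        return self.target[track]

    def destage(self):
        # Transfer cached point-in-time data to the target storage device.
        for track, data in self.target_cache.items():
            self.target[track] = data
        self.target_cache.clear()


copy = CopyOnUpdate(["a", "b", "c"])
copy.update_source(1, "B")    # point-in-time "b" goes to the target cache
copy.destage()                # "b" now resides on the target device
```

After these calls, `copy.read_target(1)` returns `"b"` from the target device, and `copy.read_target(0)` returns `"a"` directly from the source, since track 0 was never updated and never copied.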
The preferred embodiment data structures are used to establish the location of point-in-time data that is copied from a first or source location to a second or target location. With preferred embodiments, after setting up the data structures, the data does not have to be copied unless source data is updated or data is requested from the source location. In such case, the point-in-time data is copied over to ensure that a copy of the point-in-time data is maintained. The transfer of data preferably involves the movement of data to cache and then to the target location. The preferred embodiments thus avoid I/O operations to physically copy all the data from the source to target locations. With preferred embodiments, a relationship between the source and target locations is established to allow the system to operate as if a copy of the point-in-time data is maintained at the target locations.