1. Field of the Invention
This invention relates in general to improvements in the field of computer systems having the capability of copying data from one storage element to another, and more particularly, to a rapid, efficient, high performance method of copying data.
2. Description of the Background Art
A data copy function in a computer system is typically used to save a recent version of data on a data storage device such as a disk drive, tape drive, or other storage device. The data copy function along with the storage devices form a backup/restore subsystem. One frequent use of this subsystem is to protect against loss of data. Data currently being processed can be destroyed, corrupted, inadvertently changed, or otherwise damaged as a result of problems such as power failure, hardware failure, or operator error. The backup/restore subsystem can greatly alleviate the effects of damaged data by reproducing a previous version of the data before the damage occurred. Other uses of data copying include improving efficiency of creating data bases, creating a common format of data files, and numerous other uses known to those familiar with file management programming.
Successful recovery or use of data requires that all of the data be copied at a consistent point in time. A consistent point in time means that any update of the data is inhibited during the copy process. One method of providing a copy of the data is to use a technique called snapshot. Snapshot is used, for example, in the IBM RAMAC Virtual Array product. Snapshot uses a log structured array (LSA) containing sets of pointers indicating the location of the data on the physical storage device. This snapshot technique does not require a physical copy of the data from one physical location to another. Snapshot instead uses pointers in an LSA to point to the same disk storage location for both the original and copied data. Pointer manipulation is usually much faster than physically copying the data. Future requests to write an updated version of either the original data or the copied data result in the updated data being written to a new physical storage device location.
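The pointer-based behavior described above can be illustrated with a minimal sketch. It is not the IBM RAMAC Virtual Array implementation: the class name `LogStructuredArray` and its methods are hypothetical, and back end storage is modeled as a simple Python list to which every write appends.

```python
class LogStructuredArray:
    """Toy model: a pointer table mapping (volume, track) to a physical slot."""

    def __init__(self):
        self.physical = []   # back end storage; every write appends a new slot
        self.pointers = {}   # (volume, track) -> index into self.physical

    def write(self, volume, track, data):
        # Updates never overwrite in place; they land in a new physical location.
        self.physical.append(data)
        self.pointers[(volume, track)] = len(self.physical) - 1

    def read(self, volume, track):
        return self.physical[self.pointers[(volume, track)]]

    def snapshot(self, source, target):
        # Copy only the pointers of the source volume; no data is moved.
        for (volume, track), slot in list(self.pointers.items()):
            if volume == source:
                self.pointers[(target, track)] = slot


lsa = LogStructuredArray()
lsa.write("A", 0, "v1")
lsa.snapshot("A", "B")

# Immediately after the snapshot, both volumes point at the same slot.
shared = lsa.pointers[("A", 0)] == lsa.pointers[("B", 0)]

# Updating the original redirects only its own pointer to a new slot.
lsa.write("A", 0, "v2")
```

In this sketch the copy on volume "B" continues to read the earlier version after volume "A" is updated, because only the pointer for "A" is redirected to the newly written slot.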
One advantage of using a snapshot technique in a log structured array environment is the ability to replicate a copy of data across the entire subsystem through the replication of entries in the LSA. One track of a host volume could, using snapshot, be replicated across the entire set of volumes or devices addressable within a system such as the IBM RAMAC Virtual Array subsystem. This has considerable time saving advantages.
The snapshot technique has some significant disadvantages which are associated with the use of an LSA. For example, snapshot imposes an overhead on managing data because, as implemented with LSA in the IBM RAMAC Virtual Array, data compression, free space collection, and virtual space allocation are imposed. As a consequence of using an LSA, newly written data is always placed in new back end storage locations. As used herein, the term “back end storage” refers to the physical storage devices. Free space collection is a background activity that continuously occurs when using an LSA and consumes valuable CPU cycles within the storage subsystem. Virtual allocation of data always writes updates to a new back end location that has been made available through the free space collection process. Since there is not a guaranteed storage location associated with every issued write request, the available disk storage may eventually be exhausted. This adds complication to the design and requires unique reporting mechanisms for alerting the user when disk storage is no longer available. To help overcome the possibility of running out of disk storage, data compression is used. Data compression helps to reduce the number of occasions when disk storage is not available, but does not completely solve the problem and adds significant design complications.
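The exhaustion problem described above can be shown with a small sketch under simplifying assumptions: a fixed number of back end slots, no data compression, and free space collection invoked explicitly rather than as a background task. The names `TinyLSA` and `BackEndFull` are hypothetical and do not come from any IBM product.

```python
class BackEndFull(Exception):
    """Raised when no free back end location remains (hypothetical name)."""


class TinyLSA:
    def __init__(self, capacity):
        self.slots = [None] * capacity    # back end storage locations
        self.free = list(range(capacity)) # slots available for new writes
        self.pointers = {}                # logical track -> slot index
        self.garbage = []                 # superseded slots awaiting collection

    def write(self, track, data):
        # Virtual allocation: every write needs a new back end location.
        if not self.free:
            raise BackEndFull("no free back end location")
        slot = self.free.pop(0)
        self.slots[slot] = data
        old = self.pointers.get(track)
        if old is not None:
            self.garbage.append(old)      # the old version is now dead space
        self.pointers[track] = slot

    def collect_free_space(self):
        # Return superseded slots to the free list; report how many.
        reclaimed = len(self.garbage)
        self.free.extend(self.garbage)
        self.garbage.clear()
        return reclaimed


lsa = TinyLSA(capacity=2)
lsa.write(0, "a")
lsa.write(0, "b")            # same logical track, second physical slot
try:
    lsa.write(0, "c")        # fails: both slots are in use
    exhausted = False
except BackEndFull:
    exhausted = True

reclaimed = lsa.collect_free_space()
lsa.write(0, "c")            # succeeds once dead space is reclaimed
```

Although only one logical track is ever written, the third write fails until free space collection has run, which illustrates why a write request has no guaranteed storage location in this model.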
Snapshot therefore has some performance advantages; however, using an LSA also has some significant disadvantages.
Another method of copying data is called flashcopy. Flashcopy as implemented in the IBM Enterprise Storage Server uses an ‘update-in-place’ architecture and does not use an LSA. An update-in-place architecture places updated data in the same physical location as the original data. Flashcopy allows the copied data to be accessed by pointing to the locations of the original and copied data. Usually, a physical copy of the data is written to the target volume. Any new write request that updates the source data incurs the undesirable overhead of physically copying the original data before the update takes place. In an attempt to minimize this write overhead penalty, the physical data copy operations are usually performed as a background task. To help alleviate the impact on system performance, the background activity of physically copying the data may be temporarily deferred depending on the priority of completing other tasks.
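The write overhead and background copy described above can be sketched as follows. This is an illustrative model only, not the IBM Enterprise Storage Server implementation: the class `FlashCopyPair`, the per-track `copied` flags, and the `max_tracks` parameter are assumptions introduced for the example.

```python
class FlashCopyPair:
    """Toy update-in-place source volume with a lazily copied target."""

    def __init__(self, source):
        self.source = source                  # list of tracks, updated in place
        self.target = [None] * len(source)
        self.copied = [False] * len(source)   # True once a track is physically copied

    def read_target(self, track):
        # Until a track is physically copied, target reads point at the source.
        return self.target[track] if self.copied[track] else self.source[track]

    def _copy_track(self, track):
        if not self.copied[track]:
            self.target[track] = self.source[track]
            self.copied[track] = True

    def write_source(self, track, data):
        # Overhead: the original data must be copied before the update.
        self._copy_track(track)
        self.source[track] = data             # update in place

    def background_copy(self, max_tracks):
        # Copy up to max_tracks uncopied tracks; may be deferred by the caller.
        done = 0
        for track in range(len(self.source)):
            if done == max_tracks:
                break
            if not self.copied[track]:
                self._copy_track(track)
                done += 1
        return done


pair = FlashCopyPair(["a", "b", "c"])
pair.write_source(1, "B")            # forces a physical copy of track 1 first
target_view = [pair.read_target(t) for t in range(3)]
first_pass = pair.background_copy(max_tracks=1)
second_pass = pair.background_copy(max_tracks=5)
```

In this sketch the target still presents the data as it stood when the copy was established, while each first write to an uncopied source track pays for one physical track copy before the in-place update.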
What is needed is a fast, efficient method of copying data which has the speed of using pointers for copying, but does not degrade system performance with overhead tasks of data compression, free space collection, virtual space allocation, or interruptions from additional write requests.