It is important in a great number of situations to be able to obtain a point in time copy of a data system. The term “point in time” is used to mean that a copy of a data set is taken that is consistent across the data set at a given moment in time. Data cannot be updated whilst it is being copied as this would result in inconsistencies in the copied data.
Point in time copies are useful in a variety of situations. Applications include, but are not limited to: obtaining a consistent backup of a data set, taking an image of a long running batch processing job so that it may be restarted after a failure, and application testing.
In order to create a point in time copy in a data processing system, the flow of writes to the data set(s) being copied must be interrupted so that no updates occur for the duration of the copy operation. Interrupting the flow of writes is likely to mean that the data processing system is unavailable for processing transactions from client applications during the point in time copy operation. A very large proportion of systems now run on a 24 hour basis and an interruption of this form is unacceptable.
The time taken for a copy to be created, or the elapsed time for which the system is unavailable, needs to be as small as possible. An ideal system would perform the point in time copy in a time short enough to be tolerated by the client application or user. One way to measure this would be to look at the transaction timeout. The transaction timeout will vary depending upon the system; some common examples are the timeout within a web browser or the time a typical user will wait before attempting to cancel or back out a transaction. Typically this is of the order of seconds or tens of seconds.
A number of technologies are known for implementing point in time copies. U.S. Pat. No. 5,410,667 describes one well known technique used in storage subsystems implementing a ‘Log Structured Array’ (LSA) such as IBM Corporation's RAMAC Virtual Array (RVA) referred to as “Snapshot Copy”. EMC Corporation's “Timefinder” product uses a simpler technique which is applicable to non LSA subsystems. Other implementations include “Flash Copy” on the IBM Enterprise Storage Server (ESS). There are in fact quite a number of different methods for implementing point in time copy, all of which share the necessity to interrupt the flow of updates to the data sets whilst the copy is established.
A discussion of LSAs is given in “A Performance Comparison of RAID 5 and Log Structured Arrays”, Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995, pages 167–178, in addition to U.S. Pat. No. 5,410,667 which discusses LSAs in the context of snapshot copy.
Snapshot copy in a Log Structured Array (LSA) is now described as one example of point in time copy technology. Snapshot copy allows extents of the virtual storage space to be copied just by manipulating meta-data without the relatively high overhead of actually moving the copied data.
An LSA has an array of physical direct access storage devices (DASDs) to which access is controlled by an LSA controller having an LSA directory which has an entry for each logical track providing its current location in the DASD array. The data in the LSA directory is meta-data, data which describes other data. Snapshot copy describes a system by which the LSA directory is manipulated so as to map multiple areas of the logical address space onto the same set of physical data on the DASDs. This operation is performed as an “atomic event” in the subsystem by means of locking. Either copy of the data can subsequently be read or be written to without affecting the other copy of the data, a facility known as copy on write.
Snapshot copy employs an architecture with virtual volumes represented as a set of pointers in tables. A snapshot operation is the creation of a new view of the volume or data set being “snapped” by copying the pointers. None of the actual data is accessed, read, copied or moved.
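The pointer manipulation described above may be illustrated by the following minimal sketch. It is not the implementation of any particular subsystem: the class name `LsaDirectory`, its methods, and the use of a Python list as the physical log are all hypothetical, and the locking that makes the operation atomic is omitted for brevity.

```python
class LsaDirectory:
    """Toy LSA directory (hypothetical): logical track -> physical slot."""

    def __init__(self):
        self.directory = {}  # meta-data: logical track number -> slot index
        self.physical = []   # stand-in for the DASD array, written as a log

    def write(self, track, data):
        # Log-structured write: the data goes to a fresh physical slot and
        # only this logical track's pointer is redirected to it.
        self.physical.append(data)
        self.directory[track] = len(self.physical) - 1

    def read(self, track):
        return self.physical[self.directory[track]]

    def snapshot(self, src_start, dst_start, count):
        # Copy pointers only; no track data is read, copied or moved.
        for i in range(count):
            src, dst = src_start + i, dst_start + i
            if src in self.directory:
                self.directory[dst] = self.directory[src]
            else:
                self.directory.pop(dst, None)  # source track was never written


lsa = LsaDirectory()
lsa.write(0, b"alpha")
lsa.write(1, b"beta")
slots_before = len(lsa.physical)
lsa.snapshot(src_start=0, dst_start=100, count=2)
lsa.write(0, b"changed")  # the snapped view at track 100 is unaffected
```

Because every write allocates a new slot, a later write to either view leaves the other view pointing at the original data, which is the copy on write behaviour referred to above.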
Snapshot copy has several benefits to the customer: (1) It allows the capture of a consistent image of a data set at a point in time. This is useful in many ways including backup and application testing and restart of failing batch runs. (2) It allows multiple copies of the same data to be made and individually modified without allocating storage for the set of data which is common between the copies.
Throughput of the copy operation can be very high since little data needs to be transferred. Copied areas which are not subsequently written can share the same physical storage thus achieving a kind of compression.
Flash copy is now described as a non LSA example of point in time copy technology.
In a system implementing flash copy a source local volume may be flash copied to a destination volume. After the flash copy operation is executed, the destination volume behaves as if it had been instantaneously copied from the source volume at the instant that the flash copy was executed. The flash copy is implemented in a mapping layer which exists logically between the volumes and the underlying physical volumes. The mapping layer uses a data structure to record which parts of the source volume have actually been copied to the destination and uses this information to direct reads and writes accordingly. Reads to the destination are inspected to determine whether any part of the read data has yet to be copied. In the event that some part has yet to be copied, that part of the read data is delivered from the source volume. Writes to the source are inspected to determine whether they touch an uncopied area of the source. In the event that they do, the affected area of the source volume is copied to the destination volume prior to writing the source, preserving the view that the destination was really copied at the point in time that the flash copy was executed. Writes to an uncopied area of the destination result in the data structure being updated to show that no copy is now necessary for that region of the volume.
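The read and write rules of such a mapping layer may be sketched as follows. This is an illustrative model only: the class name `FlashCopy`, the per-region list of flags used as the data structure, and the assumption that every write covers exactly one whole region are all hypothetical simplifications.

```python
class FlashCopy:
    """Toy flash copy mapping layer (hypothetical names throughout)."""

    def __init__(self, source, dest):
        # source and dest are equal-length lists, one entry per region.
        self.source = source
        self.dest = dest
        # Records which regions hold valid data on the destination.
        self.copied = [False] * len(source)

    def read_source(self, region):
        return self.source[region]

    def read_dest(self, region):
        # Regions not yet copied are delivered from the source volume.
        if self.copied[region]:
            return self.dest[region]
        return self.source[region]

    def write_source(self, region, data):
        # Preserve the point in time image before overwriting the source.
        if not self.copied[region]:
            self.dest[region] = self.source[region]
            self.copied[region] = True
        self.source[region] = data

    def write_dest(self, region, data):
        # A whole-region write to the destination means that no copy from
        # the source is needed for this region any longer.
        self.dest[region] = data
        self.copied[region] = True


src = ["s0", "s1", "s2"]
dst = [None, None, None]
fc = FlashCopy(src, dst)
fc.write_source(0, "new0")  # copies "s0" to the destination first
fc.write_dest(1, "d1")      # marks region 1 as needing no copy
```

After these calls, the destination still presents region 0 as it was when the flash copy was executed, while region 2 continues to be served from the source.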
Another method is similar to the above method, but instead of copying data to the same place on a destination volume as it occupies on the source volume, it writes to a journal that describes the data. This method needs less storage space on the destination storage and is therefore cheaper.
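One possible reading of the journal method is sketched below, on the assumption that each journal record holds a region number together with the data that the region contained at the point in time. The class name `JournalCopy` and the record layout are hypothetical.

```python
class JournalCopy:
    """Toy journal-based point in time copy (hypothetical layout)."""

    def __init__(self, source):
        self.source = source  # list, one entry per region
        self.journal = []     # appended (region, preserved data) records
        self.index = {}       # region -> position of its journal record

    def write_source(self, region, data):
        # Preserve the original contents once, on the first overwrite only.
        if region not in self.index:
            self.index[region] = len(self.journal)
            self.journal.append((region, self.source[region]))
        self.source[region] = data

    def read_copy(self, region):
        # Regions with a journal record are read from the journal;
        # all other regions are still identical to the source.
        if region in self.index:
            return self.journal[self.index[region]][1]
        return self.source[region]


jc = JournalCopy(["a", "b", "c"])
jc.write_source(1, "B")
jc.write_source(1, "BB")  # second overwrite adds no journal record
```

In this sketch the journal grows only with the number of regions changed since the copy was taken, which is why the destination storage can be smaller than the source volume.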
The above methods are all methods of taking point in time virtual copies of a data set.
Customers of storage arrays are often concerned with reliability, access times, and cost per megabyte of data stored. LSA and RAID storage subsystems provide ways of addressing the reliability issue and access requirements. Access time is improved by caching data. A cache is a fast random access memory often included as part of a storage subsystem to further increase the I/O speed. In the case of an LSA, a write-back cache is usually provided in the LSA controller.
A cache stores information that either has recently been requested from the DASDs or that needs to be written to the DASDs. The effectiveness of a cache is based on the principle that once a memory location has been accessed, it is likely to be accessed again soon. This means that after the initial access, subsequent accesses to the same memory location need go only to the cache. Much computer processing is repetitive so a high hit rate in the cache can be anticipated.
For improved performance, storage subsystems make use of a write-back cache, sometimes known as a fast write cache, for write I/O transactions where data is written in electronic time and completion given to the initiator of the I/O transaction before the data is actually destaged to the underlying storage. Write caching has a strong affinity with point in time virtual copy techniques. Delaying writes to the source volume whilst the underlying data is read from the source and written to the destination adds significant latency to write operations and therefore a write-back cache is useful to isolate the host application from this added latency.
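The behaviour of a write-back cache described above may be sketched as follows. The class name `WriteBackCache`, the use of a dictionary as the underlying storage, and the `"complete"` return value are hypothetical; a real cache would also bound its size and protect undestaged data against power loss.

```python
class WriteBackCache:
    """Toy write-back (fast write) cache; all names are hypothetical."""

    def __init__(self, backing):
        self.backing = backing  # dict standing in for the slow storage
        self.dirty = {}         # writes accepted but not yet destaged

    def write(self, block, data):
        # The write is held in memory and completion is given at once,
        # before anything reaches the underlying storage.
        self.dirty[block] = data
        return "complete"

    def read(self, block):
        # The newest data may exist only in the cache.
        if block in self.dirty:
            return self.dirty[block]
        return self.backing.get(block)

    def destage(self):
        # Later, the undestaged data is written to the underlying storage.
        count = len(self.dirty)
        self.backing.update(self.dirty)
        self.dirty.clear()
        return count


disk = {}
cache = WriteBackCache(disk)
status = cache.write(7, b"x")  # disk is still empty at this point
```

The initiator sees completion as soon as `write` returns, so any latency incurred below the cache during destaging is hidden from the host application.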
The write-back cache is best placed logically between the I/O initiator and the component implementing point in time copy. For example, in the case of an LSA, the write-back cache sits between the host processor and the LSA directory. If the cache is placed logically between the component implementing point in time copy and the underlying physical storage then it is unable to isolate the host application from the additional latency introduced from writes to the uncopied area as explained above.
The problem is that if the write-back cache is in-between the I/O initiator and the component which manages the point in time copy meta-data, then the point in time copy meta-data cannot be manipulated until the cached data for the source region of the point in time copy has been flushed and the cached data for the destination region has been flushed or invalidated.
This is a problem because with a conventional write-back cache the source region of the point in time copy must be flushed and the destination region flushed or invalidated before the point in time copy can be processed and completion given to the host. While the point in time copy operation is in progress, no new writes can be accepted. This flushing operation takes mechanical time for each write with current disk technology. If there is a large amount of undestaged data in the cache, the point in time copy cannot complete until the data is destaged or invalidated. The time taken depends upon the quantity of undestaged data in the cache, which could be large, and the rate at which it can be destaged to the underlying disks, which could be low. With a large amount of undestaged data and a low destage rate the time taken to flush the cache could be minutes to tens of minutes.
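The dependence of the flush time on these two quantities may be illustrated by a short calculation. The figures used below (8 GiB of undestaged data and a destage rate of 10 MiB per second) are assumptions chosen purely for illustration and are not measurements of any particular subsystem; the function name is likewise hypothetical.

```python
def flush_time_seconds(undestaged_bytes, destage_bytes_per_second):
    """Estimated time to flush the cache at a constant destage rate."""
    return undestaged_bytes / destage_bytes_per_second


# Assumed figures, for illustration only.
undestaged = 8 * 1024**3    # 8 GiB of undestaged data in the cache
destage_rate = 10 * 1024**2  # 10 MiB/s sustained destage rate

seconds = flush_time_seconds(undestaged, destage_rate)  # 819.2 seconds
minutes = seconds / 60                                  # roughly 13.7 minutes
```

Under these assumed figures the flush alone takes well over ten minutes, far longer than a transaction timeout of the order of seconds or tens of seconds.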
If flushing is needed as part of the point in time copy, the point in time copy may take a long time. When a point in time copy is taken of an operational system, such as a database, the system is normally unavailable while the point in time copy is done, therefore a delay due to flushing is undesirable. The difference between electronic time and disk writing time for destaging a number of writes to disk is very significant.