This invention relates to computer systems and a high performance, high reliability parallel disk drive array data storage subsystem that includes an efficient data storage management system to dynamically map virtual data storage devices to logical data storage devices, and that includes a deleted data file space release system that releases the physical space occupied by data that is scratched by a host processor and, in particular, to apparatus for the temporary preservation of physical space occupied by previously modified or deleted data stored in the data storage subsystem, and for the temporary preservation of associated virtual to logical mapping table entries that describe and locate such temporary physical space. The invention further relates to apparatus for the recording of the occurrence and the timing of changes to mapping table entries and to the processing of such recorded changes in a reverse time sequence to recover previously modified or deleted data, or in a forward time sequence to return data to a more current state.
It is a problem in the field of computer systems to restore data that has been accidentally or intentionally modified or deleted. To ensure that data can be restored after such events, copies of data, or “backups”, are typically stored on disk storage subsystems or magnetic tape media located at a primary business location. Various combinations of hardware and host software functionality are available to help restore data, but typically there is a time gap between the time the last backup was made and the time the damage to the data occurred. If the missing data cannot be recreated from other sources, the loss of data is permanent. In any case, negative economic consequences result from the loss of data or the effort to recreate or restore the data.
It is also a problem in the field of computer systems to ensure that closely related sets of restored data are consistent with each other. To ensure that data can be restored after a natural disaster at a primary business location, copies of data are typically stored on disk storage subsystems at one or more geographically different secondary locations that are connected to the primary location through communications connections. As data is changed at a primary location, a combination of hardware and host software functionality causes these changes to be automatically propagated to the secondary locations through the communications connections, and stored on the disk storage subsystems at the secondary locations. When communications connections between locations are interrupted, or one or more locations experience local disasters, a problem of inconsistent data can be created. Data is inconsistent when one copy of a data set is not identical to another copy, or when a sequence of changes to two or more closely related primary data sets is not fully propagated to all other secondary copies. If a disaster at a primary location occurs over a period of many seconds to minutes, it is vital to know when to stop propagating changes to the secondary locations. The IBM Peer-to-Peer Remote Copy and Geographically Dispersed Parallel Sysplex solutions provide a combination of hardware and host software functionality to detect early warning signs of trouble, and then automatically and simultaneously stop propagating changes to secondary locations. However, a problem with these types of solutions is that they may not recognize in sufficient time that a disaster has started. Partially changed data may therefore be propagated to the secondary locations, and so render the data inconsistent.
The above-described problems are solved and a technical advance is achieved in the field of computer systems by the temporary preservation of physical space occupied by previously modified or deleted data stored in the data storage subsystem, by the temporary preservation of associated virtual to logical mapping table entries that describe and locate such temporary physical space, by the recording of the occurrence and the timing of changes to mapping table entries (Change Recording), and by the processing of such recorded changes (Change Processing) in a reverse time sequence to recover previously modified or deleted data, or in a forward time sequence to return the data to a more current state.
As in prior art, all new or modified data is written on empty logical tracks and the associated previously modified data is marked as obsolete. When functional space is released by the deleted data file space release system, the deleted data is also marked as obsolete. The resultant “holes” in the logical tracks caused by previously modified or deleted data are removed by a periodic background process known as free space collection. The background free space collection process creates empty logical cylinders by collecting valid data tracks into previously emptied logical cylinders.
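The effect of free space collection can be illustrated with a minimal Python sketch. It is not the subsystem's actual procedure: it assumes a cylinder can be modeled as a fixed-length list whose slots hold either a valid track identifier or None for a hole, and it repacks everything in one pass, whereas the background process described above works periodically. The names Cylinder and collect_free_space are hypothetical.

```python
from typing import List, Optional

# Hypothetical model: each slot holds a valid track id, or None for a "hole"
# left behind by obsolete (previously modified or deleted) data.
Cylinder = List[Optional[str]]


def collect_free_space(cylinders: List[Cylinder],
                       tracks_per_cylinder: int) -> List[Cylinder]:
    """Pack valid tracks densely so that the remaining cylinders are empty."""
    # Gather every valid track, skipping the holes.
    valid = [t for cyl in cylinders for t in cyl if t is not None]

    packed: List[Cylinder] = []
    for start in range(0, len(valid), tracks_per_cylinder):
        chunk = valid[start:start + tracks_per_cylinder]
        # Pad the last, partially filled cylinder with empty slots.
        chunk = chunk + [None] * (tracks_per_cylinder - len(chunk))
        packed.append(chunk)

    # Every cylinder that is no longer needed is now entirely empty.
    while len(packed) < len(cylinders):
        packed.append([None] * tracks_per_cylinder)
    return packed
```

In this model, three cylinders holding four valid tracks between them are reduced to two occupied cylinders and one empty cylinder that is available for new writes.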
In the present invention, Change Recording may be activated or deactivated for one or more functional volumes, for one or more ranges of functional tracks that describe data sets, and for one or more individual functional tracks. When Change Recording is active, all new or modified data for those active volumes and tracks is written on empty logical tracks, but the associated previously modified data is marked as unexpired instead of obsolete. When functional space is released by the deleted data file space release system, the deleted data is also marked as unexpired instead of obsolete. The resultant “holes” in the logical tracks caused by the previously modified or deleted data are preserved until expired by the background free space collection process.
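The difference between the obsolete and unexpired markings can be sketched in Python as follows. This is an illustrative model only: the TrackState and Subsystem names, the dictionary-based mapping table, and the per-track set of active functional tracks are assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class TrackState(Enum):
    EMPTY = "empty"
    VALID = "valid"
    OBSOLETE = "obsolete"    # old data, Change Recording inactive
    UNEXPIRED = "unexpired"  # old data, preserved by Change Recording


@dataclass
class Subsystem:
    # Hypothetical structures: logical track -> state, functional -> logical.
    track_state: Dict[int, TrackState]
    mapping: Dict[int, int] = field(default_factory=dict)
    recording: Set[int] = field(default_factory=set)  # active functional tracks

    def _retire(self, functional_track: int) -> Optional[int]:
        """Mark the track's current logical location as old data."""
        old = self.mapping.get(functional_track)
        if old is not None:
            active = functional_track in self.recording
            self.track_state[old] = (
                TrackState.UNEXPIRED if active else TrackState.OBSOLETE
            )
        return old

    def write(self, functional_track: int) -> int:
        """Write new or modified data on an empty logical track."""
        # Raises StopIteration if no empty logical track is left.
        new = next(t for t, s in sorted(self.track_state.items())
                   if s is TrackState.EMPTY)
        self._retire(functional_track)
        self.track_state[new] = TrackState.VALID
        self.mapping[functional_track] = new
        return new

    def release(self, functional_track: int) -> None:
        """Release functional space for data scratched by the host."""
        self._retire(functional_track)
        self.mapping.pop(functional_track, None)
```

A functional track listed in recording leaves unexpired holes behind on every rewrite or release, while any other track leaves obsolete holes exactly as in the prior art path.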
As the background free space collection process examines each unexpired track, a check is made to determine if Change Recording is active for the unexpired track, and if not, existing prior art processes are followed. If Change Recording is active for the unexpired track, the timestamp value in the associated virtual to logical mapping table entry for the unexpired track and the current level of free space in the parallel disk drive array data storage subsystem are used in a method to dynamically determine if the unexpired track should be expired and collected as free space, or should remain unexpired and preserved. As each track is expired, so is its related Change Record. As the level of free space decreases, the number of data tracks available for preservation of unexpired data decreases, and thus the maximum preservation time for unexpired data decreases. Conversely, as the level of free space increases, so does the maximum preservation time for unexpired data. The present invention provides for a plurality of methods for dynamically determining the preservation time for unexpired data, such as a function of the available free space or total disk space, or as a variable amount of time based on some internal or external parameter.
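One possible form of the expiry decision is sketched below in Python. The linear relationship between free space and preservation time, the ceiling_seconds parameter, and both function names are assumptions chosen for illustration; the specification only requires that the preservation time be determined dynamically, for example as some function of the available free space.

```python
def max_preservation_seconds(free_fraction: float,
                             ceiling_seconds: float = 86400.0) -> float:
    """Assumed policy: preservation time scales linearly with free space."""
    clamped = min(max(free_fraction, 0.0), 1.0)
    return ceiling_seconds * clamped


def should_expire(track_timestamp: float,
                  now: float,
                  free_fraction: float,
                  recording_active: bool,
                  ceiling_seconds: float = 86400.0) -> bool:
    """Decide whether free space collection may reclaim an old track."""
    if not recording_active:
        # Stand-in for the prior art path: the track is simply collected.
        return True
    age = now - track_timestamp
    # Less free space means a shorter window, so older tracks expire sooner.
    return age > max_preservation_seconds(free_fraction, ceiling_seconds)
```

With a ceiling of 1000 seconds and half of the subsystem free, a track stamped 400 seconds ago is preserved and a track stamped 1000 seconds ago is expired. If free space then drops to 25 percent, the window shrinks to 250 seconds and the first track is expired as well.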
When Change Recording is activated, all new or modified data for those active volumes and tracks cause Change Records to be generated and stored on the parallel disk drive array data storage subsystem. The location of old and new data tracks associated with each change is saved in a Change Record. A timestamp value is saved in each Change Record and in the associated virtual to logical mapping table entries for the old and new data tracks. The timestamp value is determined by the parallel disk drive array storage subsystem at the time the change completed. For one or more host processors accessing one or more parallel disk drive array storage subsystems, all host processors and parallel disk drive array storage subsystems are synchronized to a common external time reference to produce a consistent temporal record of change activity. When Change Recording is deactivated, generation of Change Records stops, but a timestamp value continues to be saved in the associated virtual to logical mapping table entries for new or modified data.
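The generation of Change Records can be sketched in Python under a few stated assumptions. The ChangeRecord and Recorder names, the in-memory log, and the two dictionaries standing in for the virtual to logical mapping table are hypothetical. The completed_at argument stands in for the timestamp that the subsystem takes from its externally synchronized clock when the change completes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    timestamp: float             # time the change completed
    functional_track: int        # track as seen by the host
    old_location: Optional[int]  # logical track holding the previous data
    new_location: Optional[int]  # logical track holding the new data


@dataclass
class Recorder:
    active: bool = True
    mapping: Dict[int, int] = field(default_factory=dict)            # functional -> logical
    entry_timestamp: Dict[int, float] = field(default_factory=dict)  # logical -> time
    log: List[ChangeRecord] = field(default_factory=list)

    def record_write(self, functional_track: int, new_location: int,
                     completed_at: float) -> None:
        old = self.mapping.get(functional_track)
        if self.active:
            # Save the locations of the old and new tracks with the timestamp.
            self.log.append(
                ChangeRecord(completed_at, functional_track, old, new_location)
            )
            if old is not None:
                self.entry_timestamp[old] = completed_at
        # The new entry is stamped even when Change Recording is inactive.
        self.entry_timestamp[new_location] = completed_at
        self.mapping[functional_track] = new_location
```

After Change Recording is deactivated, record_write still stamps the mapping entry for the new data but appends nothing to the log, matching the behavior described above.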
Change Processing locates the Change Records relevant to the request, orders the records either in reverse time sequence to restore data to a previous known point of consistency, or in forward time sequence to return data to a more current state, and then processes each Change Record using information such as the locations of the old and new data tracks, to modify the virtual mapping table entries until the desired point in time is reached.
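Change Processing in both directions can be sketched in Python as follows. The record layout, the dictionary used as the mapping table, and the function names roll_back and roll_forward are assumptions made for the example, and a location of None is used here to represent a track that did not exist before a change or was deleted by it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    timestamp: float
    functional_track: int
    old_location: Optional[int]  # None: the track did not exist before
    new_location: Optional[int]  # None: the change deleted the track


def _set_location(mapping: Dict[int, int], track: int,
                  location: Optional[int]) -> None:
    """Point a functional track at a logical location, or unmap it."""
    if location is None:
        mapping.pop(track, None)
    else:
        mapping[track] = location


def roll_back(mapping: Dict[int, int], records: Iterable[ChangeRecord],
              target_time: float) -> None:
    """Undo, newest first, every change made after target_time."""
    relevant = [r for r in records if r.timestamp > target_time]
    for rec in sorted(relevant, key=lambda r: r.timestamp, reverse=True):
        _set_location(mapping, rec.functional_track, rec.old_location)


def roll_forward(mapping: Dict[int, int], records: Iterable[ChangeRecord],
                 from_time: float, target_time: float) -> None:
    """Reapply, oldest first, changes made in (from_time, target_time]."""
    relevant = [r for r in records if from_time < r.timestamp <= target_time]
    for rec in sorted(relevant, key=lambda r: r.timestamp):
        _set_location(mapping, rec.functional_track, rec.new_location)
```

Only mapping table entries are modified in this sketch. No data is copied, which is possible because the unexpired tracks referenced by old_location are still preserved on the subsystem.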