1. Field of the Invention
The present invention relates to a system, method, and program for releasing storage space in a storage system where updates to a primary storage device are shadowed in a secondary storage device.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second type of gradual disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. International Business Machines Corporation (IBM), the assignee of the subject patent application, provides two systems for maintaining remote copies of data at a secondary site, extended remote copy (XRC) and peer-to-peer remote copy (PPRC). These systems provide a method for recovering data updates between a last, safe backup and a system failure. Such data shadowing systems can also provide an additional remote copy for non-recovery purposes, such as local access at a remote site. The IBM XRC and PPRC systems are described in IBM publication "Remote Copy: Administrator's Guide and Reference," IBM document no. SC35-0169-02 (IBM Copyright 1994, 1996), which publication is incorporated herein by reference in its entirety.
In such backup systems, data is maintained in volume pairs. A volume pair consists of a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Typically, the primary volume of the pair is maintained in a primary direct access storage device (DASD) and the secondary volume of the pair is maintained in a secondary DASD shadowing the data on the primary DASD. A primary storage controller may be provided to control access to the primary DASD and a secondary storage controller may be provided to control access to the secondary DASD.
Data in the primary and secondary volumes in the IBM XRC systems may be stored in a Log Structured Array (LSA) format in which mappings provide virtual locations of the data sets. A table referred to as a Volume Table of Contents (VTOC) provides an index mapping data sets (e.g., files, records, etc.) to logical addresses or locations on the DASD. An LSA volume further includes an Internal Track Mapping Table (ITMT) that maps the virtual addresses to disk array storage locations where the data is stored. When data is written to the system, it is compressed and compacted, assembled into fixed blocks, and written to the DASD. In a virtual disk architecture, every write operation is directed to a new location in the disk array. Thus, even if data is written to the same virtual location, the data is physically written to a new free storage location on disk, and the virtual location is updated to point to that new location. Non-LSA systems include a VTOC that maps the volume data sets or files to physical locations, e.g., cylinder-head-record (CCHR), in the DASD, but do not include an ITMT, because the data sets map directly to physical locations.
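The two-level LSA mapping and the write-to-a-new-location rule can be illustrated with a small model. This is a minimal sketch under stated assumptions, not IBM's on-disk format: the class `LSAVolume`, its dictionaries, and the integer "physical slots" are hypothetical, and compression, compaction, and fixed-block assembly are omitted. It shows only that the VTOC resolves a data set to a virtual address, the ITMT resolves that virtual address to a physical slot, and rewriting a virtual address appends to a fresh slot instead of overwriting in place.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LSAVolume:
    """Toy LSA volume (hypothetical model, not IBM's on-disk format)."""
    vtoc: Dict[str, int] = field(default_factory=dict)    # data set -> virtual address
    itmt: Dict[int, int] = field(default_factory=dict)    # virtual address -> physical slot
    disk: Dict[int, bytes] = field(default_factory=dict)  # physical slot -> stored block
    next_free: int = 0                                     # next unused physical slot

    def write(self, vaddr: int, data: bytes) -> int:
        """Append the block to a fresh physical slot and repoint vaddr at it."""
        slot = self.next_free
        self.next_free += 1
        self.disk[slot] = data   # never overwrite an existing slot in place
        self.itmt[vaddr] = slot  # the virtual address now points at the new slot
        return slot

    def read(self, name: str) -> Optional[bytes]:
        """Resolve data set -> virtual address (VTOC) -> physical slot (ITMT)."""
        vaddr = self.vtoc.get(name)
        if vaddr is None:
            return None
        slot = self.itmt.get(vaddr)
        return None if slot is None else self.disk[slot]


vol = LSAVolume()
vol.vtoc["PAYROLL"] = 7              # VTOC: data set -> virtual address 7
first = vol.write(7, b"version 1")
second = vol.write(7, b"version 2")  # same virtual address, new physical slot
print(first, second, vol.read("PAYROLL"))  # 0 1 b'version 2'
```

Note that the block in slot 0 still physically exists after the second write; it is simply no longer referenced, which is why LSA volumes need a separate step to reclaim such space.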
In prior art XRC systems, write operations to the primary volume are transferred to the secondary controller, which writes them to the secondary volume in the same sequence in which they were applied to the primary volume, to ensure write sequence integrity. In both LSA and non-LSA volumes, when a data set is deleted by a host system, the VTOC entry for that data set is updated to indicate that the space is no longer in use. In IBM LSA storage systems, a deleted data space release (DDSR) program runs on the primary LSA volume to determine all virtual addresses invalidated in the VTOC. The DDSR program then calls the DISCARD command to update the ITMT for the primary volume, nullifying the pointers from the virtual addresses invalidated in the VTOC to physical locations in the primary DASD.
The DISCARD command specifies a range of nullified virtual addresses, determined from the VTOC, to invalidate in the ITMT. The DISCARD command frees the physical storage space addressed by those nullified virtual addresses, making it available for future allocation to virtual addresses. Any subsequent attempt to access a deleted virtual address returns a null pointer, because the virtual address was invalidated. The DDSR program may run on LSA volumes periodically, or after the VTOC is updated, to free the storage locations addressed by invalidated pointers. Non-LSA systems do not need a DDSR program or DISCARD command, because such volumes do not use an ITMT to maintain a virtual-to-physical mapping.
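The DDSR pass and the range-based DISCARD can be sketched as follows. This is a simplified illustration, not the IBM implementation: the function names `invalidated_addresses`, `to_ranges`, `discard`, and `ddsr` are hypothetical, the VTOC is modeled as a dictionary from data set name to a list of virtual addresses, and an address is treated as invalidated when the ITMT still maps it but no VTOC entry uses it. Consecutive invalidated addresses are grouped into ranges because the DISCARD command operates on a range.

```python
from typing import Dict, List, Set, Tuple


def invalidated_addresses(vtoc: Dict[str, List[int]], itmt: Dict[int, int]) -> List[int]:
    """Virtual addresses still mapped in the ITMT but no longer used by any VTOC entry."""
    live = {vaddr for vaddrs in vtoc.values() for vaddr in vaddrs}
    return sorted(vaddr for vaddr in itmt if vaddr not in live)


def to_ranges(addrs: List[int]) -> List[Tuple[int, int]]:
    """Collapse sorted addresses into inclusive (start, end) ranges."""
    ranges: List[Tuple[int, int]] = []
    for addr in addrs:
        if ranges and addr == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], addr)
        else:
            ranges.append((addr, addr))
    return ranges


def discard(itmt: Dict[int, int], free_slots: Set[int], start: int, end: int) -> None:
    """Hypothetical DISCARD: null the ITMT pointers in [start, end] and free their slots."""
    for vaddr in range(start, end + 1):
        slot = itmt.pop(vaddr, None)
        if slot is not None:
            free_slots.add(slot)


def ddsr(vtoc: Dict[str, List[int]], itmt: Dict[int, int],
         free_slots: Set[int]) -> List[Tuple[int, int]]:
    """One DDSR pass: find invalidated addresses and DISCARD them range by range."""
    ranges = to_ranges(invalidated_addresses(vtoc, itmt))
    for start, end in ranges:
        discard(itmt, free_slots, start, end)
    return ranges


# Virtual addresses 2, 3 and 6 belonged to a data set already removed from the VTOC.
vtoc = {"A": [0, 1], "B": [4]}
itmt = {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 6: 16}
free_slots: Set[int] = set()
print(ddsr(vtoc, itmt, free_slots))  # [(2, 3), (6, 6)]
print(itmt.get(2))                   # None: the deleted address no longer resolves
```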
In prior art XRC systems where the secondary volume is an LSA volume, the DDSR operation is not performed on the secondary LSA volume to update the secondary ITMT, because of write sequence integrity concerns. For instance, suppose the DDSR operation processes the secondary VTOC to determine which ITMT entries to invalidate after data has been written to the secondary DASD but before the secondary VTOC is updated. The DDSR operation would then not recognize the data written to the secondary DASD, which is not yet reflected in the secondary VTOC, as a valid data set, because the VTOC contains no pointer to the just-updated storage location in the secondary DASD. In such case, the DDSR operation may issue a DISCARD command that frees the storage locations in the secondary DASD indicated by the just-updated secondary ITMT entries. When the secondary VTOC is subsequently updated to map the updated data sets to virtual addresses, those virtual addresses in the secondary ITMT would no longer point to the updated data in physical storage, which was discarded by the intervening DDSR operation. As a result, the new updates on the secondary DASD may be lost, eliminating the shadow copy of the updates.
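The hazard described above can be replayed step by step. In this minimal sketch, the simplified `ddsr` function and the dictionary-based tables are hypothetical stand-ins: the shadowed data reaches the secondary ITMT first, an independent DDSR scan runs before the matching VTOC update arrives, and the VTOC entry that finally lands points at a virtual address whose mapping was already discarded.

```python
from typing import Dict, List, Set


def ddsr(vtoc: Dict[str, List[int]], itmt: Dict[int, int], free_slots: Set[int]) -> None:
    """Simplified DDSR: discard every ITMT entry the VTOC does not reference."""
    live = {vaddr for vaddrs in vtoc.values() for vaddr in vaddrs}
    for vaddr in [v for v in itmt if v not in live]:
        free_slots.add(itmt.pop(vaddr))


# Secondary LSA volume before a shadowed update arrives.
vtoc: Dict[str, List[int]] = {"OLD": [0]}
itmt: Dict[int, int] = {0: 100}
free_slots: Set[int] = set()

itmt[5] = 105                 # 1. shadowed data lands on the secondary (ITMT updated)
ddsr(vtoc, itmt, free_slots)  # 2. DDSR runs before the VTOC update is applied
vtoc["NEW"] = [5]             # 3. VTOC update arrives, pointing at a discarded address

lost = [v for v in vtoc["NEW"] if v not in itmt]
print("shadowed update lost at virtual addresses:", lost)  # [5]
```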
Due to the above data integrity concern, in prior art remote copy systems, data remains on the secondary LSA volume even when the virtual addresses in the secondary VTOC are invalidated, so as to avoid invalidating the secondary ITMT mappings from virtual addresses to the physical storage locations of new updates not yet reflected in the VTOC. Thus, in the prior art, the DDSR operation is not performed on secondary volumes in XRC systems. Because the secondary storage locations are not freed, the primary LSA volume has more available space than the secondary LSA volume: storage locations holding discarded data are freed at the primary volume but not at the secondary LSA volume. Consequently, the secondary DASD may run out of storage space before the primary DASD.
Moreover, in the prior art, if the secondary volume is a non-LSA volume, the DISCARD operation performed on the LSA primary volume cannot be transferred to the secondary non-LSA volume, because there is no secondary ITMT to which the DISCARD command could apply. In the prior art, the data discarded at the primary volume therefore remains on the secondary non-LSA volume. This poses a security concern, because data that was discarded at the primary volume remains available and accessible on the secondary volume.
For these reasons, there is a need in the art for improved techniques for managing data clean-up operations in a system where a secondary volume is used to shadow updates to a primary volume.
Provided is a method, system, and program for releasing storage space in first and second storage devices. Updates to the first storage device are copied to the second storage device to provide secondary storage for the updates. First and second tables map data sets to addresses in the first and second storage devices, respectively. A first command is detected to invalidate data sets in the first table. The addresses in the first table comprise virtual addresses, and a third table provides a mapping of the virtual addresses to physical storage locations in the first storage device. A second command is generated to update the second table to invalidate the data sets in the second storage device invalidated in the first table by the first command. A third command is detected to invalidate the virtual addresses in the third table used by the data sets invalidated in the first table, to free the physical storage locations in the first storage device pointed to by the invalidated virtual addresses. A fourth command is generated that is directed to the physical storage locations in the second storage device used by the invalidated data sets.
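The detect-and-generate flow above can be sketched as a monitor that watches commands issued against the primary and queues corresponding commands for the secondary. This is an illustrative sketch only: `Command`, `ShadowMonitor`, and the command kinds are hypothetical names, and the sketch assumes the secondary mirrors the primary's address layout, so the same address range identifies the corresponding secondary locations. One plausible reading of why this preserves integrity is that the generated commands travel in the same ordered stream as ordinary shadowed writes, so the secondary never runs an independent DDSR scan against a VTOC that lags its ITMT.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Command:
    kind: str                    # "write", "invalidate" (table update) or "discard"
    names: Tuple[str, ...] = ()  # data sets affected by a "write" or "invalidate"
    start: int = 0               # first address of a "discard" range
    end: int = -1                # last address of a "discard" range (inclusive)


@dataclass
class ShadowMonitor:
    """Hypothetical monitor: mirrors primary commands to the secondary in issue order."""
    secondary_queue: List[Command] = field(default_factory=list)

    def observe(self, cmd: Command) -> Command:
        if cmd.kind == "write":
            mirrored = cmd  # ordinary update, shadowed as in existing remote copy
        elif cmd.kind == "invalidate":
            # First command (primary table) -> second command (secondary table).
            mirrored = Command("invalidate", names=cmd.names)
        elif cmd.kind == "discard":
            # Third command (primary ITMT) -> fourth command aimed at the secondary's
            # locations. Assumes the secondary mirrors the primary's address layout.
            mirrored = Command("discard", start=cmd.start, end=cmd.end)
        else:
            raise ValueError(f"unexpected command kind: {cmd.kind!r}")
        self.secondary_queue.append(mirrored)
        return mirrored


monitor = ShadowMonitor()
monitor.observe(Command("write", names=("NEW",)))
monitor.observe(Command("invalidate", names=("OLD",)))
monitor.observe(Command("discard", start=4, end=7))
print([c.kind for c in monitor.secondary_queue])  # ['write', 'invalidate', 'discard']
```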
In further embodiments, the addresses in the second table comprise virtual addresses, and a fourth table provides a mapping of the virtual addresses to physical storage locations in the second storage device. In such case, the fourth command updates the fourth table by invalidating the virtual addresses in the fourth table used by the data sets invalidated in the second table by the second command to free the physical storage locations in the second storage device pointed to by the invalidated virtual addresses.
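For a secondary LSA volume, applying the second and fourth commands in order might look like the following. This is a minimal sketch with hypothetical names (`SecondaryLSA`, `apply_invalidate`, `apply_discard`): the second command removes the data sets from the secondary VTOC (the second table), and the fourth command then nulls the matching entries in the secondary ITMT (the fourth table) and returns their physical slots to the free pool, so the secondary reclaims space as the primary does.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class SecondaryLSA:
    """Toy secondary LSA volume (hypothetical model)."""
    vtoc: Dict[str, List[int]] = field(default_factory=dict)  # second table: data set -> virtual addresses
    itmt: Dict[int, int] = field(default_factory=dict)        # fourth table: virtual address -> physical slot
    free_slots: Set[int] = field(default_factory=set)         # physical slots available for reuse

    def apply_invalidate(self, names: Iterable[str]) -> None:
        """Second command: remove the invalidated data sets from the secondary VTOC."""
        for name in names:
            self.vtoc.pop(name, None)

    def apply_discard(self, start: int, end: int) -> None:
        """Fourth command: null the ITMT entries in [start, end] and free their slots."""
        for vaddr in range(start, end + 1):
            slot = self.itmt.pop(vaddr, None)
            if slot is not None:
                self.free_slots.add(slot)


sec = SecondaryLSA(vtoc={"OLD": [4, 5], "KEEP": [6]}, itmt={4: 40, 5: 41, 6: 42})
sec.apply_invalidate(("OLD",))  # second command
sec.apply_discard(4, 5)         # fourth command
print(sec.vtoc, sec.itmt, sorted(sec.free_slots))  # {'KEEP': [6]} {6: 42} [40, 41]
```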
In still further embodiments, the second table maps data sets to physical storage locations in the second storage device. In such case, generating the fourth command comprises generating at least one erase command to overwrite the physical storage locations in the second storage device that store the invalidated data sets.
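For a non-LSA secondary volume, where the second table points directly at physical locations, the fourth command takes the form of erase commands. The sketch below is hypothetical throughout (`SecondaryNonLSA`, `erase_commands`, `release`, the `("ERASE", location)` tuples, and zero-filling as the overwrite pattern are all assumptions for illustration). It captures the physical locations of the invalidated data sets before the table entries are removed, invalidates the entries, and then overwrites those locations so the discarded data is no longer readable on the secondary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CCHR = Tuple[int, int, int]  # (cylinder, head, record), simplified for illustration


@dataclass
class SecondaryNonLSA:
    """Toy non-LSA secondary volume (hypothetical model)."""
    vtoc: Dict[str, List[CCHR]] = field(default_factory=dict)  # second table: data set -> physical locations
    disk: Dict[CCHR, bytes] = field(default_factory=dict)      # physical location -> stored record


def erase_commands(vtoc: Dict[str, List[CCHR]], names: List[str]) -> List[Tuple[str, CCHR]]:
    """One hypothetical ERASE command per physical location of each invalidated data set."""
    return [("ERASE", loc) for name in names for loc in vtoc.get(name, [])]


def release(secondary: SecondaryNonLSA, names: List[str],
            fill: bytes = b"\x00") -> List[Tuple[str, CCHR]]:
    cmds = erase_commands(secondary.vtoc, names)  # capture locations before the table drops them
    for name in names:                            # second command: invalidate in the second table
        secondary.vtoc.pop(name, None)
    for _, loc in cmds:                           # fourth command(s): overwrite the old records
        old = secondary.disk.get(loc, b"")
        secondary.disk[loc] = fill * len(old)
    return cmds


sec = SecondaryNonLSA(
    vtoc={"OLD": [(1, 0, 1), (1, 0, 2)], "KEEP": [(2, 0, 1)]},
    disk={(1, 0, 1): b"secret", (1, 0, 2): b"pii", (2, 0, 1): b"keep"},
)
cmds = release(sec, ["OLD"])
print(cmds)  # [('ERASE', (1, 0, 1)), ('ERASE', (1, 0, 2))]
```

Overwriting in place, rather than merely dropping the table entry, addresses the security concern raised earlier for non-LSA secondary volumes.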
Certain of the described implementations provide a technique for discarding data at the secondary storage device, in a manner that ensures data integrity, when a discard operation is performed with respect to virtual addresses at the primary storage device that point to physical storage locations. In this way, space is freed at the second storage device when the discard operation is performed on the primary storage device. In additional implementations, if the secondary storage device does not provide virtual addressing, erase commands may be generated in response to a discard operation at the primary storage device to overwrite the corresponding addresses in the secondary storage device.