The disclosure relates generally to automated data storage systems and, more particularly, to a method and computer program product for selectively performing a secure data erase.
A virtual tape system is a tape management system such as a special storage device or group of devices and software which manages data such that the data appears to be stored entirely on tape cartridges when portions of the data may actually be located in faster, hard disk storage. Programming for a virtual tape system is sometimes referred to as a virtual tape server (VTS), although these terms may be used interchangeably, unless otherwise specifically indicated. A virtual tape system may be used with a hierarchical storage management (HSM) system in which data is moved as the data falls through various usage thresholds to slower but less costly forms of storage media. A virtual tape system may also be used as part of a storage area network (SAN) where less-frequently used or archived data can be managed by a single virtual tape server for a number of networked computers.
In prior art virtual tape storage systems, such as International Business Machines (IBM) Magstar Virtual Tape Server, at least one virtual tape server (VTS) is coupled to a tape library comprising numerous tape drives and tape cartridges. The VTS is also coupled to a direct access storage device (DASD), comprised of numerous interconnected hard disk drives.
The DASD functions as a tape volume cache (TVC) of the VTS subsystem. When using a VTS, the host application writes tape data to virtual drives. The volumes written by the host system are physically stored in the tape volume cache (e.g., a RAID disk buffer) and are called virtual volumes. The storage management software within the VTS copies the virtual volumes in the TVC to the physical cartridges owned by the VTS subsystem. Once a virtual volume is copied or migrated from the TVC to tape, the virtual volume is then called a logical volume. As virtual volumes are copied from the TVC to a Magstar cartridge (tape), they are copied on the cartridge end to end, taking up only the space written by the host application. This arrangement maximizes utilization of a cartridge's storage capacity.
The storage management software manages the location of the logical volumes on the physical cartridges, and the customer has no control over the location of the data. When a logical volume is copied from a physical cartridge to the TVC, the process is called recall and the volume becomes a virtual volume again. The host cannot distinguish between physical and virtual volumes, or physical and virtual drives. Thus, the host treats the virtual volumes and virtual drives as actual cartridges and drives and all host interaction with tape data in a VTS subsystem is through virtual volumes and virtual tape drives.
One issue with VTS systems is the management of data within the tapes. The VTS system may have a number of duplicate, invalid, latent or unused copies of data. After a virtual tape volume is created and/or modified (one or more records are written to the volume) and closed, the virtual tape volume is copied onto the physical tape (logical) volume. The image of the virtual volume copied to a physical volume when the virtual volume was closed is a complete version of the virtual volume at the point in time the virtual volume was closed. If a virtual volume is subsequently opened and modified, then when the virtual volume is closed, that image of the virtual volume is also copied onto physical tape. However, the new image does not overwrite the prior version of the volume, since the new image may have a different size than the previous version. Thus, at any point in time, there may be several versions of the same volume serial number residing on one or more physical tape volumes.
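The accumulation of versions described above can be illustrated with a minimal sketch. The class and method names (`TapeCatalog`, `close_volume`, `current`, `stale`) are hypothetical and are not taken from any VTS product; the sketch only assumes that each close of a virtual volume appends a new image and leaves earlier images in place.

```python
from collections import defaultdict


class TapeCatalog:
    """Hypothetical catalog: maps a volume serial number to its tape images."""

    def __init__(self):
        # volser -> list of (physical_volume, size_mb), oldest first
        self.versions = defaultdict(list)

    def close_volume(self, volser, physical_volume, size_mb):
        # Each close appends a new image; prior images are not overwritten.
        self.versions[volser].append((physical_volume, size_mb))

    def current(self, volser):
        # The most recently written image is the current version.
        return self.versions[volser][-1]

    def stale(self, volser):
        # Every earlier image is still present on some physical volume.
        return self.versions[volser][:-1]


catalog = TapeCatalog()
catalog.close_volume("VOL001", "P1", 400)   # first close
catalog.close_volume("VOL001", "P2", 650)   # reopened, modified, closed again
print(catalog.current("VOL001"))
print(catalog.stale("VOL001"))
```

In this sketch the older image on `P1` remains listed as stale, which mirrors the situation in which several versions of one volume serial number coexist on tape.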
Moreover, physical volumes within a VTS are arranged in groups that are called “pools,” with each physical volume including one or more logical volumes. Each of the physical volumes managed by the VTS system is assigned to one of 32 pools, for example. It is understood that each pool of physical volumes is assigned a name and may have one or more parameters associated therewith. For example, typical parameters associated with a pool include, but are not limited to: a media type (e.g. physical volumes having 10 Gbyte tape or 20 Gbyte tape); and a rule(s) for managing volumes in a pool. One rule may involve the concept of “reclamation” whereby the VTS monitors what percentage of the data on a particular physical volume is still valid. That is, over time, data space occupied by a logical volume needs to be reclaimed from a physical volume when the data is no longer used or needed by the host, i.e., has expired. Thus, if the percentage of valid data on any physical volume in the pool falls below a reclaim percent threshold, then a reclamation process will be performed to move the valid logical volume(s) from that physical volume to another physical volume, potentially consolidating multiple partially full physical volumes onto one and filling it up.
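The reclamation rule can be sketched as follows. This is an illustration under stated assumptions, not the VTS implementation: the names `LogicalVolume`, `PhysicalVolume`, `valid_percent` and `reclaim` are hypothetical, and the valid percentage is assumed to be measured against the data written to the physical volume.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalVolume:
    volser: str
    size_mb: int
    valid: bool = True  # False once the data has expired


@dataclass
class PhysicalVolume:
    name: str
    logical_volumes: list = field(default_factory=list)

    def valid_percent(self) -> float:
        # Assumption: percentage of written data that is still valid.
        total = sum(lv.size_mb for lv in self.logical_volumes)
        if total == 0:
            return 0.0
        valid = sum(lv.size_mb for lv in self.logical_volumes if lv.valid)
        return 100.0 * valid / total


def reclaim(pool, target, reclaim_threshold_percent):
    """Move valid logical volumes off under-threshold physical volumes."""
    reclaimed = []
    for pv in pool:
        if pv is target or not pv.logical_volumes:
            continue
        if pv.valid_percent() < reclaim_threshold_percent:
            target.logical_volumes.extend(
                lv for lv in pv.logical_volumes if lv.valid
            )
            # Only the catalog entry is cleared here; the old data on the
            # tape itself has not been overwritten.
            pv.logical_volumes = []
            reclaimed.append(pv.name)
    return reclaimed


pool = [
    PhysicalVolume("P1", [LogicalVolume("A", 100), LogicalVolume("B", 900, valid=False)]),
    PhysicalVolume("P2", [LogicalVolume("C", 500), LogicalVolume("D", 500)]),
]
target = PhysicalVolume("P3")
print(reclaim(pool, target, reclaim_threshold_percent=20))
```

With a 20 percent threshold, only `P1` (10 percent valid) is reclaimed in this example, and its one valid logical volume moves to `P3`.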
If a virtual volume is removed from the physical volume and placed onto another physical volume, the data on the first physical volume is deleted but not overwritten, and thus the data may still be recovered. Further, data associated with the most current version of a virtual volume may be expired or considered latent or unusable by the customer, but the virtual volume will still exist on the physical tape volume and could be accessed.
Recently, enterprises have become more dependent on the ability to store, organize, manage and distribute data. Accordingly, “information life-cycle management,” the process of managing business data from conception until disposal in a manner that optimizes storage, access, and cost characteristics, has become increasingly important. In particular, the significance of how data is “deleted” or disposed of has increased as confidential data has begun to play a more vital role in business transactions and stricter regulations are imposed on maintaining customer privacy.
To protect confidential or sensitive data (e.g., credit card information, social security number) and to maintain customer privacy, it is advantageous to perform a secure data erase on certain data so that the data is unrecoverable. A secure data erase is defined herein as rendering data permanently unreadable by any reasonable means. Prior art methods of prioritizing data to be securely erased operate on a first in, first out basis, ensuring that the physical volume that was first added to the queue for secure data erase is secure data erased first. For example, in the prior art method, the process to determine which physical volume should be secure data erased next loops through each pool of physical volumes, beginning with pool 1 and continuing through all pools to the last pool (e.g. pool 32). Therefore, the list or queue of physical volumes to secure data erase is created by evaluating and listing all of the physical volumes within pool 1 to be secure data erased, then evaluating and listing all of the physical volumes within pool 2 to be secure data erased, and so on, continuing to the last pool (e.g. pool 32), until all of the physical volumes in all the pools that require secure data erase are evaluated and listed on the queue. The secure data erase process is then initiated and, on a first in, first out basis, the prior art process secure data erases the physical volumes in the order in which they are listed in the queue. Thus, the VTS first performs a secure data erase on those physical volumes that are in pool 1; when the physical volumes within pool 1 have been secure data erased, the VTS begins secure data erase on the physical volumes within pool 2, and so on, continuing until the process reaches the last pool (e.g. pool 32).
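The pool-by-pool queue construction described above can be sketched as follows. The function name `build_fifo_erase_queue` and the input layout (a mapping from pool number to a list of `(volume_name, needs_erase)` pairs) are assumptions made for illustration only.

```python
from collections import deque


def build_fifo_erase_queue(pools):
    """Build the erase queue by visiting pools in ascending pool number.

    pools: dict mapping pool number -> list of (volume_name, needs_erase).
    Returns a deque of (pool_number, volume_name) in first in, first out order.
    """
    queue = deque()
    for pool_number in sorted(pools):          # pool 1, pool 2, ..., last pool
        for name, needs_erase in pools[pool_number]:
            if needs_erase:
                queue.append((pool_number, name))
    return queue


pools = {
    2: [("B1", True)],
    1: [("A1", True), ("A2", False)],
    32: [("Z1", True)],
}
queue = build_fifo_erase_queue(pools)
while queue:
    # Volumes are erased strictly in listing order, regardless of deadline.
    print(queue.popleft())
```

Note that nothing in this ordering looks at how close a volume is to its erase deadline; a volume in pool 32 always waits behind every listed volume in pools 1 through 31.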
The prior art process as described is sufficient provided the VTS can manage all of the physical volumes to be secure data erased within the time remaining to the erasure deadline or there is no backlog of physical volumes to be secure data erased. In reality, the VTS system can become overloaded and, for example, while a physical volume within pool 23 is being secure data erased, a physical volume within pool 32 may pass its secure data erase (SDE) deadline. Therefore, physical volumes that have the shortest remaining time to the SDE deadline may be overlooked or postponed. This could put long-expired physical volumes which may contain sensitive data (e.g., credit card information, social security number) at risk of being accessed and retained.
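The backlog problem can be made concrete with a small simulation. The function `missed_deadlines`, the fixed per-volume erase time, and the deadline figures below are all hypothetical values chosen for illustration; none of them come from a real system.

```python
def missed_deadlines(queue, hours_per_erase):
    """Return the volumes whose erase completes after their SDE deadline.

    queue: list of (volume_name, deadline_hours) in processing order.
    hours_per_erase: assumed constant time to erase one physical volume.
    """
    missed = []
    clock = 0
    for name, deadline in queue:
        clock += hours_per_erase      # volumes are erased one after another
        if clock > deadline:
            missed.append(name)
    return missed


# Pool-ordered queue: the pool 32 volume has the tightest deadline but is last.
pool_order = [("pool1-vol", 100), ("pool23-vol", 100), ("pool32-vol", 5)]
print(missed_deadlines(pool_order, hours_per_erase=3))
```

Under these assumed numbers the pool 32 volume finishes at hour 9, after its 5-hour deadline, even though the total workload is small.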
Therefore, it would be advantageous to have a VTS system that gives priority for secure data erase to physical volumes closest to their SDE deadline. Thus, what is needed is a method and a system that guarantees that old or expired versions of a virtual volume cannot be accessed after a certain time interval (e.g. a grace period), through any reasonable means.
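One way to picture the desired behavior is an ordering by remaining time to the SDE deadline rather than by pool number. The sketch below uses a standard binary heap for that purpose; it is an illustration of the ordering idea only, and the function name `order_by_deadline` and the tuple layout are assumptions rather than the disclosed method.

```python
import heapq


def order_by_deadline(volumes):
    """Return volume names ordered by nearest SDE deadline first.

    volumes: list of (pool_number, volume_name, hours_to_deadline).
    """
    # Key each entry by deadline so the heap pops the most urgent volume first;
    # pool number and name only break ties.
    heap = [(deadline, pool, name) for pool, name, deadline in volumes]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _deadline, _pool, name = heapq.heappop(heap)
        ordered.append(name)
    return ordered


volumes = [(1, "A1", 72), (23, "W1", 48), (32, "Z1", 4)]
print(order_by_deadline(volumes))
```

Here the pool 32 volume is selected first because it has the least time remaining, independent of its pool number.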