1. Field of the Invention
The present invention relates to a system for improving capacity utilization of storage devices.
2. Description of the Related Art
An explosion of computer data and information, e.g., video, sound, pictures, etc., requires an ever-increasing amount of computer readable storage space. Increasing data storage capacity requires improved storage management systems to back up and protect data sets, and migrate less active data sets to secondary storage to increase primary storage space. A data set consists of any collection or grouping of data. In certain systems, a data set may include control information used by the system to manage the data. The terms data set and file are generally equivalent and sometimes are used interchangeably. Hierarchical storage management (HSM) programs manage storage devices, such as tape libraries, to control the flow of data between primary and secondary storage facilities. Two important HSM procedures are migration and recall. The migration procedure transfers the least frequently used data sets from primary to secondary tape storage. If a user wants to access migrated data sets, then the recall procedure retrieves migrated data from the secondary storage, e.g., tape, to the primary storage device, such as a local hard drive or group of direct access storage devices (DASDs). Currently, magnetic tape is the preferred medium for backups and secondary storage. In the future, optical and holographic storage devices may supplant magnetic tape.
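The migration and recall procedures described above can be illustrated with a minimal sketch. It is not taken from any HSM product: the class names, the `access_count` field used to rank data sets, and the sample data set names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    name: str
    access_count: int  # hypothetical usage counter used to rank data sets


@dataclass
class Hsm:
    primary: dict = field(default_factory=dict)    # name -> DataSet on DASD
    secondary: dict = field(default_factory=dict)  # name -> DataSet on tape

    def migrate(self, count: int) -> list:
        """Move the `count` least frequently used data sets to secondary storage."""
        victims = sorted(self.primary.values(), key=lambda d: d.access_count)[:count]
        for ds in victims:
            self.secondary[ds.name] = self.primary.pop(ds.name)
        return [ds.name for ds in victims]

    def recall(self, name: str) -> DataSet:
        """Bring a migrated data set back to primary storage when a user needs it."""
        ds = self.secondary.pop(name)
        ds.access_count += 1
        self.primary[name] = ds
        return ds


hsm = Hsm()
for name, hits in [("payroll", 40), ("logs_1997", 1), ("photos", 3)]:
    hsm.primary[name] = DataSet(name, hits)

migrated = hsm.migrate(2)         # the two least used data sets go to tape
recalled = hsm.recall("photos")   # a user asks for one of them again
```

In this sketch the two least accessed data sets are migrated, and the later recall moves one of them back to primary storage.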
Users often want data immediately. Such an immediate need for data creates a conflict if a user wants access to a data set that is located on a tape currently involved in a migration or backup operation. To provide immediate access to such a tape, the HSM system would have to interrupt the migration to allow the user to recall the data set from the tape. Otherwise, the user would have to wait until the migration completed. Wait times can be considerable given that current magnetic tapes can take several hours to fill entirely.
If the migration procedure is interrupted to allow the user to recall data, then the tape will be taken away from migration for read operations. As a result, the recalled tape is likely only partially filled with data. As tape size increases, so does the likelihood that a user will need to recall a data set on a tape involved in a migration procedure. Increasing the number of interruptions to migration operations to service recall requests increases the number of partially filled tapes as discussed below.
This tape capacity utilization problem is exacerbated because during migration/recall type operations, partially filled tapes cannot be used to complete a migration after a tape has been filled prior to completely migrating a data set. For instance, when a tape is filled in the middle of migrating a data set, only a blank tape can be used to store the remainder of the migrated data set. A partially filled tape includes no marker indicating empty portions of the tape. Thus, a data set cannot be completed in the middle of a partially filled tape because there is no marker to indicate where in the partially filled tape the remainder of the data set is placed. Using an empty tape to complete the migration of a data set creates another partially filled tape.
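The effect of recall interruptions on tape utilization can be sketched with a small simulation. This is a simplified model under stated assumptions, not a description of any actual HSM implementation: sizes are in arbitrary units, the `recall_after` parameter is a hypothetical way to say "a recall takes the current tape away after this data set", and a tape taken away is never written to again.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    capacity: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


def migrate(data_set_sizes, capacity, recall_after=()):
    """Write data sets to tapes in order and return the tapes that hold data.

    recall_after: indexes of data sets after which a recall takes the
    current tape away from migration (hypothetical model of interruptions).
    """
    tapes = [Tape(capacity)]
    for i, size in enumerate(data_set_sizes):
        remaining = size
        while remaining > 0:
            if tapes[-1].free == 0:
                # Tape filled mid data set: only a blank tape may take the rest.
                tapes.append(Tape(capacity))
            written = min(remaining, tapes[-1].free)
            tapes[-1].used += written
            remaining -= written
        if i in recall_after:
            # Tape is pulled for reading; migration continues on a blank tape.
            tapes.append(Tape(capacity))
    return [t for t in tapes if t.used > 0]


def utilization(tapes):
    """Fraction of total tape capacity that actually holds data."""
    return sum(t.used for t in tapes) / sum(t.capacity for t in tapes)


sizes = [30, 50, 40]
uninterrupted = migrate(sizes, capacity=100)
interrupted = migrate(sizes, capacity=100, recall_after={0, 1})
```

With the same three data sets, the uninterrupted run uses two tapes while the run interrupted twice uses three, each only partially filled, so overall utilization drops.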
Tape data set stacking software seeks to improve tape capacity utilization. Some products involve hardware solutions to improve tape utilization, such as the International Business Machines (IBM) Corporation's Magstar Virtual Tape Server. The Virtual Tape Server employs a cache of DASD devices which appear as tape devices to the user. Data is backed up from the virtual tapes, i.e., the DASD devices, to the tape library in a manner that maximizes tape capacity utilization.
The IBM Data Facility Storage Management Subsystem (DFSMS) implemented in the IBM Multiple Virtual Storage (MVS) operating system provides two techniques for increasing tape capacity utilization. The Tape Mount Management (TMM) procedure of DFSMS involves routing data sets to a DASD pool, called the buffer. The DFSMS software checks the DASD pool and automatically migrates files from the DASD pool to tapes. DFSMS further includes a recycling operation. When a tape is taken away during a migration or recycle operation by a recall request, the tape is marked as full. Subsequently, the tapes marked as full are gathered and the data in the tapes is recycled into a smaller set of tapes. For instance, if two tapes marked as full are 10% and 25% occupied with data, the DFSMS program will merge the data contents from these tapes into a single tape that is 35% filled.
The recycling process takes tape resources off-line from normal HSM operations. At some point, at least two tape drives must be set aside to merge multiple input tapes into a single output tape. A table of tapes marked as full, containing both valid and invalid data sets, is built. Data sets are invalidated over time by the expiration or subsequent recall of migrated data. The valid data sets from the full tapes are then merged into an output tape. Those filled tapes containing the least amount of valid data are typically recycled first. Moreover, recycling operations take place at predetermined intervals. Between these recycle periods, numerous tapes could become partially filled, marked as filled, and set aside.
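The recycling step can be sketched as follows. This is only an illustration of the idea of merging the valid data from tapes marked as full, emptiest first, and it does not reproduce the DFSMS implementation: the function name, the `max_inputs` parameter, the tape identifiers, and the use of whole percentages of tape capacity are assumptions.

```python
def recycle(valid_percent, max_inputs):
    """Merge valid data from tapes marked as full into as few output tapes as possible.

    valid_percent: mapping of tape id -> percent of capacity holding valid data.
    max_inputs: how many input tapes to recycle in this pass (hypothetical limit).
    Returns (ids of recycled tapes, fill level in percent of each output tape).
    """
    # Tapes containing the least valid data are recycled first.
    chosen = sorted(valid_percent, key=valid_percent.get)[:max_inputs]
    outputs = [0]
    for tape_id in chosen:
        remaining = valid_percent[tape_id]
        while remaining > 0:
            if outputs[-1] == 100:
                outputs.append(0)  # output tape is full, continue on a blank one
            moved = min(remaining, 100 - outputs[-1])
            outputs[-1] += moved
            remaining -= moved
    return chosen, outputs


# Two tapes marked as full that are 10% and 25% occupied merge into one 35% tape.
pair_ids, pair_out = recycle({"T1": 10, "T2": 25}, max_inputs=2)

# With more candidates than the pass can handle, the emptiest tapes go first.
ids, out = recycle({"A": 60, "B": 10, "C": 25, "D": 70}, max_inputs=3)
```

The sketch ignores the cost that makes recycling expensive in practice, namely the tape drives and processing time held off-line while the merge runs.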
Relying on recycling to increase tape capacity utilization requires that tape drives be taken away from regular input/output (I/O) operations and dedicated to recycling operations. Further, additional processing power must be taken off-line to handle the recycling. Moreover, recycling does nothing to limit the continually expanding number of partially filled tapes not marked as full during normal operations.