1. Field of the Invention
This invention relates in general to computer-controlled processing of variable-cost actions and particularly to procedures for reclaiming data storage volumes in a multivolume data storage library.
2. Discussion of the Related Art
Memory compaction or "defragmentation", also denominated "garbage collection", is a necessary operation in any large database processing system. Memory compaction is a procedure whereby valid data scattered throughout the memory system are collected or "compacted" together, thereby freeing up unused memory space in larger contiguous sections. The gradual "fragmentation" of a data storage system is a normal result of data processing over time. Thus, "garbage collection" is performed routinely, either in "snapshots" at regular intervals or on a continuing basis.
Storage reclamation is a necessary procedure in every level of a hierarchical data storage system. For instance, a large data storage space such as described by Gelb et al. in U.S. Pat. No. 5,018,060, which is entirely incorporated herein by this reference, includes many different physical data storage devices for peripheral data storage. Such devices may include Direct Access Storage Devices (DASDs) employing high-speed magnetic disk technology, magnetic tape storage devices, optical disk storage devices and several types of solid-state random access memory (RAM). The slower of these storage devices generally provide the higher data storage capacity and, therefore, present the more challenging garbage collection problems.
In the present art, "garbage collection" procedures move data from one place to another to create large contiguous sections of available storage. In RAM, the data are moved in byte increments. In DASDs, the data are moved in track increments. Thus, for RAM or DASD garbage collection, the necessary actions have a constant "cost". However, for storage recovery in data storage libraries, organized in physical volumes (e.g., tape or optical disk), the data are moved in variable increments, depending on the available space in a volume as a fraction of volume data storage capacity. Herein, this is denominated a "variable cost" action. The reason for such cost variability can be appreciated by considering another distinction. In RAM or DASD data storage, empty space can be accumulated into arbitrarily large contiguous blocks, with the larger contiguous block being the more desirable. However, there is no operating advantage in freeing a contiguous data storage block larger than a single data volume in a multivolume data storage library system. Finally, RAM or DASD data storage permits reuse of invalid data storage areas by simply overwriting them with new valid data. This is useful for either contiguous or discontiguous allocations of storage, as is known in the art. However, in a multivolume data storage library system, discontiguous storage of a single data block by allocation to several volumes is not a feasible storage scheme unless the block exceeds volume capacity. Thus, storage reclamation in a data storage library requires collection of valid data blocks or "fragments" to release empty volumes for reuse. This collection of valid data is a variable-cost action that is a function of the storage conditions in the particular volume that is to be released and is herein denominated "recycling".
For instance, a multi-volume library system, such as the IBM 3495 Tape Library Data Server or the 3995 Optical Library Data Server, includes a large number of individual volumes or cartridges and a smaller number of drives for reading and writing data from and to selected volumes. As system operating time passes, the fraction of each initially active library volume containing valid data generally declines, eventually to zero. That is, the data library volumes accumulate "empty" or unused space over time. Because most older volumes still retain valid data occupying a few percent of volume capacity, the valid data must be transferred to a "compacted" target volume before the older volume can be recycled for use as an empty volume. This transfer and compaction of valid data from "low-density" source volumes to "compacted" target volumes in such a data library is the process required to reclaim empty data storage volumes for reuse. Even empty volumes requiring no data transfer must be actively released for reuse.
In concept, storage reclamation in a multivolume library system is uncomplicated. The user first specifies the number of volumes to be reclaimed, say 300. The system then proceeds to compact valid data by mounting, transferring and recycling the library volumes in some sequence until the desired number of empty volumes is released.
The user may specify a maximum value for the percentage of valid data permitted in a source volume selected for recycling. By limiting the occupied "fraction" or percentage of valid data in a volume, the required number of recycled volumes can be obtained with fewer mounts and transfers. For instance, if only volumes having no more than, say, 40% valid data are recycled, each source volume yields at least 60% of a volume in net free space. Thus, no more than 300/0.60=500 source volumes need be mounted, and their valid data fill no more than 500×0.40=200 target volumes, so that no more than 700 mounts altogether are needed to obtain the 300 empty volumes desired.
As another example, consider a data storage library where volumes having no more than 25% valid data are recycled to reclaim 300 free volumes. Here, no more than 300/0.75=400 source and 100 target volumes must be mounted and processed. Therefore, the same 300 volumes are released with only 500 total mounts instead of 700. This illustrates the variability of processing cost, which is affected by the average source volume fraction occupied by valid data.
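The arithmetic of the two examples above can be checked with a short sketch. It assumes the worst case in which every recycled source volume holds exactly the maximum permitted percentage of valid data, that target volumes are filled completely, and that each source or target volume is mounted once. The function name `mounts_needed` and its parameters are illustrative only and are not taken from any library system.

```python
import math
from fractions import Fraction


def mounts_needed(free_target, max_valid_percent):
    """Worst-case (sources, targets, total mounts) to net `free_target` empty volumes.

    Assumption: every source volume holds exactly `max_valid_percent` percent
    valid data, so each one contributes (100 - max_valid_percent) percent of a
    volume in net free space.
    """
    if not 0 <= max_valid_percent < 100:
        raise ValueError("max_valid_percent must be in the range [0, 100)")
    valid = Fraction(max_valid_percent, 100)          # exact arithmetic, no float rounding
    sources = math.ceil(free_target / (1 - valid))    # volumes that must be emptied
    targets = math.ceil(sources * valid)              # volumes needed to hold the valid data
    return sources, targets, sources + targets


print(mounts_needed(300, 40))  # (500, 200, 700)
print(mounts_needed(300, 25))  # (400, 100, 500)
```

Lowering the permitted percentage from 40% to 25% reduces the worst-case mount count from 700 to 500 for the same 300 released volumes, which is the cost variability described above.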
Unfortunately, a naive source-volume recycling procedure is painfully slow and inefficient when applied to one or more large data libraries having, for instance, in excess of 100,000 volumes. A better idea would be to process and recycle the emptiest volumes (the volumes having the lowest percentage of valid data occupancy) first, thereby minimizing the number of mounts necessary to produce a given number of recycled empty volumes. This sort of improvement is not motivated in the art because no net savings are realized when an entire library is processed for recycling. The idea gains importance, however, in a library where volumes are reclaimed under a time limit or until a specified number of recycled volumes is obtained. For instance, in a large data storage tape library having in excess of 100,000 volumes, storage reclamation time for only 300 free volumes can extend beyond 24 hours. If a storage reclamation procedure is initiated daily, garbage collection may never be completed in such a system.
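The effect of processing order on mount count can be illustrated with a small sketch. It is not the procedure of this invention; it merely compares the naive library order with an emptiest-first order on a made-up eight-volume library. The function name `mounts_to_free`, the sample percentages, and the assumption that each source and each filled target volume costs one mount are all hypothetical.

```python
import math
from fractions import Fraction


def mounts_to_free(valid_percents, free_target):
    """Recycle volumes in the given order until `free_target` volumes are netted.

    `valid_percents` lists the percentage of valid data in each volume.
    Returns the total mounts (sources + targets), or None if the target
    cannot be reached with the volumes supplied.
    """
    valid_total = Fraction(0)                      # valid data moved so far, in volumes
    for sources, pct in enumerate(valid_percents, start=1):
        valid_total += Fraction(pct, 100)
        targets = math.ceil(valid_total)           # target volumes needed to hold it
        if sources - targets >= free_target:       # net empty volumes released
            return sources + targets
    return None


# Hypothetical library: percent of valid data per volume, in shelf order.
library = [90, 10, 80, 20, 70, 5, 60, 15]

print(mounts_to_free(library, 2))          # naive shelf order: 6 mounts
print(mounts_to_free(sorted(library), 2))  # emptiest first:    4 mounts
```

Note that the emptiest-first order presupposes that the valid-data percentage of every volume is already known and sorted, which for a large library is itself a costly step.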
Thus, there is a clearly felt need in the art for an optimized storage reclamation procedure for use in large multivolume data storage libraries. Such an optimized procedure should maximize the storage capacity reclaimed over a given time interval. Even scanning and presorting over 100,000 volumes into a processing queue ordered by the fraction of volume capacity that is occupied with valid data (that is, by the variable cost for each volume) is not a complete solution to this problem. This is because searching and sorting through over 100,000 volumes itself requires a substantial time interval during which no storage reclamation occurs. On the other hand, the naive "brute-force" method known in the art is also wasteful of recycling capacity because the recycled volumes are selected without considering relative processing cost in terms of the amount of empty space available in each recycled volume.
Although the storage reclamation optimization problem is reminiscent of the query optimization problem known in the database processing art, available query optimization techniques are not helpful in solving the library storage reclamation optimization problem, mainly because query processing elements do not generally include variable-cost actions. For instance, in U.S. Pat. No. 5,089,985, Chang et al. consider the essential dilemma posed in a data processing system by holding the list processing capacity idle during a sorting procedure. Chang et al. suggest sending sorted data to the user as soon as the first data are sorted into final sort order instead of awaiting completion of the entire sort. However, in a library storage reclamation procedure, the volume scanning takes more time than the sort and the final sort order of any particular volume cannot be determined until all volumes have been scanned at least once. Thus, the Chang et al. method suggests no improved solution to the library storage reclamation optimization problem.
Another method in the existing art for library storage reclamation is to screen the entire library with a predetermined "percent-valid" (recycle cost) threshold, processing all volumes that meet the threshold test. After completion of the first library screening and processing step, the screening threshold is adjusted upward to a second predetermined value and the entire library is again screened and processed to release more volumes. This iterative procedure continues until the desired number of volumes has been released. While this method is somewhat more efficient than the naive method discussed above, it requires multiple passes and affords no opportunities for optimizing the fixed screening thresholds.
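The iterative threshold method just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of any existing product: the function name `threshold_passes`, the sample library, and the threshold schedule are hypothetical, and released volumes are counted net of the target volumes needed to hold the transferred valid data.

```python
import math
from fractions import Fraction


def threshold_passes(valid_percents, thresholds, free_target):
    """Screen the whole library once per threshold, recycling volumes that qualify.

    Returns (passes made, source volumes recycled, net volumes released).
    Stops as soon as `free_target` net volumes have been released.
    """
    recycled = set()                 # indices of volumes already recycled
    valid_total = Fraction(0)        # valid data moved so far, in volumes
    passes = 0
    for limit in thresholds:         # fixed, predetermined percent-valid thresholds
        passes += 1
        for idx, pct in enumerate(valid_percents):
            if idx in recycled or pct > limit:
                continue             # already processed, or too full for this pass
            recycled.add(idx)
            valid_total += Fraction(pct, 100)
            freed = len(recycled) - math.ceil(valid_total)
            if freed >= free_target:
                return passes, len(recycled), freed
    return passes, len(recycled), len(recycled) - math.ceil(valid_total)


library = [90, 10, 80, 20, 70, 5, 60, 15]  # hypothetical percent-valid values
print(threshold_passes(library, [10, 20, 40], 2))  # (2, 3, 2)
```

In this example two full screening passes are needed before the second volume is released, and the result depends entirely on the thresholds chosen in advance, which reflects the two drawbacks noted above.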
Other practitioners discuss similarly unhelpful database query optimization points in U.S. Pat. Nos. 5,091,852, 5,020,019, 4,510,567, and 4,587,628, and neither consider nor suggest a solution to the variable-cost reclamation optimization problem encountered in large multivolume data storage libraries. These unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.