The present invention relates generally to a method, system, and computer program for tape drive data reclamation. More particularly, the present invention relates to a method, system, and computer program for shortening a period of time during which two drives are simultaneously used during data reclamation.
Hierarchical storage management (HSM) is a known technology that realizes efficient use of a limited storage capacity. HSM is a scheme for arranging data that is frequently referred to in a high-speed and high-cost primary storage unit, such as a redundant array of independent disks (RAID) or a solid state drive (SSD), and arranging data that is referred to less frequently in a low-speed and low-cost secondary storage unit. HSM may be implemented in, for example, IBM® products such as the TS7700 Virtualization Engine and IBM® Spectrum Archive™ Enterprise Edition.
A state where a certain piece of data is stored only in a primary storage unit is called a “resident” state, a state where a certain piece of data is stored not only in the primary storage unit but also in a secondary storage unit is called a “pre-migrated” state, and a state where a certain piece of data is stored only in the secondary storage unit is called a “migrated” state. For example, all pieces of TS7700 data are first stored in the primary storage unit and thus placed in the resident state. After several minutes, the pieces of data are copied to the secondary storage unit and thus placed in the pre-migrated state. The pieces of data will then be fully moved to the secondary storage unit when the system has very little disk space remaining, thus placing the pieces of data in the migrated state.
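The three states described above can be illustrated by the following minimal sketch, which derives the state of a piece of data from where copies of it currently reside. The names `HsmState` and `hsm_state` are hypothetical and are introduced here for illustration only; they are not part of any product interface.

```python
from enum import Enum


class HsmState(Enum):
    """The three HSM states described in the text."""
    RESIDENT = "resident"
    PRE_MIGRATED = "pre-migrated"
    MIGRATED = "migrated"


def hsm_state(on_primary: bool, on_secondary: bool) -> HsmState:
    """Return the HSM state implied by where a piece of data is stored.

    Hypothetical helper: the two flags say whether a copy exists in the
    primary storage unit and in the secondary storage unit.
    """
    if on_primary and not on_secondary:
        return HsmState.RESIDENT        # only in primary storage
    if on_primary and on_secondary:
        return HsmState.PRE_MIGRATED    # copied, but still in primary storage
    if on_secondary:
        return HsmState.MIGRATED        # only in secondary storage
    raise ValueError("data is stored in neither storage unit")
```

In this sketch, a piece of data moves from resident to pre-migrated when the secondary copy is created, and from pre-migrated to migrated when the primary copy is released.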
Storage products such as the IBM® TS7700 Virtualization Engine and IBM® Spectrum Archive™ Enterprise Edition adopt a magnetic tape as the secondary storage unit. When a certain piece of data is written to a magnetic tape, which is a sequential-access medium, and the same piece of data is subsequently updated, the updated piece of data is appended to the end of the tape while the previous data is handled as an invalid area. When updates to the data occur frequently, the proportion of the invalid area increases, causing a relative decrease in the usable capacity of the tape.
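The growth of the invalid area can be illustrated by the following simplified sketch, which models a tape as an append-only list of records. The model, the record layout `(key, size)`, and the function names are assumptions made for illustration; an actual tape format is considerably more involved.

```python
def append_update(tape, key, size):
    """Append a new version of `key` to the end of the tape.

    Older versions of the same key are not overwritten; they remain on
    the tape and are treated as an invalid area.
    """
    tape.append((key, size))


def invalid_fraction(tape):
    """Return the fraction of written bytes occupied by superseded versions."""
    # Position of the most recent version of each key.
    latest = {key: idx for idx, (key, _) in enumerate(tape)}
    total = sum(size for _, size in tape)
    valid = sum(
        size for idx, (key, size) in enumerate(tape) if latest[key] == idx
    )
    return (total - valid) / total if total else 0.0


tape = []
append_update(tape, "A", 100)   # first version of A
append_update(tape, "B", 100)   # first version of B
append_update(tape, "A", 100)   # update of A: the first version becomes invalid
append_update(tape, "A", 100)   # second update of A
print(invalid_fraction(tape))   # 0.5: two of the four records are invalid
```

In this example, half of the written area is invalid after only two updates to the same piece of data, even though the amount of valid data has not grown.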
As a scheme for solving this problem, a technique called reclamation is known. Reclamation is a technique of reading only valid data from a tape that includes an invalid area and writing the valid data that has been read to another tape. Reclamation requires two tape drives, because the source tape from which the target data is read and the destination tape to which that data is written must be accessed simultaneously. In recent years, due to the increase in magnetic tape capacity, reclamation processing takes longer and the two tape drives are occupied longer, which is now recognized as a drawback of the technique. For example, the data transfer rate to a tape compatible with an IBM® TS1150 tape drive is up to 360 megabytes per second. When data is to be read from a tape medium of 10 terabytes using two TS1150 tape drives to carry out reclamation, the two drives may be occupied for about eight hours (10 TB / 360 MB/sec ≈ 7.7 hours).
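The occupation time given in the example above can be reproduced with the following short calculation. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained transfer at the stated maximum rate; the function name `occupation_hours` is hypothetical.

```python
def occupation_hours(capacity_tb: float, rate_mb_per_sec: float) -> float:
    """Estimate how long the drives are occupied while reading a full tape.

    Assumes decimal units and a constant transfer rate, ignoring
    positioning, mount, and unmount time.
    """
    capacity_bytes = capacity_tb * 10**12
    rate_bytes_per_sec = rate_mb_per_sec * 10**6
    seconds = capacity_bytes / rate_bytes_per_sec
    return seconds / 3600


hours = occupation_hours(10, 360)
print(f"{hours:.1f} hours")  # 7.7 hours, i.e. about eight hours
```

Because the estimate scales linearly with tape capacity at a fixed transfer rate, each increase in capacity lengthens the period during which both drives are unavailable for other work.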
The present invention solves the problem of prolonged occupation time of the two drives in the course of the reclamation processing.