A virtual tape system is a tape management system, such as a special storage device or group of devices together with software, which manages data so that the data appears to be stored entirely on tape cartridges when portions of the data may actually be located in faster hard disk storage. Programming for a virtual tape system is sometimes referred to as a virtual tape server (VTS), although these terms may be used interchangeably unless otherwise specifically indicated. A virtual tape system may be used with a hierarchical storage management (HSM) system in which data is moved to slower but less costly forms of storage media as the data falls through various usage thresholds. A virtual tape system may also be used as part of a storage area network (SAN) where less-frequently used or archived data can be managed by a single virtual tape server for a number of networked computers.
In prior art virtual tape storage systems, such as International Business Machines (IBM) Magstar Virtual Tape Server, at least one virtual tape server (VTS) is coupled to a tape library comprising numerous tape drives and tape cartridges. The VTS is also coupled to a direct access storage device (DASD), comprised of numerous interconnected hard disk drives.
The DASD functions as a tape volume cache (TVC) of the VTS subsystem. When using a VTS, the host application writes tape data to virtual drives. The volumes written by the host system are physically stored in the tape volume cache (e.g., a RAID disk buffer) and are called virtual volumes. The storage management software within the VTS copies the virtual volumes in the TVC to the physical cartridges owned by the VTS subsystem. Once a virtual volume is copied or migrated from the TVC to tape, the virtual volume is then called a logical volume. As virtual volumes are copied from the TVC to a Magstar cartridge (tape), they are copied on the cartridge end to end, taking up only the space written by the host application. This arrangement maximizes utilization of a cartridge storage capacity.
The storage management software manages the location of the logical volumes on the physical cartridges, and the customer has no control over the location of the data. When a logical volume is copied from a physical cartridge to the TVC, the process is called recall and the volume becomes a virtual volume again. The host cannot distinguish between physical and virtual volumes, or physical and virtual drives. Thus, the host treats the virtual volumes and virtual drives as actual cartridges and drives and all host interaction with tape data in a VTS subsystem is through virtual volumes and virtual tape drives.
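The write, migrate, and recall cycle described above can be illustrated with a minimal sketch. This is not the Magstar implementation; the class names `Cartridge` and `VirtualTapeServer`, the method names, and the choice to drop a volume from the cache as soon as it is migrated are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cartridge:
    """Hypothetical physical cartridge; logical volumes are laid down end to end."""
    capacity: int
    segments: list = field(default_factory=list)  # [(volser, data), ...]

    def used(self) -> int:
        # Only the bytes actually written by the host occupy space.
        return sum(len(data) for _, data in self.segments)


class VirtualTapeServer:
    """Hypothetical model: a tape volume cache (TVC) in front of cartridges."""

    def __init__(self, cartridge_capacity: int):
        self.cartridge_capacity = cartridge_capacity
        self.tvc = {}      # volser -> bytes (virtual volumes)
        self.cartridges = [Cartridge(cartridge_capacity)]
        self.catalog = {}  # volser -> (cartridge index, segment index)

    def write(self, volser: str, data: bytes) -> None:
        # The host writes to a virtual drive; the data lands in the TVC.
        self.tvc[volser] = data

    def migrate(self, volser: str) -> None:
        # Assumption: the cached copy is dropped once the volume is on tape.
        # A volume larger than one cartridge is not handled in this sketch.
        data = self.tvc.pop(volser)
        cart = self.cartridges[-1]
        if cart.used() + len(data) > cart.capacity:
            cart = Cartridge(self.cartridge_capacity)
            self.cartridges.append(cart)
        cart.segments.append((volser, data))
        self.catalog[volser] = (len(self.cartridges) - 1, len(cart.segments) - 1)

    def read(self, volser: str) -> bytes:
        # Recall: copy the logical volume back into the TVC if it is absent.
        if volser not in self.tvc:
            cart_idx, seg_idx = self.catalog[volser]
            _, data = self.cartridges[cart_idx].segments[seg_idx]
            self.tvc[volser] = data
        return self.tvc[volser]
```

The host-facing calls are `write` and `read` only; the catalog that maps a volume serial number to a cartridge position is internal, which mirrors the statement that the customer has no control over the location of the data.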
One issue with VTS systems is the management of data within the tapes. The VTS system may have a number of duplicate, invalid, latent or unused copies of data. After a virtual tape volume is created and/or modified (one or more records are written to the volume) and closed, the virtual tape volume is copied onto the physical tape (logical) volume. The image of the virtual volume copied to a physical volume when the virtual volume was closed is a complete version of the virtual volume at the point in time the virtual volume was closed. If a virtual volume is subsequently opened and modified, then when the virtual volume is closed, that image of the virtual volume is also copied onto physical tape. However, the new image does not overwrite the prior version of the volume, since the new image may have a different size than the previous version. So at any point in time, there may be several versions of the same volume serial number that reside on one or more physical tape volumes.
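The accumulation of versions can be sketched as an append-only log. This is an illustrative model only: the function names `close_volume` and `stale_versions`, and the representation of a physical tape as a Python list, are assumptions.

```python
def close_volume(tape, catalog, volser, image):
    """Append the closed volume's image to the tape; never overwrite in place."""
    tape.append((volser, image))
    # The catalog tracks only the position of the most recent version.
    catalog[volser] = len(tape) - 1


def stale_versions(tape, catalog):
    """Return tape positions holding superseded images that are still readable."""
    return [
        pos
        for pos, (volser, _) in enumerate(tape)
        if catalog[volser] != pos
    ]
```

Closing the same volume serial number twice leaves two images on the tape, with the catalog pointing only at the later one; the earlier image remains physically present until the space is reclaimed or overwritten.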
Moreover, physical volumes within a VTS are arranged in groups that are called “pools,” with each physical volume including one or more logical volumes. Each of the physical volumes managed by the VTS system is assigned to one of 32 pools, for example. It is understood that each pool of physical volumes is assigned a name and may have one or more parameters associated therewith. For example, typical parameters associated with a pool include, but are not limited to: a media type (e.g., physical volumes having 10 Gbyte tape or 20 Gbyte tape); and one or more rules for managing volumes in a pool. One rule may involve the concept of “reclamation,” whereby the VTS monitors what percentage of the data on a particular physical volume is still valid. That is, over time, data space occupied by a logical volume needs to be reclaimed from a physical volume when the data is no longer used or needed by the host, i.e., has expired. Thus, if any physical volume in the pool falls below a reclaim percent threshold, a reclamation process is performed to take the valid logical volume(s) off that physical volume and put them on another physical volume, potentially combining multiple partially full physical volumes and filling up the other.
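The reclamation rule can be sketched as follows. The sketch assumes that the valid percentage is measured against the data written to the physical volume rather than against its raw capacity, which the text does not specify; the names `LogicalVolume`, `PhysicalVolume`, and `reclaim` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalVolume:
    volser: str
    size: int
    valid: bool = True  # False once the data has expired or been superseded


@dataclass
class PhysicalVolume:
    name: str
    logical: list = field(default_factory=list)

    def valid_percent(self) -> float:
        # Assumption: percentage is relative to written data, not capacity.
        total = sum(lv.size for lv in self.logical)
        if total == 0:
            return 100.0
        valid = sum(lv.size for lv in self.logical if lv.valid)
        return 100.0 * valid / total


def reclaim(pool, target, threshold_percent):
    """Move valid logical volumes off physical volumes below the threshold."""
    reclaimed = []
    for pv in pool:
        if pv is target or not pv.logical:
            continue
        if pv.valid_percent() < threshold_percent:
            target.logical.extend(lv for lv in pv.logical if lv.valid)
            # Only the catalog entry is cleared here; the old data is not
            # physically erased by this step.
            pv.logical.clear()
            reclaimed.append(pv.name)
    return reclaimed
```

For example, a physical volume holding 10 units of valid data and 90 units of expired data is 10 percent valid, so with a threshold of 30 percent its one valid logical volume would be moved to the target and the source volume would be returned for reuse.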
If a virtual volume is removed from the physical volume and put onto another physical volume, the data on the first physical volume is deleted but not overwritten, and thus the data may be recovered. Further, data associated with the most current version of a virtual volume may be expired or considered latent or unusable by the customer, but the virtual volume will still exist on the physical tape volume and could be accessed.
Hierarchical storage, with active files on a first tier of media (such as hard disk, optical disk, nonvolatile memory, etc.) and archived files on a second tier of media (such as magnetic tape, digital tape, hard disk, etc.) is popular with users for its cost savings, energy savings, etc. A common scheme is to use hard disk media for the first tier and magnetic tape media for the second tier. However, traditional HSM systems suffer from several drawbacks which limit their adoption.
A standard approach for HSM is to migrate files to the second tier and leave a stub file on the first tier. An access to the stub file causes the original file to be recalled from the second tier to the first tier. This approach is user friendly, as users are isolated from the details and complexity of the second tier storage. The stub file appears as a normal file on the first tier and supports some or all standard file operations. This implementation works for small scale solutions, but also has several drawbacks. These drawbacks include, but are not limited to: (1) each migrated file is represented by a stub file, which results in managing millions or billions of stub files, which is not practical in terms of storage space or access time; (2) as individual files are migrated, bottlenecks may develop from the number of transactions required to move or track the individual files; (3) centralized storage of the indexes which manage the stubs and original files may limit the size of the solution, since a central repository including a billion items may not be practical to implement; (4) users or their applications may not be aware that the files are on the second tier and may attempt to invoke access patterns which result in unexpected and/or unacceptable response times; and (5) the central repository represents a single point of failure that may cause the entire implementation to fail.
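The stub-based scheme can be sketched in a few lines. This is a toy in-memory model, not any particular HSM product; the `Stub` and `TieredStore` names, the use of dictionaries for both tiers, and the `recalls` counter are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """Placeholder left on the first tier after a file is migrated."""
    key: str  # where the original data lives on the second tier


class TieredStore:
    def __init__(self):
        self.tier1 = {}   # path -> bytes or Stub (e.g., hard disk)
        self.tier2 = {}   # key -> bytes (e.g., magnetic tape)
        self.recalls = 0  # counts second-tier accesses triggered by reads

    def write(self, path, data):
        self.tier1[path] = data

    def migrate(self, path):
        entry = self.tier1[path]
        if isinstance(entry, Stub):
            return  # already migrated
        self.tier2[path] = entry
        self.tier1[path] = Stub(key=path)  # one stub per migrated file

    def read(self, path):
        entry = self.tier1[path]
        if isinstance(entry, Stub):
            # Transparent recall: the caller cannot tell this path is slow.
            entry = self.tier2[entry.key]
            self.tier1[path] = entry
            self.recalls += 1
        return entry
```

The sketch makes two of the listed drawbacks visible: `tier1` still holds one entry per migrated file, and a caller of `read` has no indication that a given access will trigger a recall from the slower tier.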
In an alternative approach, users and/or users' applications may move files to and from the second tier via dedicated interfaces. This implementation minimizes the risk of inadvertent second tier overloads, but also has drawbacks, including: (1) it puts a burden on users to conform to applications and/or application programming interfaces (APIs) specific to the dedicated interface, and for users who prefer a simple copy-based interface, this can preclude the use of a dedicated interface and a second tier as a solution; (2) files are still handled and tracked individually, though their data can be aggregated; (3) a central repository is still used, and the size of the central repository may still limit scalability; and (4) there is still a central point of failure (e.g., the central repository).
Therefore, a storage solution which mitigates or eliminates the drawbacks and problems associated with conventional implementations while providing a tiered storage solution which results in cost and energy savings would be beneficial.