An explosion of computer data and information requires an ever-increasing amount of computer-readable storage space. Faster access to such storage space, particularly in light of ever-increasing data storage capacity, requires improved file positioning systems. In particular, such high-speed file positioning may be indicated in systems having components to back up and protect data sets, and to migrate less active data sets to secondary storage to increase primary storage space. A data set consists of any collection or grouping of data. In certain systems, a data set may include control information used by the system to manage the data. The terms data set and file are generally equivalent and sometimes are used interchangeably. Hierarchical storage management (HSM) programs manage storage devices, such as tape libraries, to control the flow of data between primary and secondary storage facilities.
In a hierarchical storage management system, data is stored in different types of storage devices depending upon the frequency of usage of the data. For instance, a system may include multiple storage media types to store data having different usage patterns and likelihood of access. More frequently used data may be stored on direct access storage devices (DASD) comprising high-performance rapid access storage devices, such as hard disk drives. Such readily accessible data is sometimes referred to as level 0 volumes. Less frequently used data may be archived on slower and less expensive, demountable storage media, such as optical disks, magnetic tape cartridges, etc. Such archive volumes are referred to as level 2 storage.
Two common functions initiated by host systems in hierarchical storage management systems include migration and recall. Migration involves the movement of data from level 0 to level 2 storage to make more room for more frequently accessed data on the primary level 0 storage devices. If a host system attempts to access a data set that has been migrated to level 2 storage, then the recall function would be initiated to move the requested data sets from the level 2 storage to level 0.
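The migration and recall functions described above can be illustrated with a minimal sketch. It assumes a toy two-level store held in memory; the class name HsmSketch and its methods are hypothetical and are not part of DFSMS or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class HsmSketch:
    """Toy two-level store (hypothetical): level 0 is primary, level 2 is archive."""
    level0: dict = field(default_factory=dict)  # data set name -> contents
    level2: dict = field(default_factory=dict)  # data set name -> contents
    recalls: int = 0

    def migrate(self, name):
        # Migration: move a data set from level 0 to level 2 to free primary space.
        self.level2[name] = self.level0.pop(name)

    def access(self, name):
        # Accessing a migrated data set first triggers a recall back to level 0.
        if name not in self.level0:
            self.level0[name] = self.level2.pop(name)
            self.recalls += 1
        return self.level0[name]


hsm = HsmSketch(level0={"PAYROLL": b"records"})
hsm.migrate("PAYROLL")          # data set now resides only on level 2
data = hsm.access("PAYROLL")    # host access initiates a recall
```

In this sketch the recall is implicit in the access path, mirroring the description that a host attempt to access a migrated data set initiates the recall function.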
International Business Machines Corporation (IBM®) provides the Data Facilities Storage Management Subsystem (DFSMS®) software which is included in the IBM MVS/ESA™ and OS/390® operating systems. This software allows host systems to perform hierarchical storage management operations, such as migration and recall. DFSMS and OS/390 are registered trademarks of IBM, and MVS/ESA is a trademark of IBM. The operation and implementation of the DFSMS system are described in IBM publications “DFSMS/MVS V1R3 General Information,” IBM document no. GC26-4900-04 (IBM Copyright, 1980, 1995) and “DFSMS/MVS V1R3 DFSMS/HSM Storage Administration Guide,” IBM document no. SH21-1076-02 (IBM Copyright 1984, 1995), which publications are incorporated herein by reference in their entirety.
The OPEN command establishes input/output communications at the application level between an application and a device. The CLOSE command terminates application-level input/output communications between an application and a device. At the Job Control Language (JCL) level, a host, a device, and a data medium are associated, without regard to any particular data set. At the JCL level, a Data Definition (DD) request, which may be used to allocate a data set, returns the volume serial number (VOLSER) and logical file number of the current storage medium. At the application level, dynamic positioning is possible only by specifying the file number. Prior to the present invention, no means of creating a direct association between a data set and a device had been provided via the OPEN and CLOSE commands (i.e., allocation and deallocation).
Certain applications used with tape storage devices, typically those that provide writes to a multiplicity of files (i.e., writing data sets to the same tape cartridge or group of cartridges, sometimes referred to as “file-stacking”) as a technique to exploit tape capacity and reduce slot requirements, do not have an application-owned control data set or catalog to manage data set locations (e.g., by storing file Block ID locations). Instead, they rely on the system catalog and/or the tape management system to maintain the file sequence number for a particular data set on a tape volume. Since these applications have not saved the Block ID location of files, only the sequence number of the file is available for processing (e.g., via the OPEN command) when the file is recalled from among the multiplicity of files. Reaching file sequence number n requires issuing n successive Forward Space File commands. With the capability to store hundreds or thousands of files on a single volume, access to data is severely impeded by this process, which can be quite time-consuming. Other applications, e.g., HSM, use only a Block ID for file location, and not the sequence of the file being located or its file number.
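The cost of sequence-number positioning described above can be illustrated with a minimal sketch. It models the tape as a simple counter and follows the text's count of n commands for file sequence number n; the class TapeSketch and the function position_by_sequence_number are hypothetical names, and real tape label formats may place more than one tape mark per file.

```python
class TapeSketch:
    """Toy tape model (hypothetical): tracks position only as tape marks passed."""

    def __init__(self, file_count):
        self.file_count = file_count
        self.tape_marks_passed = 0
        self.commands_issued = 0

    def rewind(self):
        # Return to the beginning of the volume.
        self.tape_marks_passed = 0

    def forward_space_file(self):
        # One command advances past exactly one tape mark.
        if self.tape_marks_passed >= self.file_count:
            raise ValueError("positioned past the last file on the volume")
        self.tape_marks_passed += 1
        self.commands_issued += 1


def position_by_sequence_number(tape, n):
    """Reach file sequence number n by issuing n Forward Space File commands."""
    tape.rewind()
    for _ in range(n):
        tape.forward_space_file()
    return tape.commands_issued


tape = TapeSketch(file_count=1000)
commands = position_by_sequence_number(tape, 900)  # cost grows linearly with n
```

The command count is linear in the file sequence number, which is why positioning to one of the later files on a volume holding many files dominates the time spent on the request.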
Such prior art systems therefore typically have a severe performance penalty associated with the file-oriented positioning that is required to access a specific file on a multi-file tape volume, or to append to the end of such a volume. The performance penalty applies for systems and environments that either do not provide a device command that permits multi-file space operations, or do not provide system software support for the effective use of such a device command. For Enterprise System Connection (ESCON) and native Fibre Connection (FICON) attached devices (IBM® S/390 systems in particular), no device architectures exist which support such operations, nor does system software exist which supports such commands. Where hundreds or thousands of files exist on a tape volume and one must position to one of the latter files, the positioning itself, today accomplished one file or tape mark at a time, may take over two orders of magnitude more time to accomplish than the actual data transfer.
Further, some tape storage products, e.g., IBM® 3590 drives, cannot perform efficient file locations when used with certain (e.g., backup or tape server) applications, and therefore the service levels required by their users do not match their recall capability. Users of such products often wish to exploit the capability of such products to perform high-speed locate on a file sequence number. In a preferred implementation, indexing information is maintained in a region of the data storage media itself known as the volume control region (VCR). The VCR maintains the data and supports the interfaces required to provide this function. An interface is thus needed to support the drive capabilities by offering high-speed location to absolute or relative file positions.
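The idea of indexing information that supports locating by file sequence number can be sketched as a simple lookup table. The actual VCR format is not described here, so the layout below is purely an assumption: a dictionary mapping each 1-based file sequence number to a starting block ID, with one tape mark assumed between files. The function names are hypothetical.

```python
def build_file_index(file_lengths_in_blocks):
    """Map 1-based file sequence number -> starting block ID (assumed layout)."""
    index = {}
    next_block = 0
    for seq, length in enumerate(file_lengths_in_blocks, start=1):
        index[seq] = next_block
        # Assumption: each file is followed by one tape mark occupying one block ID.
        next_block += length + 1
    return index


def locate_block_for_file(index, seq):
    """Resolve an absolute file position with a single index lookup."""
    return index[seq]


def locate_block_relative(index, current_seq, offset):
    """Resolve a relative file position (e.g., +2 files from the current file)."""
    return index[current_seq + offset]


index = build_file_index([10, 5, 20])
third_file_block = locate_block_for_file(index, 3)
```

With such an index, both absolute and relative file positions resolve to a block ID in one lookup, so a single high-speed locate could replace the chain of per-file spacing commands.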
Moreover, in addition to gaining fast access in opening files, such file positioning commands have an additional potential advantage: when the VCR of a tape becomes corrupted, fast access to data by any means (Block ID or File Number) is not possible. Today, rebuilding the VCR requires reading the entire tape. A function that would issue a command to high-speed locate to the End of Tape marker could provide a far more efficient method to accomplish the VCR rebuild task.
Thus, a system and method for fast file positioning is needed, wherein the system architecture for fast tape file positioning encompasses microcode components, system software components, and application components.