Applicants' invention relates, generally, to a data storage management system and method to permit a computer system to read and write data in alternative information storage architectures using a data storage device having a fixed device architecture. Applicants' invention relates, more specifically, to a data storage management system and method to permit fast file-oriented positioning and appending.
An explosion of computer data and information requires an ever-increasing amount of computer-readable storage space. Faster access to such storage space, particularly in light of ever-increasing data storage capacity, requires improved file positioning systems. High-speed file positioning is especially valuable in systems having components that back up and protect data sets and that migrate less active data sets to secondary storage to increase primary storage space. A data set consists of any collection or grouping of data. In certain systems, a data set may include control information used by the system to manage the data. The terms data set and file are generally equivalent and are sometimes used interchangeably. Hierarchical storage management (HSM) programs manage storage devices, such as tape libraries, to control the flow of data between primary and secondary storage facilities.
In a hierarchical storage management system, data is stored in different types of storage devices depending upon the frequency of usage of the data. For instance, a system may include multiple storage media types to store data having different usage patterns and likelihood of access. More frequently used data may be stored on direct access storage devices (DASD) comprising high-performance rapid access storage devices, such as hard disk drives. The volumes holding such readily accessible data are sometimes referred to as level zero volumes. Less frequently used data may be archived on slower, less expensive, demountable storage media, such as optical disks and magnetic tape cartridges. Such archive volumes are referred to as level two storage.
Two common functions initiated by host systems in hierarchical storage management systems include migration and recall. Migration involves the movement of data from level 0 to level 2 storage to make more room for more frequently accessed data on the primary level 0 storage devices. If a host system attempts to access a data set that has been migrated to level 2 storage, then the recall function would be initiated to move the requested data sets from the level 2 storage to level 0.
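The migration and recall functions can be illustrated with a toy two-level model. This is a minimal sketch under stated assumptions: the `HierarchicalStore` class and its methods are hypothetical, storage is simulated with in-memory dictionaries, and no selection policy (such as choosing the least recently used data sets to migrate) is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalStore:
    """Hypothetical two-level store: level 0 (DASD) and level 2 (archive).

    Illustration only; a real HSM program applies much richer policies.
    """

    level0: dict = field(default_factory=dict)  # data set name -> contents
    level2: dict = field(default_factory=dict)  # data set name -> contents

    def migrate(self, name: str) -> None:
        """Move a data set from level 0 to level 2 to free primary space."""
        self.level2[name] = self.level0.pop(name)

    def recall(self, name: str) -> None:
        """Move a migrated data set from level 2 back to level 0."""
        self.level0[name] = self.level2.pop(name)

    def read(self, name: str) -> bytes:
        """Access a data set, recalling it first if it has been migrated."""
        if name not in self.level0:
            self.recall(name)
        return self.level0[name]
```

In this sketch a read of a migrated data set triggers `recall` transparently, mirroring the host behavior described above.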
International Business Machines Corporation (IBM®) provides the Data Facility Storage Management Subsystem (DFSMS®) software, which is included in the IBM MVS/ESA™ and OS/390® operating systems. This software allows host systems to perform hierarchical storage management operations, such as migration and recall. DFSMS and OS/390 are registered trademarks of IBM, and MVS/ESA is a trademark of IBM. The operation and implementation of the DFSMS system are described in the IBM publications “DFSMS/MVS V1R3 General Information,” IBM document no. GC26-4900-04 (IBM Copyright 1980, 1995), and “DFSMS/MVS V1R3 DFSMS/HSM Storage Administration Guide,” IBM document no. SH21-1076-02 (IBM Copyright 1984, 1995), which publications are incorporated herein by reference in their entirety.
The OPEN command establishes input/output communications at the application level between an application and a device. The CLOSE command terminates application-level input/output communications between an application and a device. At the Job Control Language (JCL) level, a host, a device, and a data medium are associated without regard to any particular data set. At the JCL level, a Data Definition (DD) request, which may be used to allocate a data set, returns the volume serial number (VOLSER) and logical file number of the current storage medium. At the application level, dynamic positioning is possible only by specifying the file number. Prior to the present invention, no means of creating a direct association between a data set and a device had been provided via the OPEN and CLOSE commands (i.e., allocation and deallocation).
Certain applications used with tape storage devices, typically those that write to a multiplicity of files (i.e., writing data sets to the same tape cartridge or group of cartridges, sometimes referred to as “file-stacking”) as a technique to exploit tape capacity and reduce slot requirements, do not have an application-owned control data set or catalog to manage data set locations (e.g., by storing file Block ID locations). Instead, they rely on the system catalog and/or the tape management system to maintain the file sequence number for a particular data set on a tape volume. Because these applications have not saved the Block ID locations of files, only the sequence number of a file is available for processing (e.g., via the OPEN command) when that file is recalled from among the multiplicity of files. A number of Forward Space File commands that grows linearly with n must therefore be issued to reach file sequence number n. With hundreds or thousands of files stored on a single volume, this process can be very time consuming and severely impedes access to data. Other applications, e.g., HSM, use only a Block ID to locate a file, not the file's sequence number.
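To make the cost concrete, the following sketch simulates an unlabeled tape in which each file is followed by a single tape mark, and counts device commands. It assumes, for illustration only, that reaching file n from load point takes n - 1 Forward Space File commands on such a tape, while a locate driven by an index of file start positions (a hypothetical stand-in for the drive's indexing information) takes one. All function names are hypothetical.

```python
TAPE_MARK = None  # sentinel marking the end of each file on the simulated tape


def build_tape(files):
    """Lay files out sequentially, each followed by one tape mark."""
    tape = []
    for blocks in files:
        tape.extend(blocks)
        tape.append(TAPE_MARK)
    return tape


def forward_space_file(tape, pos):
    """Hypothetical FSF: advance just past the next tape mark (one command)."""
    while tape[pos] is not TAPE_MARK:
        pos += 1
    return pos + 1


def position_by_fsf(tape, file_number):
    """Reach file `file_number` (1-based) from load point, one file at a time."""
    pos, commands = 0, 0
    for _ in range(file_number - 1):
        pos = forward_space_file(tape, pos)
        commands += 1
    return pos, commands


def build_file_index(tape):
    """Hypothetical index of file start positions (not a real VCR format)."""
    return [0] + [i + 1 for i, b in enumerate(tape[:-1]) if b is TAPE_MARK]


def position_by_locate(index, file_number):
    """One high-speed locate using the index."""
    return index[file_number - 1], 1
```

For a 1,000-file volume, `position_by_fsf` issues 999 commands to reach the last file, while `position_by_locate` reaches the same position with one.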
Such prior art systems therefore typically incur a severe performance penalty for the file-oriented positioning required to access a specific file on a multi-file tape volume, or to append to the end of such a volume. The penalty applies to systems and environments that either do not provide a device command permitting multi-file space operations or do not provide system software support for the effective use of such a command. For Enterprise Systems Connection (ESCON) and native Fibre Connection (FICON) attached devices (IBM® S/390 systems in particular), no device architecture exists that supports such operations, nor does any system software support such commands. Where hundreds or thousands of files exist on a tape volume and one must position to one of the latter files, the positioning itself, today accomplished one file or tape mark at a time, may take over two orders of magnitude more time than the actual data transfer.
Further, some tape storage products, e.g., IBM® 3590 drives, cannot locate files efficiently when used with certain (e.g., backup or tape server) applications, so their recall performance does not meet the service levels their users require. Users of such products often wish to exploit the products' capability to perform a high-speed locate based on a file sequence number. In a preferred implementation, indexing information is maintained in a region of the data storage medium itself known as the volume control region (VCR). The VCR maintains the data and supports the interfaces required to provide this function. An interface is thus needed that supports these drive capabilities by offering high-speed location to absolute or relative file positions.
Moreover, beyond providing fast access when opening files, such file positioning commands have an additional potential advantage. When the VCR region of a tape becomes corrupted, fast access to data by any means (Block ID or File Number) is not possible, and rebuilding the VCR today requires reading the entire tape. A function that issues a command to high-speed locate to the End of Tape marker could provide a far more efficient method of accomplishing the VCR rebuild task.
Thus, a system and method for fast file positioning is needed, wherein the system architecture for fast tape file positioning encompasses microcode components, system software components, and application components.
Applicants' invention provides fast read performance for accessing a file from among a multiplicity of files when its corresponding Block ID is unavailable, by furnishing new file-positioning commands (e.g., as subsets of the OPEN command), allowing access to data based on File Number at the same fast locate speed as with Block IDs. This capability thus expands the range of applications that can use certain tape drive systems (e.g., Magstar 3590 tape drives) effectively.
The present invention includes both a device architecture supporting file-oriented positioning and appending, and system software that gives applications easy access to the device facilities, along with system components that let these advantages accrue without any change to software applications or operator procedures. It is contemplated that high-speed file location occurs at the highest speed supported by the device (head indexing in milliseconds, followed by operations at fast forward or rewind speed). Command and software architecture may also be provided for sensing the current file position and the maximum number of recorded files on the volume. In certain embodiments, the software architecture is fully extended to support multi-volume aggregates transparently to the application and with full integrity. Additional device architecture may support location to sequential tape marks and to absolute locations on tape specified in device-specific units; the latter command is likely to be useful for recovery programs.
The present invention comprises several device command innovations. One is a relative positioning command, which permits positioning relative to the current position based on blocks, tape marks, sequential tape marks, or end-of-data marks. Another is an absolute positioning command, which permits positioning, independent of the current position, to a given block, tape mark, sequential tape mark, or device-specific position (such as a wrap counter and tachometer counter). These commands may be coupled with other commands for sensing the current position, for reporting the nature of a given tape structure encountered, and for reporting (or logging) the maximum number of files or blocks on a tape, as well as with a full error reporting architecture supporting their use, e.g., in a 24x7 high-availability system.
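The split between relative and absolute positioning, together with position sensing, can be modeled as below. This is a hypothetical sketch, not the device command set itself: the `Unit`, `RelativePosition`, `AbsolutePosition`, and `SimulatedDrive` names are assumptions, tape marks are counted as block positions, and sequential-tape-mark and device-specific (wrap and tachometer) targets are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum, auto

TAPE_MARK = None  # sentinel occupying one block position on the simulated tape


class Unit(Enum):
    BLOCK = auto()
    TAPE_MARK = auto()
    END_OF_DATA = auto()


@dataclass(frozen=True)
class RelativePosition:
    unit: Unit
    count: int  # signed offset; negative values are modeled for blocks only


@dataclass(frozen=True)
class AbsolutePosition:
    unit: Unit
    ordinal: int  # target independent of the current position


class SimulatedDrive:
    """Hypothetical drive model; not a real device interface."""

    def __init__(self, tape):
        self.tape = tape
        self.pos = 0
        # Positions of all tape marks, standing in for the device's index.
        self.marks = [i for i, b in enumerate(tape) if b is TAPE_MARK]

    def execute(self, cmd):
        """Run one positioning command and return the new block position."""
        if isinstance(cmd, AbsolutePosition):
            self.pos = self._absolute(cmd.unit, cmd.ordinal)
        else:
            self.pos = self._relative(cmd.unit, cmd.count)
        return self.pos

    def _absolute(self, unit, n):
        if unit is Unit.BLOCK:
            return n
        if unit is Unit.TAPE_MARK:
            return self.marks[n - 1] + 1  # just past the n-th tape mark
        return len(self.tape)  # Unit.END_OF_DATA

    def _relative(self, unit, count):
        if unit is Unit.BLOCK:
            return self.pos + count
        if unit is Unit.TAPE_MARK:
            if count < 1:
                raise NotImplementedError("backward tape-mark spacing not modeled")
            ahead = [m for m in self.marks if m >= self.pos]
            return ahead[count - 1] + 1
        return len(self.tape)  # Unit.END_OF_DATA

    def sense(self):
        """Report (block position, current file number, files recorded)."""
        file_number = sum(1 for m in self.marks if m < self.pos) + 1
        return self.pos, file_number, len(self.marks)
```

Keeping commands as plain immutable values separates what the host requests from how a given device satisfies it, which is one way to let the same request be served by a fast indexed locate or by a slower fallback.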
The system software works synergistically with the device to provide the overall groundwork for the application performance advantages. The software integrates support for the positioning commands in a number of forms. In particular, this support may be integrated into the pre-existing macros that set and determine position (known as NOTE and POINT in an exemplary implementation), in both relative and absolute forms. For positioning at the point of opening a new or existing file, the software provides direct integration into the macros pertaining to file opening (e.g., OPEN in an exemplary implementation). Finally, the device capability is surfaced all the way to the user level via Job Control Language (JCL) support for the File Number. Again, this requires no external changes by the application programmer or operator.
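The NOTE/POINT pairing and file-number positioning at OPEN time can be approximated at the application level as follows. This is a hypothetical sketch: `TapeFile`, its methods, and the in-memory tape are assumptions and do not reproduce the actual macro interfaces. Here NOTE returns a plain block position, and OPEN uses an index of file start positions so that a single locate replaces file-by-file spacing.

```python
TAPE_MARK = None  # sentinel ending each file on the simulated tape


class TapeFile:
    """Hypothetical handle mirroring OPEN-by-file-number, NOTE, and POINT."""

    def __init__(self, tape, pos):
        self.tape = tape
        self.pos = pos

    @classmethod
    def open(cls, tape, file_number):
        """Open file `file_number` (1-based) with one indexed locate."""
        starts = [0] + [i + 1 for i, b in enumerate(tape[:-1]) if b is TAPE_MARK]
        return cls(tape, starts[file_number - 1])

    def note(self):
        """NOTE: capture the current position as an opaque token."""
        return self.pos

    def point(self, token):
        """POINT: reposition to a token previously returned by note()."""
        self.pos = token

    def read(self):
        """Return the next block, raising EOFError at the file's tape mark."""
        block = self.tape[self.pos]
        if block is TAPE_MARK:
            raise EOFError("end of file reached")
        self.pos += 1
        return block
```

Because the application sees only `open`, `note`, `point`, and `read`, the positioning strategy behind them can change without any change to application code, which is the property the text emphasizes.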
One of the most significant innovations at the software level is the integration of this support to fully accommodate multi-volume aggregates. In this case, one does not know which volume in the aggregate actually holds the desired information (i.e., contains the specified file). The system software takes advantage of any known and provided information to maximize performance. In the worst case, the first volume is mounted and its last file is determined by the supported command, which provides this information at load time. The software validates this information by reading the trailer labels of the final file (normally near the physical beginning of tape for serpentine tape architectures). If the desired file is not on the currently mounted tape, the next tape in the sequence of the aggregate is mounted and the same process is repeated, with the file numbering information adjusted accordingly. When the correct volume is mounted, the file number is converted to a tape position using the arithmetic for the currently active file label format, and the positioning is done at high speed.
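The multi-volume search and the label arithmetic can be sketched as below. The sketch rests on simplifying assumptions that are not part of the description above: no data set spans a volume boundary, aggregate file sequence numbers continue from one volume to the next, the trailer labels of each volume's last file yield that file's aggregate sequence number, and a standard-labeled file occupies three tape-mark-delimited sections (header labels, data, trailer labels) while an unlabeled file occupies one. `Volume`, `tape_marks_before_data`, and `find_in_aggregate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    volser: str
    files_recorded: int    # file count reported by the device at load time
    trailer_sequence: int  # aggregate sequence number from the last file's trailer labels


def tape_marks_before_data(file_on_volume: int, labeled: bool) -> int:
    """Tape marks to locate past to reach the data of a file on one volume."""
    if labeled:
        # Each earlier file: HDR labels, TM, data, TM, EOF labels, TM (3 marks),
        # plus the mark that ends this file's own header labels.
        return 3 * (file_on_volume - 1) + 1
    return file_on_volume - 1  # unlabeled: one tape mark per earlier file


def find_in_aggregate(volumes, file_number, labeled=True):
    """Return (volser, tape marks to skip) for an aggregate file number."""
    first = 1  # aggregate sequence number of the first file on this volume
    for vol in volumes:  # mount each volume in aggregate order
        last = first + vol.files_recorded - 1
        if vol.trailer_sequence != last:  # validate the device's count
            raise ValueError(f"{vol.volser}: trailer labels disagree with device count")
        if file_number <= last:
            return vol.volser, tape_marks_before_data(file_number - first + 1, labeled)
        first = last + 1  # adjust numbering for the next volume
    raise LookupError(f"file {file_number} is beyond the end of the aggregate")
```

For example, with 100 files on the first volume and 50 on the second, aggregate file 120 is file 20 on the second volume, and under the standard-label assumption its data follows 3 × 19 + 1 = 58 tape marks.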
The speed of this process diminishes if the native indexing capability of the device has been compromised by a failure of some kind that blocked the indexing. In that case, the software still operates transparently, although the performance advantage is lost. In an exemplary implementation, the software system also integrates into the tape initialization subsystem the capacity to recognize a tape with an impaired positioning index and the ability to rebuild that index using one of the new device commands, which forces read-speed positioning to the final position of the most recent recording on the media (and, consequently, causes the device to rebuild its index).
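The recovery path, in which initialization detects an impaired index and rebuilds it by positioning to the end of recorded data at read speed, can be sketched as follows. `SimulatedCartridge`, `locate_end_of_data_read_speed`, and `initialize` are hypothetical; the rescan of every block merely models the device rebuilding its index as a side effect of the read-speed traversal, and the degraded path shows that positioning stays correct, only slower, while the index is missing.

```python
TAPE_MARK = None  # sentinel ending each file on the simulated tape


class SimulatedCartridge:
    """Hypothetical cartridge whose positioning index may be impaired."""

    def __init__(self, tape, index_valid=True):
        self.tape = tape
        self.index = self._scan() if index_valid else None

    def _scan(self):
        # Reading every block is the only way to rediscover file boundaries.
        return [0] + [i + 1 for i, b in enumerate(self.tape[:-1]) if b is TAPE_MARK]

    def locate_file(self, n):
        """Position to file n (1-based); falls back to spacing if no index."""
        if self.index is None:
            pos = 0
            for _ in range(n - 1):  # degraded mode: one file at a time
                pos = self.tape.index(TAPE_MARK, pos) + 1
            return pos
        return self.index[n - 1]  # indexed mode: single locate

    def locate_end_of_data_read_speed(self):
        """Hypothetical recovery command: traverse to end of data, rebuild index."""
        self.index = self._scan()
        return len(self.tape)


def initialize(cartridge):
    """Tape initialization hook: repair an impaired index before use."""
    if cartridge.index is None:
        cartridge.locate_end_of_data_read_speed()
    return cartridge
```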