1. Field of the Invention
The present invention relates to a data storage hierarchy with a shared storage level. More particularly, the invention relates to a data storage hierarchy in which a single level one storage space of a file space, or a directory therein, is shared across an entire network.
2. Discussion of the Related Art
Modern data processing systems include a host processor having one or more central processing units, a memory facility, an input/output system, and an interconnection system (i.e. a bus). The processor manipulates data stored in the memory according to instructions provided to it. The memory must therefore be capable of storing data required by the processor and transferring that data to the processor at a rate that makes the overall operation of the computer feasible. The cost and performance of computer memory are thus critical to the commercial success of a computer system.
As computers manipulate ever increasing amounts of data, they require larger quantities of data storage capacity. Computer memory is available in several forms. Generally, the faster data can be written to or read from a particular form of memory, the more expensive it is. Microchips are fast, but expensive, and are typically used as the primary or main memory of the host processor. Other available forms of memory are used as auxiliary or peripheral memory, and include numerous peripheral storage devices. For example, magnetic direct access storage devices (DASD), magnetic tape storage devices, and optical recording devices are all peripheral storage devices. These devices have a greater storage capacity and lower cost than main memory, but do not provide the same performance. For example, the time required to properly position a tape or disk beneath the read/write mechanism of a drive cannot compare with the rapid, purely electronic data transfer rate of main memory. It is, however, inefficient to store all of the data in a system in a single type of memory device. Simply storing all data in main memory is too costly, and simply storing all data in a peripheral storage device significantly reduces performance. A physical portion of the total storage area of one or more peripheral storage devices is referred to as a "storage space".
A typical data processing system includes both main memory and one or more peripheral storage devices. A data processing system having a plurality of peripheral storage devices arranged hierarchically is referred to as a "data storage hierarchy". In a data storage hierarchy, primary or level 0 data storage generally refers to the level therein having the highest performance and lowest storage capacity. Secondary or level 1 (or lower level) storage provides equal or greater storage capacity, but at equal or reduced performance and thus reduced cost. The unit of data storage can be data sets, files, or objects. Data set and file are terms used essentially interchangeably in different operating system environments to mean a collection of data in a prescribed arrangement and described by control information to which the system has access. An object is a variably sized byte stream with no record or other internal boundary orientation. For convenience, the term "file" is used hereinafter to refer generically to data sets, files, objects or any such data entities. Data is moved and copied between different levels of the hierarchy as files (or some larger unit of data), as required, to balance performance, storage and cost. Such data transfers and related actions to manipulate a data storage hierarchy (such as the deletion from the hierarchy of data which is no longer being used) to achieve this balancing are known as "storage management".
Storage management includes several subcomponents, such as performance management, reliability management, capacity management, space management and availability management. Each of these may involve the transfer of data between levels of the hierarchy. Space management is the movement of data between different levels of the hierarchy so as to store data only in the most appropriate level of the peripheral storage hierarchy. For example, relatively active data should be stored in a relatively high performing level of the hierarchy and relatively inactive data should be stored in a relatively lower performing level of the hierarchy. As data ages, it is generally referenced less (i.e. it becomes relatively less active) and should thus be moved to lower levels of the data storage hierarchy. The movement of data from one level of a data storage hierarchy to another is referred to as "migration", and may include data compaction to save storage space.
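By way of illustration only, the age-based selection of files for migration described above might be sketched as follows. The names `FileEntry`, `select_for_migration`, `migrate` and the `max_idle_days` threshold are hypothetical and are not taken from any particular storage management product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileEntry:
    name: str
    level: int             # 0 = highest performing level of the hierarchy
    last_referenced: date  # date the file was last referenced


def select_for_migration(files, today, max_idle_days):
    """Return names of level 0 files not referenced within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [f.name for f in files
            if f.level == 0 and f.last_referenced < cutoff]


def migrate(files, names):
    """Move each named file one level down the hierarchy (e.g. 0 -> 1)."""
    for f in files:
        if f.name in names:
            f.level += 1
```

In this sketch a file that has been idle longer than the threshold is moved from level 0 to level 1, while recently referenced files remain where they are.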
Availability management is the backup of data within a data storage hierarchy to improve the likelihood of its being available if and when it is needed by the host processor. The original or primary copy of the data is not deleted; an additional or secondary copy is generated and transferred to another portion of the data storage hierarchy. The secondary copy is typically stored on a different peripheral storage device from the primary copy to ensure the availability of the data. If the primary copy of the data becomes unavailable, such as by device failure, the secondary copy of the data may still be referenced. The secondary copy of the data need not be stored in a different level of the data storage hierarchy, but such may be desirable as the secondary copy is not likely to be as active as the primary copy. Data backup may occur unconditionally or incrementally. Unconditional backup generates a copy of any specified file; incremental backup copies only those files which have been updated since the previous secondary copy was generated. Note that transferring a file by migration may include the maintenance of a primary copy of a file in level 0 storage. The primary copy is, however, an empty file, the data in the file having been transferred to the secondary copy of the file in level 1 storage.
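The distinction between unconditional and incremental backup can be illustrated with a minimal sketch. The record layout and function names below are assumptions made for the example; timestamps are represented as simple integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileState:
    name: str
    last_updated: int                  # hypothetical logical timestamp
    last_backup: Optional[int] = None  # None means never backed up


def unconditional_backup(files, specified):
    """Copy every specified file, whether or not it has changed."""
    return [f.name for f in files if f.name in specified]


def incremental_backup(files):
    """Copy only files updated since their previous secondary copy."""
    return [f.name for f in files
            if f.last_backup is None or f.last_updated > f.last_backup]
```

A file that has never been backed up is treated here as updated since its (nonexistent) previous secondary copy, which is one reasonable reading of the incremental rule.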
Storage management is traditionally performed manually. The data owner decides when to migrate or back up data, and where such migrated and backup files should be stored. Such decisions are time consuming, usually requiring a review of each file stored. The operations involved are often so time intensive that manual reviews and decisions are not made until there is no alternative. For example, a user might not migrate any files to level 1 storage until all storage space in level 0 storage is in use. In large systems, or in any system storing relatively large amounts of data, it is simply impractical to perform storage management manually.
In recent years, computer software has become available which reduces the need for manual operations. The IBM Data Facility Hierarchical Storage Manager (DFHSM) application program is an example of such software. DFHSM is a utility of the IBM Multiple Virtual Storage (MVS) series of operating systems. DFHSM uses specified management criteria to manage files, including the automatic recall of a transferred file upon an attempt to reference the file by a user. The management criteria include the minimum length of time a file will be permitted to be resident in the data storage hierarchy or a particular level thereof before it will be eligible to be migrated or deleted, or the maximum length of time a file will be permitted to exist without being backed up after being updated. Numerous management criteria are defined to the system and stored in a configuration file. The DFHSM management criteria are selected manually at the time of file identification to DFHSM. While DFHSM improves storage management, the manual selection of management criteria is still burdensome.
In recent years, manual operations for storage management have been further reduced. System-managed storage is a term indicating that the system itself selects the management criteria for data and performs storage management. The storage administrator need only define the management criteria in a configuration file; the system itself selects the management criteria for a particular file upon its creation and manages it accordingly. An example of software providing system-managed storage is IBM Data Facility Storage Management Subsystem software, hereinafter referred to simply as DFSMS (DFSMS is a trademark of IBM Corporation). DFSMS is a subsystem of the IBM Multiple Virtual Storage (MVS) series of operating systems. DFSMS encompasses several subcomponents, including DFHSM and IBM Multiple Virtual Storage/Data Facility Product software, hereinafter referred to simply as MVS/DFP (MVS/DFP is a trademark of IBM Corporation).
DFSMS accomplishes the aforementioned with the addition of the automatic class selection (ACS) routine to MVS/DFP. The management criteria are defined in sets of such criteria (known by names such as storage and management classes, but hereinafter referred to simply as "management classes") in a configuration file and the ACS routine itself is defined once by the storage administrator for the system. As each file is identified to DFSMS, MVS/DFP uses the ACS routine to automatically select the management class therefor. ACS selects the management class based upon certain characteristics of a file, such as the name of the file, the owner of the file, the directory path to the file, and the size of the file. Once the ACS routine has selected the management class for a file, such management class is stored in one or more fields in a catalog existing in the host processor.
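The selection of a management class from file characteristics may be pictured as an ordered rule table in which the first matching rule wins. The following is only a sketch of that idea and does not reproduce the actual ACS routine language; the rule contents, the class names and the `catalog` dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    owner: str
    path: str
    size: int  # bytes


# Hypothetical rules, defined once by the storage administrator.
# Each rule pairs a predicate on file characteristics with a class name.
RULES = [
    (lambda f: f.name.endswith(".LOG"), "SHORTLIVED"),
    (lambda f: f.owner == "PAYROLL", "CRITICAL"),
    (lambda f: f.size > 10_000_000, "LARGE"),
]
DEFAULT_CLASS = "STANDARD"

catalog = {}  # stands in for the catalog fields holding the selected class


def select_management_class(f):
    """Return the class of the first rule that matches the file."""
    for predicate, management_class in RULES:
        if predicate(f):
            return management_class
    return DEFAULT_CLASS


def identify_file(f):
    """Select a management class for a new file and record it."""
    key = f.path + "/" + f.name
    catalog[key] = select_management_class(f)
    return catalog[key]
```

Because the rules are evaluated in order, a file matching several rules receives the class of the earliest one; a real routine could resolve such overlaps differently.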
DFSMS must also provide for storage of the current data management attributes associated with each file. The management attributes are stored in one or more fields in the catalog, and/or the volume table of contents for each volume of data. The management attributes include certain data relating to the status and use of each file and are updated as such data changes during use. For example, the management attributes include the date and time a file became resident in the data storage hierarchy or a particular level thereof, the date and time a file was last accessed (whether the file was updated or not), and the date and time of last backup and update of the file.
One or more common DASD include one or more control files for storing control information also needed for storage management. These control files are not themselves managed by DFSMS, but may co-exist on common DASD with managed data files. A separate control file exists for each type of storage management activity (i.e. one control file for migration, one control file for backup, etc.). For example, a control file includes the information necessary to map migrated and secondary copies of files to their catalog and table of contents entries, and/or their primary copies (i.e. to their level 0 source files). Such mapping allows for recall of the correct migrated or secondary copy of a file upon specification of such file by a user. After determining that the primary copy of a file has been migrated, or that the secondary copy is required, the mapping data is used to locate and access the migrated or secondary copy of the file.
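The role of a control file in recalling a migrated file can be sketched as a simple lookup from the level 0 source file to the location of its migrated copy. The class and function names below are illustrative assumptions, and level 1 storage is modeled as a plain dictionary.

```python
class MigrationControlFile:
    """Maps each level 0 source file to the location of its migrated copy."""

    def __init__(self):
        self._map = {}

    def record(self, primary_name, level1_location):
        """Note where the migrated copy of a primary file was stored."""
        self._map[primary_name] = level1_location

    def locate(self, primary_name):
        """Return the level 1 location, or None if the file is not migrated."""
        return self._map.get(primary_name)


def recall(primary_name, control, level1_store):
    """Use the mapping data to find and return the migrated file's data."""
    location = control.locate(primary_name)
    if location is None:
        raise KeyError(f"{primary_name} has no migrated copy")
    return level1_store[location]
```

A separate instance of such a mapping would exist for each type of storage management activity, for example one for migration and one for backup.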
Actual storage management by DFSMS occurs during periods of relative system inactivity to minimize interference with other processing. During the prescribed periods, DFHSM is called to compare the management attributes of files to the management criteria of such files, as defined by the assigned management class. DFHSM then manages the files accordingly, transferring files and updating the management attributes and control information as required. In addition, storage management may be applied to units of data larger than a file. For example, a group of files having certain common management requirements may be established such that the entire group is managed as a whole. The group is assigned a management class of its own and managed accordingly. It should be understood that where files are hereinafter described as being managed during storage management, entire groups or other data units could be similarly managed, unless otherwise specified.
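The comparison of management attributes against the criteria of the assigned management class might look like the following sketch. The field names, the use of integer day numbers and the action labels are assumptions introduced for the example only.

```python
from dataclasses import dataclass


@dataclass
class ManagementClass:
    min_days_resident: int   # residency required before migration eligibility
    max_days_unbacked: int   # longest an updated file may go without backup


@dataclass
class ManagementAttributes:
    resident_since: int      # day number the file became resident
    last_update: int         # day number of last update
    last_backup: int         # day number of last backup


def actions_due(attrs, mclass, today):
    """Return the storage management actions a file currently qualifies for."""
    actions = []
    if today - attrs.resident_since >= mclass.min_days_resident:
        actions.append("migrate")
    updated_since_backup = attrs.last_update > attrs.last_backup
    if updated_since_backup and today - attrs.last_update >= mclass.max_days_unbacked:
        actions.append("backup")
    return actions
```

For instance, with a class of `ManagementClass(30, 2)`, a file resident for 100 days and updated five days ago, after its last backup, qualifies for both migration and backup.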
Another example of computer software to provide system-managed storage is IBM Data Facility Storage Management Subsystem for Virtual Machines software (hereinafter referred to simply as DFSMS/VM), which is a subcomponent of the IBM Virtual Machine (VM) series of operating systems. The currently available release 1 of DFSMS/VM does not provide space or availability management, but does provide several improvements to the VM operating systems. Traditionally, VM operating systems have used a minidisk file system (MFS) for organizing data for users on peripheral storage devices. MFS preallocates contiguous storage space (a consecutively addressed logical portion of a physical storage space) within the data storage hierarchy to individual users. Each preallocated contiguous storage space is known as a "minidisk" because it appears to be a complete peripheral storage device (such as a DASD) to the user. A minidisk is a set of consecutively addressed DASD cylinders. Each minidisk is owned by (i.e. assigned to) a particular user; no other users can store or access files thereon unless so authorized by the owning user. A minidisk (and MFS) is said to be "preallocated" because the storage space reserved to a user is typically larger than that immediately needed by that user. As a user fills a minidisk with files, there is a tendency to try to keep at least a portion of that minidisk available for new files by manually deleting the existing, unused files. The condition of "underutilization" refers to the remaining increments of unused storage space spread throughout the data storage hierarchy and results in a significant amount of wasted storage space. In addition, "fragmentation" results when a minidisk is deallocated from a particular user and cannot be reallocated to another user because it is too small to meet the needs of such other user.
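The arithmetic behind underutilization and fragmentation can be shown with a short sketch. The cylinder counts used with it are invented for illustration, and the function names are hypothetical.

```python
def unused_cylinders(minidisks):
    """Total unused space, given (allocated, used) cylinder counts per minidisk."""
    return sum(allocated - used for allocated, used in minidisks)


def can_reallocate(freed_sizes, needed):
    """A minidisk is contiguous, so a request must fit in one freed minidisk."""
    return any(size >= needed for size in freed_sizes)
```

With minidisks of (100, 40), (50, 45) and (200, 120) cylinders allocated and used, 145 cylinders sit idle yet remain reserved to their owners. Likewise, freed minidisks of 30 and 40 cylinders cannot satisfy a 60 cylinder request even though 70 cylinders are free in total.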
A relatively new file system, known as the "shared" file system (SFS), is included in current releases of VM operating systems, such as Virtual Machine/System Product (VM/SP) release 6. Shared file systems significantly reduce underutilization and fragmentation problems as compared to MFS. However, because of the large number of existing VM operating systems users, the use of MFS has not been discontinued. Thus, SFS and MFS may exist simultaneously in a data storage hierarchy. SFS allows for the dynamic sharing of peripheral storage space among different users. Physical storage space is not preallocated to a user account. Instead, as each user actually stores a file, storage space is dynamically allocated for that file only. Users are given accounts to a file pool, which is simply a collection of files for a set of users. In VM operating systems, a file pool is a collection of minidisks owned by a single virtual machine that contain files for a number of users. Each user stores files in a logical file space within an SFS storage group, which is a collection of minidisks within a file pool. The storage space assigned to a file space changes dynamically as files are added thereto, deleted therefrom, or updated. The files in each file space can be organized into one or more directories (and subdirectories within each directory), the file space itself being a top level of the directory hierarchy.
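Dynamic allocation from a shared pool, in contrast to preallocation, can be modeled in a few lines. This is a simplified sketch that counts blocks only; the `FilePool` class and its methods are hypothetical and do not correspond to actual SFS interfaces.

```python
class FilePool:
    """Shared pool of blocks; space is taken only when a file is stored."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks
        self.file_spaces = {}  # user -> {file name: blocks in use}

    def enroll(self, user):
        """Give a user an (initially empty) file space in the pool."""
        self.file_spaces.setdefault(user, {})

    def store(self, user, filename, blocks):
        """Add or update a file, adjusting the pool by the size difference."""
        space = self.file_spaces[user]
        delta = blocks - space.get(filename, 0)
        if delta > self.free_blocks:
            raise RuntimeError("file pool is full")
        self.free_blocks -= delta
        space[filename] = blocks

    def erase(self, user, filename):
        """Delete a file and return its blocks to the shared pool."""
        self.free_blocks += self.file_spaces[user].pop(filename)

    def space_used(self, user):
        """Blocks currently assigned to the user's file space."""
        return sum(self.file_spaces[user].values())
```

Here the space assigned to each file space grows and shrinks as files are added, updated or deleted, and any block released by one user is immediately available to every other user of the pool.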
SFS includes control information in level 0 storage as part of every file pool. The control information includes files of information used to locate minidisks in the respective file pool and to track which blocks of such storage space are in use. The control information also includes a catalog of information about the directories and files in the file pool, such as the owner of each file. Multiple file pools can exist for each instance of DFSMS/VM, each file pool being an instance of SFS.
Two heretofore unrecognized problems exist which are undesirable in the system-managed storage environment of DFSMS/VM. First, each instance of DFSMS and existing releases of DFSMS/VM (and all other known storage management software) require their own segregated level 1 storage. Currently, SFS files are given a unique, internal identifier to distinguish between files within the SFS catalog. Because there is no simple, unique way to identify files across (among separate instances of) file systems, file pools, operating systems, etc. in a network, separate (physically segregated) storage spaces of files must be maintained in each data storage hierarchy level for each such subdivision of a network to ensure that the files therein can be properly distinguished. The overhead and associated loss of efficiency related to maintaining segregated storage space is undesirable. Second, the use of common DASD control files to map the files in the separate level 1 storage spaces to their level 0 source files is undesirable as such is complex, inefficient and error prone.
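The first problem can be made concrete with a small sketch. It assumes, purely for illustration, that each file pool numbers its files with an internal counter starting at 1; the actual form of the SFS internal identifier is not specified here, and the names below are hypothetical.

```python
from itertools import count


class FilePoolCatalog:
    """Assigns identifiers that are unique only within this one file pool."""

    def __init__(self):
        self._next_id = count(1)
        self.files = {}  # internal identifier -> file name

    def add(self, name):
        file_id = next(self._next_id)
        self.files[file_id] = name
        return file_id


def colliding_ids(*catalogs):
    """Return identifiers that appear in more than one catalog."""
    seen, clashes = set(), set()
    for catalog in catalogs:
        for file_id in catalog.files:
            if file_id in seen:
                clashes.add(file_id)
            seen.add(file_id)
    return clashes
```

Under this assumption, the first file stored in each of two pools receives identifier 1 in both, so copies from both pools placed in one level 1 storage space could not be told apart by identifier alone.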