The IBM MVS® mainframe operating system has evolved into the primary data server for very large enterprise computer system environments. This new and critical function has increased the availability requirement for the MVS mainframe system, and all data stored within it, to essentially a 24×7 or “always available” level.
The ICF (Integrated Catalog Facility) catalog environment is a critical component of MVS, as virtually all data or “data sets” within the system must be cataloged within an ICF catalog; they cannot be located for access unless there is a successful search for the data set through the catalog. If the data is not cataloged, if the necessary catalog is not available, or if the catalog has a structural error that prohibits correct access to the data set's information, the application attempting to locate it cannot gain access. In even the smallest MVS systems, there are often hundreds of thousands of data sets, and in a large MVS system, the number of data sets is in the tens of millions. All of these data sets are catalogued, except for a very few special cases not important here.
An ICF catalog consists of two components, the Basic Catalog Structure (BCS), where the data set's name and disk storage volume location are stored, and the VSAM Volume Data Set (VVDS), where the physical and logical attributes of the data set are stored. Due to MVS system design, most MVS environments, regardless of their size, have a fairly small number of catalogs, typically between 10 and 100. The BCS portion of an ICF catalog can be physically stored on any disk volume, regardless of the location of the data that it catalogs. Each VVDS is physically stored on the same volume as the data that it defines, and therefore, the number of VVDSs is typically equal to the number of volumes that an MVS system has assigned to it. Each VVDS comprises a series of VVRs (VSAM Volume Records). The term “BCS” is generally used to describe the physical structure that contains the catalog records; the term “catalog” means the same as BCS, and is used to describe a BCS user catalog in general. For the purposes of this invention description, the two terms will be used interchangeably.
As a simple example, FIG. 1 shows three data storage volumes 110, 112, and 114. A BCS 120 resides on volume Vol. 001 (110), along with a VVDS 124 which, in turn, defines one or more data sets stored on that volume. Here, a record 202 (and one or more associated relationship records) in the BCS 120 points to one or more corresponding records 208 in VVDS 124, as indicated by the dashed line, and these records in the VVDS in turn point to the corresponding components of the VSAM data set 126 (“VSAM”) also on volume 110, as indicated by a dashed line as well. (In general, we will use dashed lines in such drawings to indicate references or pointers.) The BCS 120 also includes a second record 204 (and its associated relationship records) that points to a record 216 in a second VVDS 132 on Vol. 002 (112), which in turn defines a data set 134 on volume 112.
As is well known, an MVS system has two types of BCSs—one master catalog that identifies and locates the operating system data sets used by the MVS system, and one or more user catalogs that identify and locate all other data sets that are to be accessible to the system. (BCS 120 is a user catalog.) In order to be usable and accessible on a system, a user catalog must be “connected” to the system via a special record in the master catalog, called a UCAT Connector Record (not shown). Upon first use by the system following IPL (Initial Program Load), a user catalog is opened to the Catalog Address Space (CAS) catalog management program routines, and unless explicitly closed by an operator command, it remains open to the system for the life of the IPL. As further explained below, the important aspects of the present invention are methods of reorganizing and repairing the BCS while it remains open. When data storage devices are shared, that is, concurrently accessible and updateable by multiple operating systems, mechanisms exist to prevent unsynchronized sequences of events from occurring. When the serialization protocols are not adhered to, then the integrity of the physical data can be compromised.
Because an MVS system typically has few user catalogs, but a very large number of cataloged data sets, a single BCS will often have hundreds of thousands, possibly millions of data sets cataloged within it. These data sets are used by online data base systems that remain in use for weeks at a time, or even longer, and the data sets are not closed throughout that time. The same, or other, data sets are also used by “batch” job streams that are usually scheduled for execution on a daily, weekly, or monthly basis.
The catalog is considered “in use” as soon as it is opened by CAS, regardless of any open data sets cataloged within it. As mentioned, the catalog remains open for the life of the IPL, unless explicitly closed. As long as it is in use, repair or cleanup catalog management functions cannot be performed without risking the integrity of the catalog's physical structure.
MVS data set names are comprised of 1 to 44 characters. A period in a name, counted as a character, has special meaning. Periods are used to separate a name into nodes, called qualifiers. Qualifiers serve the purpose of grouping names for visual identification and masking capabilities. Also, the left most (high level) 1 to 4 qualifiers may be used to identify which catalog the “locate” pointers to a data set are to be recorded in. If a specific catalog is desired for a group of data sets, a special entry called an “alias” is created in the “master catalog,” causing all data sets whose high level qualifier(s) match the defined alias to be cataloged in the corresponding “user catalog.” Subsequent locates for an existing data set will begin by searching the master catalog for an alias match, which in turn directs the locate to the associated user catalog.
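The alias-directed routing described above can be sketched as follows. This is a minimal illustration only, not the Catalog Management implementation: the function names, the catalog names, the longest-match-first search order, and the fallback to the master catalog when no alias matches are all assumptions made for the example. Only the rules stated in the text are checked (1 to 44 characters, period-separated qualifiers, up to four leading qualifiers considered for the alias match).

```python
def split_qualifiers(dsn: str) -> list:
    """Split a data set name into its period-separated qualifiers."""
    if not 1 <= len(dsn) <= 44:
        raise ValueError("data set name must be 1 to 44 characters")
    qualifiers = dsn.split(".")
    if any(q == "" for q in qualifiers):
        raise ValueError("empty qualifier in data set name")
    return qualifiers


def route_to_catalog(dsn: str, aliases: dict, master: str = "MASTER.CATALOG") -> str:
    """Return the name of the catalog in which `dsn` would be recorded.

    `aliases` maps an alias (one to four leading qualifiers) to a user
    catalog name. Assumption: the longest matching alias wins, and a name
    with no matching alias is recorded in the master catalog.
    """
    quals = split_qualifiers(dsn)
    for n in range(min(4, len(quals)), 0, -1):
        candidate = ".".join(quals[:n])
        if candidate in aliases:
            return aliases[candidate]
    return master
```

For example, with a hypothetical alias table `{"PARTS": "UCAT.PARTS"}`, the name `PARTS.INVENTORY.D2024` routes to `UCAT.PARTS`, while `SYS1.LINKLIB` has no alias and stays in the master catalog.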
Historically on MVS systems, the data set name high-level qualifier derives from a short-form name of an application, such as PARTS for a manufacturing organization, and therefore, every data set in the PARTS application will have a name that begins with PARTS. The number of data sets in this application could number in the thousands, or tens of thousands.
Referring again to FIG. 1, to facilitate the search for a data set (in MVS terminology, this is called the “locate” operation), the data set name is used as a keyed search argument. The BCS 120 is physically a VSAM Key Sequenced Data Set (KSDS), and the key of its records is the data set name. When the catalog record for the data set is located, the volume cell(s) inside the record identify the disk storage (DASD) volume(s) on which the data set resides, for example 110 or 112, and point by relative address to the data set's descriptive record(s) within the VVDS 124, 132 (if the data set is VSAM or nonVSAM SMS managed) or the VTOC 128, 130 (if the data set is nonVSAM nonSMS managed). In the drawing, catalog record 202 in BCS 120 points to volume 001, and specifically to record 208 in the VVDS 124. If the VSAM data set is comprised of multiple components, there will be multiple VVDS records describing its components, and therefore the BCS record will have pointers to each VVDS record. Another data set record 204 points to volume 002 and specifically to record 216 in the VVDS 132, which in turn defines the data set 134. Volume 114 has associated VTOC 140, VVDS 142 and a representative KSDS 144.
FIG. 2 illustrates a BCS' internal structure in greater detail. Because the catalog is physically a VSAM KSDS itself, it is comprised of a data component 250 and an index component 270. The data component 250 conceptually comprises a series of columns 252, each column representing one Control Area or CA. The CAs are numbered from left to right beginning with CA.00. Each CA generally corresponds to a physical cylinder in DASD storage, and typically contains dozens or hundreds of Control Intervals or CIs. Each CI, for example 254, 256, can be thought of as a block of data storage records (a “block” is the unit of data transfer in an I/O operation). The catalog's data records are stored within each CI in ascending data set name key sequence, with each record physically adjacent to the record next to it. For deblocking purposes, the length of each record is identified in a positionally-relative control block field (called an RDF, or Record Descriptor Field) at the right end of the CI (not shown). Within the CI, and to the right of the last stored record in the CI, there may be “free space” 257 bytes, resulting from deletions of previous records contained in the CI, or pre-allocated by the user at time of definition and load.
As mentioned above, the BCS is a standard VSAM KSDS, so all records within it are maintained in logical ascending key sequence. Accordingly, when new data sets are added, the value of the data set name key determines the location within the BCS data component where the records will be inserted. Pre-allocated “free space” can be reserved in a BCS with the FREESPACE keyword on the IDCAMS DEFINE USERCATALOG command, and this space can be utilized to store records within the catalog when a new data set is cataloged. Additionally, free space can be reserved at the end of the catalog by allocating it larger than is required.
Pre-allocated free space can be reserved at either (or both) the CI (Control Interval) or CA (Control Area) level, specified as a percentage, and by standard VSAM design it will be evenly distributed across the entire file when it is initially loaded. This scheme is especially beneficial when record insertions are similarly distributed across the file. For example, in FIG. 2, CI level free space 257 in CA.00 and CA level free space 258 are illustrated. When an existing data set is deleted, its record in the BCS is physically removed, dynamically creating free space within the CI that held the record, and this free space can subsequently be utilized if a new data set of a similar name is cataloged at a later time.
When a record insertion is necessary, and sufficient free space within the CI is not available, VSAM automatically performs a “CI split,” moving half of the records in the affected CI to an available free CI within the same CA. If a free CI within that CA is not available, VSAM first performs a CA split, moving half of the CIs from the affected CA to an empty CA (always the first CA beyond the Hi-Used Relative Byte Address (called HURBA) at the catalog's end of file (EOF) location), thereby freeing up a CI that can be used to complete the CI split. The catalog's Hi-Allocated Relative Byte Address (called HARBA) represents the physical end of the allocated area of the catalog. If an empty CA cannot be found between the HURBA and HARBA address, VSAM allocates a new physical secondary extent of the BCS, thereby moving the HARBA address to the end of the new extent, and creating one or more empty CAs that can be used for CA splits. The extent limit count for a BCS is a maximum of 123 (119 under certain conditions, but generally limited to 123 extents), and unlike other VSAM data sets, the BCS is restricted to a single volume allocation. If the 123 maximum extent limit is reached, Catalog Management fails the new data set allocation because it cannot successfully store the catalog record for the data set. Subsequent to this, other data sets might still be able to allocate, if the value of their data set name “points” to a location within the BCS where there is sufficient free space.
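The CI split behavior just described can be illustrated with a small model. This is a simplified sketch and not VSAM code: the `ControlArea` class, its method names, and the use of a record count (rather than bytes) as the CI capacity are assumptions made for illustration, duplicate keys are not handled, and the CA split is only signalled with an exception rather than performed.

```python
import bisect


class ControlArea:
    """Toy model of one CA: a fixed number of CIs, each a sorted list of keys."""

    def __init__(self, ci_count: int, ci_capacity: int):
        self.ci_capacity = ci_capacity          # records per CI (assumed unit)
        self.cis = [[] for _ in range(ci_count)]  # an empty list is a free CI
        self.ci_splits = 0

    def _used(self):
        # Used CIs in key sequence, ordered by their lowest key.
        return sorted((ci for ci in self.cis if ci), key=lambda ci: ci[0])

    def insert(self, key: str) -> None:
        used = self._used()
        if not used:
            self.cis[0].append(key)
            return
        # Target is the first CI whose high key is >= key, else the last used CI.
        target = next((ci for ci in used if ci[-1] >= key), used[-1])
        if len(target) >= self.ci_capacity:
            free = next((ci for ci in self.cis if not ci), None)
            if free is None:
                raise RuntimeError("no free CI in this CA: a CA split would be needed")
            half = len(target) // 2
            free.extend(target[half:])          # move the upper half to the free CI
            del target[half:]
            self.ci_splits += 1
            if key > target[-1]:
                target = free
        bisect.insort(target, key)

    def keys_in_order(self):
        return [k for ci in self._used() for k in ci]
```

Inserting keys into a model CA with three CIs of four records each shows the first overflow triggering a CI split into a free CI, and a later overflow with no free CI remaining raising the condition that, in the text above, leads to a CA split.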
The index component 270 of the BCS is illustrated in simplified conceptual form in FIG. 2. In the illustration, the first level index or “sequence set” 272 comprises a series of records, e.g. 274, 276 etc., each of which logically corresponds to a respective CA in the data component 250. Index record 274 corresponds to CA.00, index record 276 corresponds to CA.01 and so on. The sequence set index record has an entry for each data CI in the corresponding CA, and each entry includes a compressed copy of the highest data set name key value within that CI, and the associated CI number within the CA (from which VSAM can calculate the actual Relative Byte Address (RBA) of the CI when it is required). Therefore, if each CA has 180 CIs (a typical number for many BCSs), the sequence set record for each CA will have 180 entries, each containing a high key and address value, or an indicator that the CI is free. A second level index (if needed) comprises another series of records, e.g. 282, 288, each of which corresponds to a plurality of first level index records. In a second level index record, there is an entry for each first level index record associated with that record, including an indication of the highest key value in the corresponding first level index record. A third level index record 290 is shown, and there may be more, as needed, associated in this hierarchical fashion, sometimes called a B-tree index structure. At each index level, horizontal pointers such as 292 form a linked list or chain in key sequential order across that index level, used by VSAM for certain types of record access, as well as to verify index structure integrity.
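A sequence set lookup of the kind described above can be sketched as a binary search over high keys. This is illustrative only: the function names are hypothetical, keys are held uncompressed, free-CI entries are omitted, and the RBA formula assumes fixed-size CIs laid out contiguously CA by CA, which the text does not state.

```python
import bisect
from typing import Optional


def find_ci(sequence_set: list, key: str) -> Optional[int]:
    """Return the CI number whose key range would contain `key`.

    `sequence_set` is a list of (high_key, ci_number) entries for one CA,
    sorted by high_key. Returns None when `key` is above every high key
    in this CA.
    """
    high_keys = [high_key for high_key, _ in sequence_set]
    i = bisect.bisect_left(high_keys, key)
    if i == len(sequence_set):
        return None
    return sequence_set[i][1]


def ci_rba(ca_number: int, ci_number: int, cis_per_ca: int, ci_size: int) -> int:
    """Relative byte address of a CI, assuming uniform contiguous CIs."""
    return (ca_number * cis_per_ca + ci_number) * ci_size
```

Note that the CI numbers in the entries need not be in ascending order: after a CI split, a CI that is logically later in key sequence may be physically earlier in the CA, which is why each entry carries its own CI number.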
Generally, the BCS is a relatively stable structure, in that the process of cataloging new data sets and deleting existing data sets balances out. After a few CI and CA splits are completed within a BCS, opening up free space in the volatile areas of the catalog, it generally settles down and doesn't grow very much. In such circumstances, many BCSs can survive months or years without attention.
Many MVS systems, though, have one or more catalogs that do not fit this pattern. Some catalogs grow very rapidly in size due to a concentration of new data set allocations in one location within the BCS. When this occurs, close attention must be paid to the catalog to minimize the risk of system or application outage when the catalog “fills,” and no further data set records can be inserted at the necessary location. The most frequent cause of this is a data set naming convention for an application that (typically) includes a sequence number value within the data set name, resulting in each newly added data set name being inserted immediately after the previous ones within the BCS. Exacerbating this situation, in many applications such as this, if any data sets are deleted, chances are they are the oldest (lowest numbered) data set names. This results in addition of new records at the end of the current records for that group, and record deletions occurring at the front of that group, with the result that the catalog becomes “emptier and emptier” where the deletions are occurring, and “fuller and fuller” where the insertions are occurring. This is well known as the “creeping key” problem in VSAM KSDS files, and is endemic to application files as well as BCSs.
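The creeping key effect can be reproduced with a toy simulation. This is a sketch under simplifying assumptions, not a model of actual VSAM space management: the function name and the `APP.G000000` naming pattern are hypothetical, CI capacity is counted in records, new CIs are always available, and space in an emptied CI is never reused because every new key sorts higher than the key range that CI covered.

```python
def simulate_creeping_key(inserts: int, keep: int, ci_capacity: int = 4) -> list:
    """Return per-CI record counts, in key sequence, after the workload.

    Each step catalogs one data set with the next sequence number and,
    once more than `keep` data sets exist, deletes the oldest one.
    """
    cis = [[]]      # CIs in key sequence; each CI is a list of keys
    live = []       # cataloged names, oldest first
    for seq in range(inserts):
        key = f"APP.G{seq:06d}"
        target = cis[-1]                 # ascending keys always land at the end
        if len(target) >= ci_capacity:
            half = len(target) // 2      # CI split: upper half moves to a new CI
            new_ci = target[half:]
            del target[half:]
            cis.append(new_ci)
            target = new_ci
        target.append(key)
        live.append(key)
        if len(live) > keep:
            oldest = live.pop(0)
            for ci in cis:
                if oldest in ci:
                    ci.remove(oldest)    # leaves free space that is never reused
                    break
    return [len(ci) for ci in cis]
```

Running the simulation with 20 inserts while keeping only the 4 newest data sets leaves all live records in the last CI, preceded by a run of empty CIs: the file has grown even though the number of cataloged data sets has not.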
For any type of VSAM KSDS, whether it's an application file or a BCS, the solution to this problem is a file reorganization (generally called a “re-org”). A file reorganization begins with executing a utility program that reads the file (in this case, the BCS) in logical, record key sequence, to unload all records to a backup copy. The data set is then physically deleted and redefined (thereby making it empty), and the backed up records are then reloaded. The result is a “reorganization” of the data blocks (CIs and CAs) within the data set, and its loaded record area is now in proper proportion to the records contained within it. A re-org, for either an application data set or BCS, can be accomplished by commercially available IBM or other vendor utility programs.
In all cases, the known re-org process requires that all external points of access to the object data set or BCS be “quiesced”. In this context, the term “quiesced” means that all other software functions that can physically access the object data set must be temporarily inhibited from doing so, by issuing a “close” process, which formally and officially “disconnects” them from further access to the data set. In the case of the BCS, the accessing program is the MVS Catalog Management functions available from any active system that has physical access to it, as well as other utility programs that might be accessing the BCS itself as a data set (for repair or day-to-day management functions).
In many instances, the BCS resides on a shared DASD volume, thereby allowing Catalog Management on any number of MVS systems to be open to the BCS, updating it with new and deleted data sets, and with applications also open and accessing data sets that are cataloged within the BCS. This sharing across systems complicates the coordination and scheduling necessary to quiescing the BCS. For example, various levels of catalog sharing are known in the art, including: (1) Not Shared; (2) Shared only within a single Sysplex; (3) Shared across multiple Sysplexes; and (4) ECS—Enhanced Catalog Sharing within a Sysplex and utilizing the system's Coupling Facility. The present invention provides catalog integrity across these various levels of catalog sharing.
Aside from the re-org considerations, all KSDS structures (including application data sets, as well as a BCS) are prone to structural failures that typically are not life-threatening to the health of the BCS, but rather, are more often errors within localized aspects that only affect certain types of attempted operations. A frequent structural failure is in the index component (due to its internal complexity), where index key values become corrupted, index record pointers are stored incorrectly, entire index records are missing, or index records do not correctly reflect data that is physically stored in the associated data component. These defects are illustrated in FIG. 6. Another frequent structural failure is in the data component, where duplicate keyed records are somehow stored, or records are stored out of key sequence. More serious structural failures also occur in a KSDS structure, but on a less frequent basis, necessitating a recovery from a prior backup.
Oftentimes the repair of an existing BCS, rather than restore or forward recovery from an earlier backup, is the preferred course of action. Many of the error situations (as mentioned above) within a BCS lend themselves to this easier and quicker process, with less downtime to the catalog. If a previous backup is being used for forward recovery and it is relatively old, it might require the processing of very large amounts of SMF data to bring the catalog “forward” from the time of the backup to the current time. In the worst situation, such as a user error, there might not even be a usable backup copy of the catalog, and repair of the existing catalog is the only solution. When performing a “repair” using the existing physical BCS structure, the conventional methodology is to perform a backup of the catalog's records, followed immediately by a deletion and new definition (allocation) of the BCS, and reloading of the catalog's records. This is, in fact, similar to a re-org of the catalog, but requires certain changes in methodology (as explained in detail below) from a basic re-org.
Prior Art BCS Re-org Methodologies
Methodologies for performing a BCS re-org exist. Prior art includes IBM IDCAMS EXPORT/IMPORT, EMC Catalog Solution DUMP/REBUILD, and the Catalog RecoveryPlus™ software product BACKUP/RESTORE facility, commercially available from Mainstar Software Corporation, Bellevue, Wash. (assignee of the present invention). These known methodologies do not satisfy the requirements for a “re-org while open,” as they inherently require the BCS to be quiesced and closed throughout the re-org process. They also require the BCS to be physically deleted and re-defined between the backup and restore processes. If an attempt is made to utilize one of these methodologies for re-org while the BCS is open and active, serious damage will almost certainly occur to the internal structure of the BCS, and any of the jobs (including Catalog Management) may ABEND (abort or end abnormally) with unpredictable consequences later.
In a prior art BCS re-org process, the user must schedule a quiesce period during which time the BCS will be inaccessible. To ensure that the BCS is not updated between the time of the backup and restore, the MVS operator MODIFY command, or the like, is issued on all systems sharing access to the catalog, forcing it to close and un-allocate to CAS. At this point, user knowledge of, and strict adherence to procedures, across all systems, must be maintained, as the BCS can automatically re-open if any job is executed that requests access to it.
When the user is satisfied that the BCS is quiesced, a system backup utility program is executed, to write a copy of the catalog's data records to a physical sequential file. If EXPORT is used, the records are retrieved in ascending key sequence, as standard VSAM sequential read-access through the BCS's index is utilized. This methodology preserves the catalog's record sequence for restoring into the newly-defined and empty BCS. If any of the non-IBM system backup utilities are used, the records are retrieved in physical record sequence, bypassing the BCS's index structure. This latter methodology protects against existing index structure damage that the BCS might have (see FIG. 6), ensuring that all records from the BCS data component are retrieved, but it requires that the records be sorted prior to restoring them into the newly-defined and empty BCS.
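The difference between the two unload approaches can be sketched as follows. This is an illustration only and does not represent any of the named utilities: the function names and the (key, payload) record layout are hypothetical, and reporting duplicate keys separately instead of loading them is an assumed policy.

```python
def unload_physical(cis: list) -> list:
    """Read every record in physical CI order, ignoring the index."""
    return [record for ci in cis for record in ci]


def prepare_for_reload(records: list) -> tuple:
    """Sort physically unloaded (key, payload) records into key sequence.

    Returns (sorted_records, duplicate_keys). A keyed load needs strictly
    ascending keys, so the first record for each key is kept and any
    repeated key is reported for separate handling.
    """
    ordered = sorted(records, key=lambda record: record[0])
    loadable = []
    duplicates = []
    for record in ordered:
        if loadable and loadable[-1][0] == record[0]:
            duplicates.append(record[0])
        else:
            loadable.append(record)
    return loadable, duplicates
```

Because the sort is stable, the record kept for a duplicated key is the one that appeared first in physical order; whether that is the right record to keep is a decision the sketch leaves open.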
When the backup is complete, the existing BCS is deleted and re-defined as a new, empty BCS, and the records from the backup file are then restored with the utility function that corresponds to the backup. In some of the prior art methodologies, the delete/define function is automatically performed for the user, while in others the user must do it manually. Also, some prior art methodologies allow certain physical attributes of the BCS to be changed on the new allocation from the existing BCS's attributes. Regardless of the methodology, when the restore operation is complete, system and application processing that might use the catalog can be restarted, and the catalog will re-open as necessary.
Prior Art BCS Repair Methodologies
Methodologies for performing a BCS repair also exist. Prior art includes EMC Catalog Solution DUMP/REBUILD, and the Catalog RecoveryPlus software product with its BACKUP/RESTORE facility mentioned above. The IBM IDCAMS EXPORT/IMPORT facility is not considered pertinent to BCS repair, as its read logic uses the catalog index structure to perform a logical record access, and index structural errors, as well as duplicate or out-of-sequence record keys cause it to prematurely terminate.
The current methodologies from EMC and Mainstar do not satisfy the requirements for a re-org or repair while open, as they inherently require the BCS to be quiesced and closed throughout the repair process. They also require the BCS to be physically deleted and re-defined between the backup and restore processes. If an attempt is made to utilize one of these methodologies for repair while the BCS is open and active, serious damage will almost certainly occur to the internal structure of the BCS, and any of the jobs (including Catalog Management) may ABEND (abort or end abnormally) with unpredictable results at a subsequent time.
For the BCS, it is very difficult to schedule and perform a re-org or repair operation. In some repair situations, a structural error might affect processing only when executing certain functions, or there might be workarounds that aren't affected by the error. Most BCSs have 24×7 availability requirements, from at least one of the MVS systems that are sharing access to them, and the “down-time” to re-org or repair the BCS is disruptive to production application processing. “Down-time” is defined as the time between closing the catalog and re-opening it, at which point application jobs and online systems can once again resume access to the data sets cataloged within the BCS. During down-time, all access to the BCS must be stopped: new data sets cannot be allocated, existing data sets cannot be deleted, and applications are denied access to data within existing data sets. Even if the downtime is planned and scheduled, it represents an outage that might not be acceptable for 24×7 environments. If it is unplanned and a forced situation, it can result in disastrous business disruption. For this reason, many BCSs are re-orged very infrequently, and repairs for structural failures might be delayed indefinitely. What is needed is a way to re-org or repair a BCS while keeping it open so that cataloged data sets remain continuously available to applications for processing.