1. Field of the Invention
The present invention relates to computer software, and more particularly to analyzing the physical characteristics of database records in IMS databases.
2. Description of the Related Art
The IMS database (IMS DB) was created in 1970 by International Business Machines Corporation (IBM) and is one of the two major parts of IBM's IMS/ESA (Information Management System/Enterprise Systems Architecture). The second part is a data communications system (IMS Transaction Manager or IMS TM). Together, the transaction manager and the database manager create a complete online transaction processing environment providing continuous availability and data integrity. IMS/ESA runs under the MVS/ESA or OS/390 operating systems, which run on the S/390 platform.
At the heart of IMS DB are its databases and its data manipulation language, Data Language/I (DL/I). The IMS database is a hierarchical (non-relational) database. IMS databases are hierarchic collections of data, i.e., information organized in a pyramid fashion with data at each level of the hierarchy related to, and in some way dependent upon, data at the higher level of the hierarchy. DL/I calls allow a user to create and access these IMS databases.
An IMS database may include one or more data set groups. Each data set group may include one or more segments. A segment is the smallest piece of data DL/I can store. Each segment may be qualified by its hierarchical relationship to other segments in a database record. Each database record has one root segment and zero or more child segments. A "root segment" is at the top of the hierarchy, and there may be only one root segment in a database record. All other segments (other than the one root segment) in a database record are referred to as "dependent segments", and their existence depends on there being a root segment. A "parent segment" is any segment that is defined in the database descriptor (DBD) as capable of having a dependent segment beneath it in the hierarchy. A "child segment" is any segment that is a dependent of another segment above it in the hierarchy.
Segments may be of various segment types. Those segments which share similar qualities are of the same type. For example, if the root segment of a database record represents a course, and that root segment has three child segments labeled: instructor, student, and location, those child segments may be referred to as segment types.
The root segment is referred to as a first level of the IMS database, and direct children of the root segment are referred to as a second level of the IMS database. As used herein, a second level of the IMS database may alternatively be referred to as a first level child segment, as child segments may only appear starting with the second level of the IMS database. Similarly, children of the children of the root segment (i.e., grandchildren of the root segment) are referred to as a third level of the IMS database, or alternatively, second level child segments. The level of each subsequent generation of children may be determined by incrementing the previous level by one (e.g., a fourth level of the IMS database is equivalent to a third level child segment).
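The hierarchy and level numbering described above may be illustrated with a minimal Python sketch. The `Segment` class, its field names, and the segment type names are hypothetical and chosen only for illustration; they are not part of IMS or DL/I.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Segment:
    """Hypothetical in-memory model of one segment in a database record."""
    segment_type: str
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, segment_type: str) -> "Segment":
        """Attach a dependent segment beneath this segment and return it."""
        child = Segment(segment_type, parent=self)
        self.children.append(child)
        return child

    def database_level(self) -> int:
        """The root segment is level 1; each generation of children adds one."""
        level = 1
        node = self
        while node.parent is not None:
            level += 1
            node = node.parent
        return level

    def child_level(self) -> int:
        """Child-segment level: the database level minus one (0 for the root)."""
        return self.database_level() - 1


# A course record with one student, who in turn has one grade.
course = Segment("COURSE")
student = course.add_child("STUDENT")
grade = student.add_child("GRADE")
```

In this sketch, `grade.database_level()` yields the third level of the database, and `grade.child_level()` yields the equivalent second level child segment.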
When IMS databases are created, definitions of which data set group each segment type is to be written to are specified. Segments of an IMS database may be written into a number of data set groups, e.g., ten data set groups. Each segment type may only be assigned to one data set group. As noted above, a database record is made up of a root segment and child segments. As an IMS database is used, segments and database records are added, modified and deleted. Over time, the child segments of a database record may become scattered across different blocks within a data set group, resulting in slower access times and longer latencies than would occur if the child segments were closer together or contiguous. Reorganizing the location of the various segments of an IMS database such that segments of database records are closer together results in faster access times and shorter latencies.
IMS databases include a maintenance window, wherein the maintenance window is the "off-line" time for an IMS database. It is during the maintenance window, be that on a daily, weekly, monthly, or even less frequent basis, that changes to the structure of an IMS database may be made by a database administrator.
Currently, IMS databases are reorganized while the database is off-line. Also, all database records (i.e., the entire database) are reorganized, as no analysis of the state of each database record occurs. Therefore, database records which currently have their segments stored close together are reorganized, along with database records which currently have their segments scattered across different blocks. In other words, in prior art systems all database records are reorganized whether reorganization is necessary or not.
As noted above, the current techniques of reorganizing IMS databases do not include mechanisms to analyze the physical characteristics of database records in an IMS database before reorganizing the IMS database. Such an analysis may help to determine the benefit to be derived by reorganizing the IMS database. It is desirable, in the interest of efficiently using the maintenance window, to provide a method to determine which database records within an IMS database would benefit from reorganization.
For at least the foregoing reasons, there is a need for an improved system and method for analyzing the physical characteristics of database records in IMS databases, especially for more efficient reorganization of the database records.
The present invention provides various embodiments of an improved method and system for analyzing the physical characteristics of database records, such as in IMS databases. The information obtained during this analysis may then be used to perform a more efficient reorganization or restructuring of the database.
In one embodiment, the method involves tracing the database retrieval process to collect physical location information for each segment of each database record in the IMS database. The database retrieval process for each database record may begin at the root segment of the database record and traverse the child segments of the database record, preferably in hierarchical order, e.g., top to bottom, left to right. The database retrieval process identifies the segment code causing the first reference to a block and the number of segments retrieved from the block before fetching a new block. The physical location information for each segment of each database record in the IMS database may be analyzed to identify one or more database records which include at least one fragmented boundary twin chain.
A twin chain is a collection of segments of the same type that have the same parent. A boundary parent segment is a parent segment, other than a root segment, that exists in a data set group as the lowest level segment in the data set group (i.e., all children of the parent segment are in a different data set group). If two or more boundary parent segments exist under the same parent, the boundary parent segments may also be referred to as a boundary twin chain. A boundary child segment is a non-parent segment that exists as the lowest level segment in the database hierarchy. Boundary child segments may reside in any data set group. Boundary child or boundary parent segments may also be referred to as boundary twin chains when a second segment of a particular segment type is created. A fragmented boundary twin chain is a boundary twin chain that spans more blocks than actually required.
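The tracing step described above may be sketched as follows. This is an illustrative Python sketch only, not the claimed implementation. It assumes the retrieval has already produced a list of (segment code, block number) pairs in hierarchical order, and it assumes that every change of block counts as a new fetch (i.e., no buffering is modeled). The names `BlockRun` and `trace_block_runs`, the segment codes, and the block numbers are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class BlockRun(NamedTuple):
    block: int           # block number that was fetched
    first_segment: str   # segment code that caused the first reference
    segments_read: int   # segments retrieved before a new block was fetched


def trace_block_runs(retrieval: List[Tuple[str, int]]) -> List[BlockRun]:
    """Group a hierarchical-order retrieval into consecutive per-block runs."""
    runs: List[BlockRun] = []
    for segment_code, block in retrieval:
        if runs and runs[-1].block == block:
            # Same block as the previous segment: extend the current run.
            last = runs[-1]
            runs[-1] = last._replace(segments_read=last.segments_read + 1)
        else:
            # A different block is needed: start a new run.
            runs.append(BlockRun(block, segment_code, 1))
    return runs


# Hypothetical record: the STUDENT twin chain is split across blocks 10 and 42.
retrieval = [
    ("COURSE", 10), ("INSTR", 10), ("STUDENT", 10),
    ("STUDENT", 42), ("STUDENT", 10), ("LOCATION", 42),
]
runs = trace_block_runs(retrieval)
```

For this hypothetical record the trace contains four runs although only two distinct blocks are involved, which is the kind of back-and-forth access that the later fragmentation analysis is intended to detect.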
Next, two calculations may be performed for those segments which are boundary twin chains. The first calculation may determine a total number of physical blocks currently used to hold the boundary twin chain. The second calculation may determine a minimum number of physical blocks needed to hold the boundary twin chain. If the total number of physical blocks currently used to hold the boundary twin chain exceeds the minimum number of physical blocks needed to hold the boundary twin chain by a pre-determined amount, e.g., using a pre-determined ratio or by a pre-determined number of physical blocks, the boundary twin chain may be determined to be fragmented. Similar calculations may be performed for the database record. If the total number of physical blocks currently used to hold the database record exceeds the minimum number of physical blocks needed to hold the database record by a pre-determined amount, e.g., a number of physical blocks, the database record may be determined to be fragmented.
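The two calculations and the threshold comparison may be sketched in Python as shown below. The text does not specify how the minimum number of blocks is computed; this sketch assumes, purely for illustration, that it is the total segment length divided by the usable block size, rounded up. The function names, the parameters `max_ratio` and `max_extra_blocks`, and the sample values are likewise hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Each chain entry is (block number, segment length in bytes).
Chain = List[Tuple[int, int]]


def blocks_used(chain: Chain) -> int:
    """First calculation: distinct blocks currently holding the chain."""
    return len({block for block, _length in chain})


def min_blocks_needed(chain: Chain, block_size: int) -> int:
    """Second calculation (assumed formula): bytes / block size, rounded up."""
    total_bytes = sum(length for _block, length in chain)
    return max(1, math.ceil(total_bytes / block_size))


def is_fragmented(chain: Chain, block_size: int,
                  max_ratio: Optional[float] = None,
                  max_extra_blocks: Optional[int] = None) -> bool:
    """Compare used and minimum blocks against a pre-determined threshold."""
    used = blocks_used(chain)
    needed = min_blocks_needed(chain, block_size)
    if max_ratio is not None and used / needed > max_ratio:
        return True
    if max_extra_blocks is not None and used - needed > max_extra_blocks:
        return True
    return False


# Four 200-byte twins spread over three 4096-byte blocks.
chain = [(10, 200), (42, 200), (77, 200), (10, 200)]
by_ratio = is_fragmented(chain, 4096, max_ratio=1.5)
by_count = is_fragmented(chain, 4096, max_extra_blocks=2)
```

Here the chain uses three blocks where one would suffice, so it is flagged under a ratio threshold of 1.5 but not under a threshold of more than two extra blocks. The same functions may be applied to all segments of a database record to perform the record-level comparison.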
A reorganization recommendation list for the database record may be created in response to determining whether the database record may be fragmented. The reorganization recommendation list may contain values for a minimum and a currently used number of blocks for the database record, along with recommendations (i.e., "yes": reorganize, "no": do not reorganize) for the database record and for each fragmented boundary twin chain in the database record. Thus the reorganization recommendation list may specify one or more records to be reorganized.
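One possible shape for such a list is sketched below in Python. The `Entry` layout, the function names, the block-count threshold `max_extra_blocks`, and the sample counts are assumptions made for illustration; the text does not prescribe a particular format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    subject: str      # the database record or one boundary twin chain
    blocks_used: int  # blocks currently used
    min_blocks: int   # minimum blocks needed
    reorganize: str   # "yes" or "no"


def make_entry(subject: str, blocks_used: int, min_blocks: int,
               max_extra_blocks: int) -> Entry:
    """Recommend "yes" when the excess exceeds the pre-determined amount."""
    exceeds = blocks_used - min_blocks > max_extra_blocks
    return Entry(subject, blocks_used, min_blocks, "yes" if exceeds else "no")


def recommendation_list(record_id: str,
                        record_counts: Tuple[int, int],
                        chain_counts: Dict[str, Tuple[int, int]],
                        max_extra_blocks: int = 1) -> List[Entry]:
    """Build entries for the record, then for each boundary twin chain."""
    used, needed = record_counts
    entries = [make_entry(f"record {record_id}", used, needed, max_extra_blocks)]
    for chain_name, (c_used, c_needed) in chain_counts.items():
        entries.append(
            make_entry(f"twin chain {chain_name}", c_used, c_needed, max_extra_blocks)
        )
    return entries


# Hypothetical counts given as (blocks used, minimum blocks).
entries = recommendation_list(
    "CS101",
    record_counts=(7, 3),
    chain_counts={"STUDENT": (5, 1), "LOCATION": (1, 1)},
)
records_to_reorganize = [e.subject for e in entries
                         if e.subject.startswith("record") and e.reorganize == "yes"]
```

A reorganization step could then process only the records named in `records_to_reorganize`, leaving records marked "no" untouched.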
The system and method may then reorganize the database based on the reorganization recommendation list. This reorganization may reorganize only a subset of the database records, and may occur while the database is being actively used, e.g., not in the maintenance window.