1. Field of the Invention
The present invention relates to computer software, and more particularly to reorganizing databases.
2. Description of the Related Art
The IMS database (IMS DB) was created in 1970 by International Business Machines Corporation (IBM) and is one of the two major parts to IBM""s IMS/ESA (Information Management System/Enterprise Systems Architecture). The second part is a data communications system (IMS Transaction Manager or IMS TM). Together, the transaction manager and the database manager create a complete on-line transaction processing environment providing continuous availability and data integrity. IMS/ESA runs under the MVS/ESA or OS/390 operating systems, which run on the S/390 platform.
At the heart of IMS DB are its databases and its data manipulation language, Data Language/I (DL/I). The IMS database is a hierarchical (non-relational) database. IMS databases are hierarchic collections of data, information organized in a pyramid fashion with data at each level of the hierarchy related to, and in some way dependent upon, data at the higher level of the hierarchy. DUI calls allows a user to create and access these IMS databases. The terms xe2x80x9cuserxe2x80x9d and xe2x80x9ccustomerxe2x80x9d are synonyms; therefore, any reference to xe2x80x9cuserxe2x80x9d may be interchanged with xe2x80x9ccustomerxe2x80x9d herein.
An IMS database may include one or more data set groups. Each data set group may include one or more segments. A segment is the smallest piece of data DL/I can store. Each segment may be qualified by its hierarchical relationship to other segments in a database record. Each database record has one root segment and zero or more child segments. A xe2x80x9croot segmentxe2x80x9d is at the top of the hierarchy, and there may be only one root segment in a database record. All other segments (other than the one root segment) in a database record are referred to as xe2x80x9cdependent segmentsxe2x80x9d, and their existence depends on there being a root segment (i.e., dependent segments are hierarchically related to the root segment). A xe2x80x9cparent segmentxe2x80x9d is any segment that is defined in the database descriptor (DBD) as capable of having a dependent segment beneath it in the hierarchy. A xe2x80x9cchild segmentxe2x80x9d is any segment that is a dependent on another segment above it in the hierarchy.
Segments may be of various segment types. Those segments which share similar qualities are of the same type. For example, if the root segment of a database record represents a course, and that root segment has three child segments labeled: instructor, student, and location, those child segments may be referred to as segment types.
An IMS database record may be divided into a number of different segment types (e.g., from 1 segment type up to 127 segment types). The database structure may have up to 15 levels of hierarchy. Each additional level of hierarchy creates a parent/child relationship between the two levels with the root segment being the parent of all the segments in the database record. Each segment type is defined as either fixed or variable in size.
The root segment is referred to as a first level of the IMS database, direct children of the root segment are referred to as a second level of the IMS database. As used herein, a second level of the IMS database may alternatively be referred to as a first level child segment, as child segments may only appear starting with the second level of the IMS database. Similarly, children of the children of the root segment (i.e., grandchildren of the root segment) are referred to as a third level of the IMS database, or alternatively, second level child segments. The level of each subsequent generation of children may be determined by incrementing the previous level by one (e.g., a fourth level of the IMS database is equivalent to a third level child segment).
As used herein, a xe2x80x9cblockxe2x80x9d is a fixed length section of data. As used herein, a xe2x80x9ccontrol interval (CI)xe2x80x9d denotes the smallest data container for a VSAM (Virtual Sequential Access Method) data set. It is noted that VSAM is a product of IBM Corporation. The terms xe2x80x9cblockxe2x80x9d and xe2x80x9ccontrol intervalxe2x80x9d are synonyms; therefore, any reference to xe2x80x9cblockxe2x80x9d may be interchanged with xe2x80x9ccontrol intervalxe2x80x9d herein.
An IMS database is typically divided into either blocks or control intervals. The size of an IMS database may be defined as the total number of blocks or control intervals in the database. As segments are created for a new database record, the segments may be added to a block until the block can no longer hold another full segment. At that time, a new block may be selected and any additional segments for the database record may be placed in the new block. This process may continue until no more segments are created for the database record. At that time, the database record is contained within a given number of database blocks. When each block of the database record contains only segments for the same database record, and is full (i.e., is not able to hold segments for this or any other database record), then the database record may be referred to as optimally organized for data retrieval. Typically, access to an optimally organized database record requires a number of input/output (I/O) operations equal to the number of blocks holding the database record.
An IMS database includes ten data set groups into which segments of an IMS database may be written. Each segment type may only be assigned to one data set group. When IMS databases are created, definitions of which data set group each segment type is to be written to are specified. In some cases, an IMS database may also be divided into partitions, in addition to being distributed across data set groups. A database record is made up of a root segment and child segments. As an IMS database is used, segments and database records are added, modified and deleted. Over time, the child segments of a database record may become scattered across different blocks within a data set group, resulting in slower access times and longer latencies than would occur if the child segments were closer together. Reorganizing the location of the various segments of an IMS database such that segments of database records are closer together results in faster access times and shorter latencies.
All space within an IMS database is shared space, therefore a block may hold segments from one or more database records. When segments are added to a database record after initial creation of the database record, it is often necessary for these added segments to be placed into blocks that do not currently hold segments for the database record. As more and more segments are added to the database record, the database record begins to lose optimal organization qualities because the segments typically occupy more blocks than would be necessary to hold the database record if the database record was contained within an optimal number of blocks. Therefore, access to such a database record typically requires a number of I/O operations equal to the total number of blocks now occupied by the database record. The number of I/O operations in excess of the optimal number may be extensive if the number of partially occupied blocks the database records spans is high. When the number of I/O operations exceeds the optimal number, database and application performance may degrade. Therefore, it is desirable to reorganize an IMS database when many of the database records exhibit these characteristics.
The need to reorganize an IMS database stems from the dynamic nature of insertions and deletions of segments in an IMS database. In general, as new child segments are added to an IMS database hierarchy, the segments may be added to blocks depending on space availability. As a result, related segments (i.e., segments belonging to the same database record) may be stored in different blocks, possibly non-contiguous blocks. This results in a fragmented database, as shown in FIG. 3. As a result, access of a database record may require reading a number of non-contiguous blocks, which results in lengthier access times. One method of reducing access times is to reorganize the IMS database in order to more closely position segments belonging to the same database record.
Reorganizing an IMS database in order to speed up the access times and reduce the latencies is more desirable to the user if user access to the IMS database may be maintained during the reorganization (i.e., on-line reorganization). Additionally, it is desirable to recommend the optimal number of fragmented database records to reorganize, rather than to recommend reorganizing all of the fragmented database records of a database. An optimal recommendation, when implemented, may yield a partially reorganized database by minimizing the amount of data movement while yielding the greatest results in terms of input/output (I/O) reduction. For at least the foregoing reasons, there is a need for an improved system and method for analyzing a database, such as an IMS database, to determine an on-line reorganization recommendation of the database, preferably an optimal on-line reorganization recommendation of the database.
The present invention provides various embodiments of an improved method and system for analyzing a database for determining an approach for reorganization of the database. In one embodiment, the method involves receiving a list of database records to be analyzed. Each database record on the list may be analyzed. Updates to the database may be performed while the analyzing occurs. Each analyzed database record on the list which meets certain criteria may be designated as xe2x80x9cqualifiedxe2x80x9d. The criteria used to determine if a database record is xe2x80x9cqualifiedxe2x80x9d may include using one or more qualifying control variables to determine if each database record is a qualified database record. The qualifying control variables may include one or more of: a block improvement percentage factor, a minimum block improvement factor, and a minimum input/output savings factor. The criteria may require the block improvement percentage factor to be greater than or equal to one percent and the minimum block improvement factor to be greater than or equal to one. Alternatively, the criteria may require the block improvement percentage factor to be greater than or equal to one percent and the minimum block improvement factor to be equal to the minimum input/output savings factor.
Identifying information for each qualified database record may be added to a detail array. The process of adding identifying information for each qualified database record to the detail array may include: creating an entry (e.g., a count element, a blocks improved element, and a total size of database record element) in the detail array upon the first occurrence of values for a given set of qualifying control variables; incrementing the count element of a given entry in the detail array when a second or subsequent occurrence of the values for the given set of qualifying control variables is found; incrementing the blocks improved element of a given entry in the detail array when a second or subsequent occurrence of the values for the given set of qualifying control variables is found; and incrementing the total size of database record element of a given entry in the detail array when a second or subsequent occurrence of the values for the given set of qualifying control variables is found. The given set of qualifying control variables may include the block improvement percentage factor and the minimum block improvement factor.
The qualified database record information may be aggregated based on information in the detail array. The process of aggregating the qualified database record information may include constructing a roll-up array based on the data in the detail array. The detail array may include one or more intersection points. An intersection point may include a first value for a block improvement percentage factor qualifying control variable and a second value for a minimum block improvement factor qualifying control variable. The roll-up array may include summary data for each intersection point. The summary data for a given intersection point may include summary data for each intersection point in the detail array that is greater than or equal to the first value for the block improvement percentage factor qualifying control variable for the given intersection point, and also greater than or equal to the second value for the minimum block improvement factor qualifying control variable for the given intersection point. The summary data for a given intersection point may include a total count of database records, a total blocks improved, and a total size of database records. Input/output saved per megabyte moved values may be calculated based on the summary data for each intersection point. The calculated input/output saved per megabyte moved values may be stored in a results tracking array.
A reorganization strategy may be recommended based on the aggregated qualified database record information. The recommended reorganization strategy may use one or more of the following user-specified guidance variables: a target percent of total qualified guidance variable; a target percent total size qualified guidance variable; a qualified guidance variable; a percent minimum megabytes moved guidance variable; a size minimum megabytes moved guidance variable; and a minimum input/output savings guidance variable. Recommended values for the qualifying control variables may be provided, based on the user-specified guidance variables. Alternate or additional recommended values for the qualifying control variables may be provided, based on the input/output saved per megabyte moved values in the results tracking array.
The recommended reorganization strategy may be that no reorganization of the database is required. The recommended reorganization strategy may be for an amount of reorganization of the database and a method of reorganization of the database. For example, the recommended amount of reorganization of the database may be a full reorganization of the database; and the recommended method of reorganization of the database may be an offline reorganization or concurrent reorganization of the database. Another example, the recommended amount of reorganization of the database may be a partial reorganization of the database; and the recommended method of reorganization of the database may be an offline reorganization or on-line reorganization of the database. The recommended reorganization strategy may be implemented. Concurrent updates to the database may be allowed while the reorganization strategy is implemented.