Computers are powerful tools for storing, managing and providing access to vast amounts of information. Computer databases are one common mechanism for storing information on a computer while providing access to users. Common computer implementations of databases store data and indexes in various files or objects.
Typically, users do not have direct access to the objects in which the data and/or indexes comprising a database are stored. Instead, users are often provided indirect access to the data and indexes via a database management system (“DBMS”), or an application communicating with a DBMS. A DBMS is responsible for responding to requests from users or applications to insert, update and delete data in the physical objects. In this way, the DBMS acts as a buffer between the end user and the physical data storage mechanism, thereby shielding the end user from having to know or consider the underlying hardware-level details of the table being used.
There are several common database management systems including, for example, DB2, which employs tablespace and index objects to store and access data. Another example of a common DBMS implementation is IMS, which employs database and index objects to store and access data.
In a typical database environment, rows of user data reside in tables which are maintained in data objects such as databases or tablespaces. Each object storing user data may have one or more indexes. Each index facilitates access to rows of the table according to a key. The key of an index is typically data from one or more columns of the table. The rows of data are available to batch and online applications for reading, updating, deleting and inserting new data. When a row of data is inserted or deleted, a corresponding insertion or deletion is performed on all associated indexes. When a key column is updated, all corresponding indexes are also updated.
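The index maintenance described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of DB2, IMS or any other DBMS: the `Table` class, its method names and the in-memory dictionaries standing in for data and index objects are all assumptions made for the example.

```python
from collections import defaultdict


class Table:
    """Toy table whose indexes are kept in step with its rows (hypothetical)."""

    def __init__(self, index_columns):
        self.rows = {}          # row id -> dict of column values
        self.next_row_id = 0
        # One index per key column: key value -> set of row ids.
        self.indexes = {col: defaultdict(set) for col in index_columns}

    def insert(self, row):
        """Store the row, then add an entry to every associated index."""
        row_id = self.next_row_id
        self.next_row_id += 1
        self.rows[row_id] = dict(row)
        for col, index in self.indexes.items():
            index[row[col]].add(row_id)
        return row_id

    def delete(self, row_id):
        """Remove the row, then remove its entry from every associated index."""
        row = self.rows.pop(row_id)
        for col, index in self.indexes.items():
            self._remove_entry(index, row[col], row_id)

    def update(self, row_id, column, value):
        """Change one column; only an index keyed on that column is touched."""
        row = self.rows[row_id]
        if column in self.indexes:
            index = self.indexes[column]
            self._remove_entry(index, row[column], row_id)
            index[value].add(row_id)
        row[column] = value

    def lookup(self, column, key):
        """Return the ids of rows whose indexed column equals key."""
        return sorted(self.indexes[column].get(key, ()))

    @staticmethod
    def _remove_entry(index, key, row_id):
        index[key].discard(row_id)
        if not index[key]:
            del index[key]   # drop empty keys so the index does not grow stale
```

In this sketch an update to a non-key column leaves every index untouched, whereas an update to a key column removes the old index entry and adds a new one, mirroring the behavior described above.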
Typical tables and indexes may include thousands of records. In many DBMSs, all changes, updates, deletions and insertions to the objects are recorded to a log file. The log function is one of the busiest functions in a DBMS due to the large number of records and the high volume of changes being made to objects. A typical DBMS log function also allows for a log exit. Namely, before the DBMS writes each log record, it calls a log exit routine and passes the address of the log record to the routine.
Over time, changes to, additions to and deletions from a table and/or index may result in an inefficient organization of the stored data, and may affect the ability of the DBMS to respond in a timely manner to requests from end users and applications. To maintain efficient data storage and access, utilities have been developed to reorganize data and index objects. Such utilities may be periodically executed to correct the inefficient organization of data caused by the processing of requests since the last time the reorganization utility was executed. Reorganization utilities are employed periodically, rather than continuously, because of the time and resources required to perform the reorganization of data.
While a reorganization utility is executing, batch and online applications which require access to the data and/or index objects being reorganized may be executing concurrently. For this reason, reorganization utilities typically examine and reorganize the subject objects in two phases. In the first phase, the subject object is reorganized to account for all changes which have occurred up to the execution of the reorganization utility.
In the second phase, the typical reorganization utility accounts for all changes which have occurred during the execution of the reorganization utility. This is accomplished by reviewing all log file records reflecting changes requested by the concurrent batch and online applications. Before completing the reorganization, the utility processes all of the changes written to the log file, thereby providing an up-to-date reorganization of the subject data or index object.
A typical DBMS environment is illustrated in FIG. 1. As shown, the environment includes a database 110 for maintaining and allowing access to stored information. Database 110 includes at least one data object 112 for storing rows and columns of data. Database 110 preferably includes one or more indexes 114 and 116 associated with data object 112 to assist in accessing the data stored therein. Of course, indexes 114 and 116 are optional, and data object 112 is not required to have any index.
Access to database 110 is provided by Database Management System (“DBMS”) 120. DBMS 120 enables user 130 to access database 110. DBMS 120 also enables user 150 to access database 110 indirectly through application 140. DBMS 120 includes routines for reading, adding, deleting and changing the data stored in database 110. DBMS 120 also includes at least one routine for logging all changes made to any object managed by DBMS 120. The logging function may utilize a log database 122 embodied as a data object 124 and an index object 126. In addition to routines for logging changes, DBMS 120 further includes utilities for maintaining the integrity of the data stored in data object 112 and indexes 114 and 116. Certain utilities may be used to rebuild the files or objects within database 110 in the event they become corrupted. Other utilities, specifically a reorganization utility, may be used to rearrange the data stored in database 110 for more efficient access. The reorganization utility may operate on data object 112, index 114, index 116, or any combination thereof.
Referring now to FIG. 2, there are depicted the steps that a conventional DBMS log routine executes each time a data or index entry is added, deleted or modified. At step 210, a log record is created in a log file. The log file contains changes made by the DBMS to data and/or index objects. The log record identifies the affected data or index object, identifies the record of the affected file and describes the type of activity that resulted in a change to the record. At step 212, a log exit routine is called. The log exit routine is called prior to the writing of the log record, and the address of the log record is passed as part of the call. At step 214, the log record is actually written to the log file.
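The sequence of steps 210, 212 and 214 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions rather than any DBMS's actual log routine: the `LogRecord` and `Logger` names are hypothetical, a Python list stands in for the log file, and the record object itself is passed to the exit routine in place of its address.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    object_name: str        # affected data or index object
    record_id: int          # affected record within that object
    activity: str           # type of activity: "insert", "delete" or "update"
    payload: object = None  # new contents, if any


class Logger:
    """Toy log routine with an optional log exit (hypothetical)."""

    def __init__(self, log_exit=None):
        self.log_file = []        # a list stands in for the log file
        self.log_exit = log_exit  # callable invoked before each write

    def log_change(self, object_name, record_id, activity, payload=None):
        # Step 210: create the log record.
        record = LogRecord(object_name, record_id, activity, payload)
        # Step 212: call the log exit routine before the record is written,
        # passing it a reference to the record.
        if self.log_exit is not None:
            self.log_exit(record)
        # Step 214: write the record to the log file.
        self.log_file.append(record)
        return record
```

Because the exit routine runs before the write, it sees each record at a point where the log file does not yet contain that record.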
Referring generally to FIGS. 3A and 3B, there is depicted a block diagram illustrating the steps that a conventional reorganization utility performs to more efficiently store data. The steps are collectively referred to by reference numeral 300. Although conventional reorganization utilities may operate on both data and index objects, FIGS. 2, 3A, and 3B are described in terms of reorganizing a data object. Of course, analogous steps are performed when reorganizing an index.
Reorganization utility 300 operates in two phases. During the first phase, depicted in FIG. 3A, the utility individually copies each record from the data object as it exists at the beginning of the reorganization. During the second phase, depicted in FIG. 3B, the reorganization utility accounts for any changes that are made to the data object while processing the first phase. Such changes may be requested by users, online applications or batch applications that require access to the data object concurrently with processing the first phase of the reorganization.
Referring now to FIG. 3A, the steps of the first phase of a conventional reorganization utility are depicted. At step 310, the reorganization utility creates an empty “shadow” data object based on the format of the real data object to be reorganized. Each record of the real data object is read at step 312. As illustrated by decision block 314, if attempting to read a record from the real data object at step 312 results in an End-of-File condition, the reorganization utility begins the second phase of processing. If a record is successfully read at step 312, the record is written to the shadow data object at step 316.
Referring now to FIG. 3B, the steps of the second phase of a conventional reorganization utility are depicted. Upon entering the second phase of processing, at step 318, the reorganization utility searches for the first record in the log file that pertains to a record of the data object being reorganized, where the logged change occurred after the reorganization utility was invoked. In subsequent iterations, step 318 will search for the next record in the log file that pertains to a record of the data object being reorganized. At step 320, the record is read from the log file.
As illustrated by decision block 322, if attempting to read a record from the log file at step 320 results in an End-of-File condition, the reorganization utility completes the reorganization process by performing step 326. If a log file record is successfully read at step 320, the change described by the log file record is applied to the shadow data object at step 324, and processing is directed back to step 318. After all concurrent changes to the data object have been applied, the newly reorganized shadow data object is renamed to become the real data object at step 326, thereby allowing access to the reorganized data object.
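The two phases of FIGS. 3A and 3B can be summarized in the following Python sketch. It is an illustration under simplifying assumptions, not the code of any actual reorganization utility: the function name `reorganize`, the tuple layout of the log entries and the `start_position` parameter (the length of the log when the utility was invoked) are all hypothetical, the data object is modeled as a dictionary holding the records as they existed at the start of the reorganization, and "reorganizing" is reduced to rewriting the records in key order.

```python
def reorganize(real_object, log_file, object_name, start_position):
    """Return a reorganized shadow copy of real_object (record id -> record).

    log_file is a list of (object_name, record_id, activity, record) tuples.
    start_position is the number of log entries that existed when the
    utility was invoked; earlier entries are already reflected in
    real_object and are skipped.
    """
    # Phase one, step 310: create an empty shadow data object.
    shadow = {}
    # Steps 312-316: read each record and write it to the shadow,
    # here in key order, until the records are exhausted (End-of-File).
    for record_id in sorted(real_object):
        shadow[record_id] = real_object[record_id]

    # Phase two, steps 318-324: find each log record made after the utility
    # was invoked that pertains to this object, and apply it to the shadow.
    for name, record_id, activity, record in log_file[start_position:]:
        if name != object_name:
            continue  # change belongs to some other object
        if activity == "delete":
            shadow.pop(record_id, None)
        else:  # "insert" or "update"
            shadow[record_id] = record

    # Step 326: the caller replaces the real object with the shadow.
    return shadow
```

Every concurrent change is applied to the shadow one log record at a time, so the cost of the second phase in this sketch grows with the number of logged changes for the object.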
Consequently, a need exists for an improved method and system for reorganizing data that enables a reorganization utility to operate more efficiently than conventional reorganization utilities. Specifically, a need exists for a method and system that reduces the processing related to effecting changes that are made to a file while it is being reorganized.