The present invention relates to a method for efficiently reorganizing databases that are stored in a computer system.
1.1 Databases
As illustrated in FIG. 1, a database file typically includes "records" that are divided into "fields". A record in a database file may be thought of as a row in a table, containing, for example, all the information for a client. Fields may be thought of as columns in a table. For example, one field in a database may correspond to the name of a client, another field of the same record may correspond to the client's company.
Fields commonly have specific sizes that are determined when the database file is constructed. For example, a field containing dates may have two character positions each for the day and for the month, and four character positions for the year (a character position corresponds to a letter of text, numerical digit or punctuations etc.). Variable size fields may also be implemented in the database. These fields may acquire additional character positions as required. For example, a field containing comments about a client may contain a short phrase or a complete paragraph of text.
Records in a database may be rearranged into a desired order, by sorting, whereas fields are fixed with respect to the record. Hence, the fields are common to the database, even though the field contents are not.
It is often desired to be able to retrieve information quickly from the database. If a database is randomly organized, in no specific order, it may be necessary to check all data, in "linear search" to find the desired information.
If the data in a database are organized on a specific field, for example, on the name of a client, it is necessary to search only a small portion of the database to find the desired client. This corresponds to the normal use of, e.g., a dictionary or telephone directory. One possible approach to perform a search would be to divide the directory into two equal halves, and then determine which half contains the desired information. The half that does not contain the information is excluded, and the remaining half is again divided into two equally sized portions. The process continues until the information is found. This process might seem slow, but an item in a list of a thousand items will be located with ten or fewer process steps. Many computers perform a searches as described above.
1.2 Database Indexes
Most databases contain "support" files, in addition to the data files (data files store the actual information that is of interest to the user or his clients etc.). The "support" files may include "index" files. An index file works like the index in a book; it contains a key (corresponds to a word or phrase in the index of the book) and a pointer (corresponds to the page number in the index of the book).
for more efficient access to the data, indexes are used. Indexes are nothing more than an ordered set of key/pointer pairs stored in a separate file. The key is composed of one or more columns and the pointer identifies where to find the associated data row in the data file. Like the data file, the index file is made up of pages. Non-leaf pages contain information to access leaf pages and leaf pages contain the key/pointer pairs for accessing the data rows.
An index file typically contains "key/pointer" pairs that are related to a few fields in the database (i.e. a database may be arranged according to client names and telephone numbers). Each pointer is associated with a key, i.e., a copy of a data field from the original database. The pointer keeps track of where the associated record is stored in the data base. Naturally the index file has to be sorted when information is added or removed from the database, but the index file is much smaller than the database and is therefore quicker to sort. Furthermore, the organization of an index file may be based on a pointer structure (e.g., the well-known B-tree or B-plus tree organization).
1.3 Database Disorganization
The continual insertion, deletion, and updating of databases, IBM's DATABASE 2 (DB2) for mainframe computers in particular, causes data in the database to become disorganized. Disorganization can take the form of fragmented free space, data indirection, out-of-order data, and out-of-order index pages.
Fragmented free space is caused by repeated insertions and deletions of data rows such that some or all of the deleted space is not or cannot be used. This wasted space may cause files to grow unnecessarily large.
Data indirection is caused by updating a data row such that it no longer fits on the same page. In this case, a pointer is left on the original page that points to a new page containing the updated row. Data indirection always causes one additional I/O operation.
Out-of-order data is caused by repeated insertions such that the data rows are not in key sequence as defined by one of the indexes. This causes non-sequential and possibly random I/O to the data file when accessing large portions of data through the index.
Out-of-order index pages is caused by repeated insertions. The leaf pages containing successive key values are not adjacent to each other, causing non-sequential and possibly random I/O operations to the index file.
In general, disorganization causes more file storage to be used, more I/O operations to be performed, and less efficient I/O operations to be performed.
1.4 Overview of DB2 REORG and its Related Environments
IBM developed a product, DB2 REORG, that can reorganize existing DB2 data and index files. The result is a database with reduced diorganization. However, no user access to the database is allowed during reorginization. Hence, it is usually very desirable to perform the reorganization at a minimum amount of time as well as to be able to predict the time required.
In addition, it is necessary to record that a reorganization has been performed for recovery purposes. Information about execution of a utility is kept in one of the database managers special catalog tables. These tables (in normal operation) cannot be updated directly by a user. The database manager maintains these tables.
1.5 Multitasking and Multiprocessor Environments
Prior art "multitasking " facilities enable different programs to coexist in main memory simultaneously. Only one program at a time is able to run. However, when one program waits for, for example, its I/O (input and/or output) to complete, another program may run. I/O is usually directed by a separate controller, for example a disk drive controller. I/O is usually a slow process, compared to, for example, executing a program. Multitasking is often used to maximize the use of CPU (Central Processing Unit) time.
Prior art "multiprocessor" machines have two or more CPUs. When multiple CPUs are utilized, programs may be executed in parallel, one program per CPU. This concept can be used for simultaneous execution of programs that are independent of each other (e.g., programs that do not need data from one another during their respective operations).
Multitasking may of course be combined with multiprocessor arrangements to further enhance performance. Furthermore, programs running in multitasking and/or multiprocessor environments may be designed to communicate with each other. This enables processes that are not independent of each other to execute in parallel.
1.6 DB2 Tables
As illustrated in FIG. 2, a DB2 defines its own environment where data structures and file I/O are unique. A database may contain several "table spaces" and "indexes". A table space is a group of tables, and a table is a collection of data records (a table is what is generally termed database). The database may be stored an a single storage device or on multiple storage devices (disk drive, tape etc.) depending on whether a table is "simple", or "partitioned", or "segmented".
A simple table appears to the user as if it is stored linearly on a storage device (adjacent storage locations) even though its records may be mixed with other tables.
A partitioned table is divided into small groups, each group having adjacent storage locations. The groups, or partitions, can be located on different storage devices. Although the table is divided into groups, its behavior is as if it were a single table.
A segmented table is a table that is divided into groups (segments) that reside on the same storage device. Segmented tables are similar to simple tables, but cannot have records from different tables mixed within segments.
Segmented or partitioned tables may provide enhanced performance when accessing large data sets, since only a portion of the table has to be involved.
1.7 DB2 Indexes
A DB2 index is a set of pointers to data in a table. The index is stored separately from the table, but in the same database. The index is maintained by DB2, but can be checked, repaired, or reorganized by a user.
A "clustering index" defines the order in which records of a table are stored. Each table can have only one clustering index. The clustering sequence is the fastest sequence in which a table can be accessed.
Each index occupies its own index space (similar to table spaces), which is created simultaneously with the index. If a clustering index is created for a table in a partitioned table space, the index will also be partitioned into multiple index spaces.
1.8 DB2 I/O
A DB2 storage group is a set of volumes on a "direct access storage device" (commonly known as a DASD device). A DASD device can be a disk drive, a tape drive etc. A "volume" is a portion of DASD device that is referenced separately from the rest of the DASD device. A DASD device can have one or more volumes.
I/O is handled by DB2 in data units called "pages". A table space is divided into several equal sized portions (pages). DB2 reads and writes (I/O) data one page at a time.
Several pages are grouped together. A group of pages is preceded by a space map that contain information about the pages.
1.9 The DB2 Catalog Table
DB2 uses several database tables to track information about user databases, and for its own internal operation. DB2 limits access to these tables, in order to prevent users from accidentally changing vital information.
One of DB2's internal tables tracks the "access path" to database tables. An access path is a set of descriptors of what databases a program access and is allowed to access. This table is updated by DB2 as users enter new applications and tables into the DB2 environment and prevents users from accidentally altering DB2 system files.
Other essential DB2 system tables are a set of "catalog tables", one of which is the "recovery table" (e.g., SYSIBM.SYSCOPY) which is the table file being modified. It maintains information necessary to recover damaged data. However, there are cases when the tables inside the catalog table cannot be modified.
When the prior art REORG program is run, it updates the recovery table to reflect that some tables have been reorganized. Hence, it is vital that a program or routine that reorganizes one or more tables in the DB2 environment be able to update the table. Reorganizing DB2 tables without updating the recovery table will not allow the affected tables to be recovered correctly in the event of damage.
1.10 Internal Operation of DB2 REORG
IBM's DB2 REORG is mainly composed of six parts:
The INITIALIZE routine handles initialization of parameters used by UNLOAD, RELOAD, SORT, and BUILD.
UNLOAD reads all data from a data file on storage media and RELOAD writes the data back to the storage media and creates an index work file. SORT sorts the index work file, and BUILD builds the final index file. TERMINATE terminates program execution and releases resources like disk drives, main memory etc.
The UNLOAD routine reads one data row at the time from the data file using a sequential scan (i.e., data rows are read in the order they are stored on the storage medium, except in the case of indirect references) or by utilizing one of the cluster indexes. Partitions are unloaded one at a time. UNLOAD utilizes existing features of the DB2 environment, such as data and buffer managers.
The execution speed of the UNLOAD routine is highly dependent on the amount of I/O required (input or output operations). The amount of I/O is, in turn, dependent on the amount of disorganization of the database. Disorganization results in an increased amount of I/O operations and possibly very inefficient ordering of these operations.
RELOAD writes the data rows back to the data file in an organized manner during which new pointers or RIDs (Row Identification Numbers) are assigned, index keys are extracted, and an index work file is created, composed of key/RID pairs. Partitions are reloaded one at a time.
The SORT routine sorts the index work file by the key/RID values. It groups the entries together by which index they belong to.
The BUILD routine builds indexes from the sorted index work file, one index at a time.
Execution of the program parts that make up IBM's DB2 REORG is sequential, which means that program execution starts with INITIALIZing parameters and continues with execution of the other program parts in sequence, one at a time.
Most methods of doing things involve a sequential order, i.e., the second floor of a building is constructed before the third floor. However, portions of a sequential method may be performed in parallel.
Similar principles could be applied to software and the DB2 reorganization process in particular; albeit many problems need to be resolved.
The above routines, UNLOAD, RELOAD, SORT, and BUILD, do not utilize parallelism. Significant performance improvements may be realized with the proper synchronization scheme and software (or software equivalent hardware), running on a machine with multitasking and/or multiprocessor facilities. Most machines that run DB2 do provide multitasking and multiprocessor facilities.
1.11 Problems with DB2 REORG
The execution speed of the prior UNLOAD routine is highly dependent on the amount of I/O required (input or output operations). The amount of I/O is, in turn, a function of the amount of disorganization of the data base. Disorganization results in an increased amount of non-sequential I/O. Performance may be further degraded by using one of the indexes to unload data, depending on the amount of "out-of-orderness". If the index is severely disorganized, many additional non-sequential I/O cycles may be required to resolve indirect pointer references.
The extent of partitioning of tables also affect the performance of UNLOAD since they are unloaded one at a time.
The prior UNLOAD utilizes existing DB2 system components, the data and the buffer managers, to perform the actual I/O of tables and indexes. These system components do not perform at an optimum level. Partitions could actually be loaded in parallel since DB2 runs on machines that provide multitasking as well as multiprocessing (multiple CPUs). Furthermore, the buffer manger (a routine that handles I/O between application programs and DASD devices) does not organize data pages and space maps for maximum performance.
Data pages are written to the DASD device consecutively until the last data page of a set. At that point, I/O stops, the space map associated with that range of pages is updated, and a new space map is initialized for the consecutive set of data pages. The new space map, followed by the data pages for that set, resumes the I/O process.
Initialization of I/O to DASD devices is a slow process, where mechanical parts, read and write heads, must be positioned to a specific location of the disk or tape etc. (the precise location of data to be read or written). Common practice to minimize DASD initializations includes writing large sets of data at a time.
Similar constraints apply to the prior RELOAD routine, because it is involved with partitions and I/O in analogous ways.