1. Field of the Invention
The invention relates to a method for sorting and reorganization of computer files, and more particularly, to a method for reorganizing a database index file.
2. Description of the Related Art
Databases are used on computers for a myriad of reasons. In many cases the databases are extremely large, having entries in the millions. When databases reach this size, and the information is needed on a transactional or real-time basis, mainframe computers are utilized. International Business Machines Corp. (IBM) has developed a database environment referred to as DB2 for use on its mainframes. Given IBM's large share of the mainframe market, DB2 is the leading database system in that environment.
One feature common to all database systems and included in DB2 is the capability to index various information. The use of the index allows very fast access for searches and requests based on the indexed information. DB2 uses a balanced tree index structure. In this structure, root, tree and leaf pages are used, with each page at each level containing the same number of entries, except of course the last one. The leaf pages are the lowest level and each contains a number of entries referring to the actual data records contained in the DB2 data tables. Each leaf page is maintained in internal logical order automatically by DB2. Tree pages are the next level up, and are used to indicate the logical order of the leaf pages. For large databases, there may be several layers of tree pages, with a higher level of tree pages referencing a lower level of tree pages. Ultimately the number of tree pages is reduced such that all the entries or references fit into a single page referred to as the root page. As in leaf pages, within each tree or root page the entries are kept in logical order automatically by DB2.
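The layering described above can be illustrated with a minimal sketch. It is not DB2's actual page format: the `Page` class, the `build_index` function, and the `fanout` parameter are hypothetical names chosen for illustration, and each tree entry is assumed to hold the highest key of the child page it references.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Page:
    level: int             # 0 = leaf; higher numbers = tree pages; top = root
    entries: List[Tuple]   # leaf: (key, record_id); tree/root: (high_key, child_index)


def build_index(sorted_entries, fanout):
    """Build the index bottom-up from key-ordered (key, record_id) entries.

    Returns a list of levels; levels[0] holds the leaf pages and
    levels[-1] holds the single root page (when there is any data).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Every leaf page holds `fanout` entries, except possibly the last one.
    current = [Page(0, sorted_entries[i:i + fanout])
               for i in range(0, len(sorted_entries), fanout)]
    levels = [current]
    level = 1
    # Add tree levels until all references fit into one page (the root).
    while len(current) > 1:
        refs = [(page.entries[-1][0], idx) for idx, page in enumerate(current)]
        current = [Page(level, refs[i:i + fanout])
                   for i in range(0, len(refs), fanout)]
        levels.append(current)
        level += 1
    return levels
```

For ten entries and a fanout of three, this yields four leaf pages, two tree pages, and one root page.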
One problem with such an index organization is that the physical location of the leaf pages often becomes quite scattered. This scattering results in reduced performance, because the storage device must move between widely scattered physical locations when operations are performed in logical order. This is true regardless of the type of Direct Access Storage Device (DASD) used to store the index file. Therefore the index files need to be reorganized periodically so that the logical and physical ordering of the leaf pages better correspond.
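One rough way to picture the cost of scattering is to count how often a scan in logical order must jump to a non-adjacent physical page. The sketch below is only an illustrative measure, not something defined by DB2; `count_seeks` is a hypothetical helper, and it assumes that moving to the next physical page number is free while any other move counts as one seek.

```python
def count_seeks(physical_in_logical_order):
    """Count non-sequential moves when leaf pages are visited in logical order.

    `physical_in_logical_order` lists the physical page number of each
    leaf page, in the order the pages are visited logically.
    """
    seeks = 0
    for prev, cur in zip(physical_in_logical_order, physical_in_logical_order[1:]):
        if cur != prev + 1:   # next logical page is not physically adjacent
            seeks += 1
    return seeks
```

Under this measure a scattered layout such as `[7, 2, 9, 3, 4]` requires three seeks, while the reorganized layout `[0, 1, 2, 3, 4]` requires none.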
IBM provides a utility with DB2 to reorganize the index file. Several third-party DB2 utility providers also have index reorganization packages. These packages usually operate in the same manner. First, the entire index file is read in physical order. Each leaf page in the index file is then separated into its component record entries. Next, the record entries are sorted by key value using a conventional sort package. Finally, the sorted records are rewritten back into the index file. While this process may sound simple, it must be understood that quite often there are hundreds of thousands to millions of entries in the index file. When this number of entries is considered, the process becomes quite time consuming, particularly the sorting step. The third-party packages are faster than IBM's utility, but primarily because the sort packages used are more efficient. Even in those cases the process is quite tedious and is done less frequently than desirable, so overall database performance suffers. Therefore it is desirable to have a significantly faster DB2 index reorganization method, so that the indices can be reorganized more frequently and overall operations made more efficient.
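The conventional procedure described above (read, split, sort, rewrite) can be sketched as follows. This is a simplified model, not the IBM utility or any vendor package: `reorganize_by_sorting` and `entries_per_page` are hypothetical names, leaf pages are modeled as lists of `(key, record_id)` tuples, and Python's built-in sort stands in for the conventional sort package.

```python
def reorganize_by_sorting(pages_in_physical_order, entries_per_page):
    """Rebuild leaf pages so that physical order matches key order."""
    # Steps 1 and 2: read every leaf page in physical order and split it
    # into its component entries.
    entries = [entry for page in pages_in_physical_order for entry in page]
    # Step 3: sort all entries by key value. With n entries this is the
    # O(n log n) step that dominates the running time for large indexes.
    entries.sort(key=lambda entry: entry[0])
    # Step 4: rewrite the sorted entries into consecutive pages.
    return [entries[i:i + entries_per_page]
            for i in range(0, len(entries), entries_per_page)]
```

Note that the sort operates on every individual entry, even though the entries within each leaf page were already in order before the pages were split apart.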