1. Field of the Invention
The present invention relates to the online reorganization of data contained in a database. More particularly, the invention concerns a method and apparatus for reorganizing a database while allowing substantially uninterrupted access to the database.
2. Description of the Related Art
Databases are used on computers for a myriad of reasons. In many cases the databases are extremely large, having entries in the millions. With large databases, the information must be available at all times on a transactional or real-time basis, and large mainframe computers are usually employed to access the data. International Business Machines Corporation (IBM), assignee of the present invention, has developed the leading database environment, referred to as DB2, for use in conjunction with compatible mainframe computers.
One feature common to all database systems, and included in DB2, is the capability to index various information. The use of an index allows faster access for searches and requests based upon the indexed information. DB2 uses a balanced tree index structure. In this structure, root, tree and leaf pages are used, with each page at a given level containing the same number of entries, except the last page at that level. The leaf pages are the lowest level, and each contains a number of entries referring to the actual data records contained in DB2 data tables. Each leaf page is maintained in internal logical order automatically by DB2. Tree pages are the next level up, and are used to indicate the logical order of the leaf pages.
For large databases, there may be several layers of tree pages, for example, a higher level of tree pages referencing a lower level of tree pages. Ultimately, the number of tree pages is reduced such that all the entries or references fit into a single page referred to as the root page. As in leaf pages, within each tree or root page the entries are kept in logical order automatically by DB2.
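The layering of leaf, tree and root pages described above can be illustrated with a minimal Python sketch. It is not DB2's actual page format: the function name `build_index_levels`, the parameter `entries_per_page`, and the choice to store the highest key of each child page in the level above are all assumptions made for illustration.

```python
from typing import List

Page = List[int]


def build_index_levels(keys: List[int], entries_per_page: int) -> List[List[Page]]:
    """Return the index levels, leaf level first and root level last.

    Hypothetical model: every page holds up to `entries_per_page` entries,
    so every page at a level is full except possibly the last one.
    """
    if entries_per_page < 2:
        raise ValueError("entries_per_page must be at least 2")

    ordered = sorted(keys)  # entries are kept in logical order
    # Leaf level: consecutive runs of keys.
    level = [ordered[i:i + entries_per_page]
             for i in range(0, len(ordered), entries_per_page)]
    levels = [level]

    # Add tree levels until all references fit into a single (root) page.
    while len(level) > 1:
        # Assumption: an upper-level entry is the highest key of a child page.
        high_keys = [page[-1] for page in level]
        level = [high_keys[i:i + entries_per_page]
                 for i in range(0, len(high_keys), entries_per_page)]
        levels.append(level)
    return levels


if __name__ == "__main__":
    for depth, pages in enumerate(build_index_levels(list(range(1, 11)), 3)):
        print(depth, pages)
```

With ten keys and three entries per page, the sketch yields four leaf pages, two tree pages and one root page.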
One problem with this type of index organization is that the physical location of the leaf pages often becomes quite scattered. Another problem is that the rows of an index, an index being ordered row by row, may become scattered across multiple data pages, rather than clustered together. This scattering reduces performance because the storage device must move between widely scattered physical locations whenever logical order operations are performed. This is true of whatever type of direct access storage device (DASD) is used to store the index or data file. Therefore, the files, including the index file, need to be reorganized periodically so that the logical and physical ordering between the leaf pages and data pages better correspond. However, current methods used to reorganize the files require access to the files to be restricted for most of the reorganization process. In a database that requires 24x7 availability, that is, twenty-four hours-a-day, seven days-a-week accessibility, long durations of data unavailability are unacceptable.
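The cost of scattering can be made concrete with a small Python sketch. It assumes a deliberately simplified device model in which the cost of a logical-order scan is the total distance between consecutive physical page positions. The helper names `seek_distance` and `reorganize`, and the sample positions, are hypothetical.

```python
from typing import List


def seek_distance(physical_positions: List[int]) -> int:
    """Total movement when pages are visited in logical order.

    `physical_positions[i]` is the physical location of the i-th page
    in logical order (simplified one-dimensional device model).
    """
    return sum(abs(nxt - cur)
               for cur, nxt in zip(physical_positions, physical_positions[1:]))


def reorganize(physical_positions: List[int]) -> List[int]:
    """Hypothetical reorganization: place pages contiguously in logical order."""
    return list(range(len(physical_positions)))


if __name__ == "__main__":
    scattered = [40, 3, 27, 11, 35]  # illustrative values only
    print(seek_distance(scattered))              # before reorganization
    print(seek_distance(reorganize(scattered)))  # after reorganization
```

For these sample positions the scan distance drops from 101 to 4 once the logical and physical orders agree.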
One example of a database requiring 24x7 availability is a financial database for storing a bank's records. Regular record reorganization is required to minimize storage overhead for the ever-changing records. However, a bank cannot afford to "close down" record access to reorganize its database. Customer service requires access during the day, and processing other transactions, commonly occurring at night, requires nighttime access.
Recognizing that a reorganization utility can be one of the largest inhibitors to data access--reorganization utilities commonly block access to the data by other utilities and applications during the reorganization process--several solutions have been proposed to make data available more of the time. These "online" reorganization methods help minimize data "outages."
A major drawback to these techniques is that they require a reorganizational process to request a "blocking drain", also known as a lock, on a resource, thereby making other processes wait. For example, if reorganizational process B requests a lock, it must wait until a process A, which already has a lock, finishes. If another process C comes along before process A finishes, process C queues up behind process B and must wait for both A and B to finish. Once A finishes, B locks the database and process C continues to wait until reorganizational process B can record a starting point. The wait experienced by process C can be substantial if process A is long running or does not complete, or if process C must also wait for additional processes preceding A to finish.
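The queuing effect in the A, B, C example can be sketched as a toy simulation in Python. It assumes a strictly first-in, first-out exclusive lock, which is a simplification of a real drain mechanism. The function name `fifo_wait_times` and the arrival and hold times are hypothetical.

```python
from typing import Dict, List, Tuple

# (process name, arrival time, time the lock is held)
Request = Tuple[str, int, int]


def fifo_wait_times(requests: List[Request]) -> Dict[str, int]:
    """Wait time of each process under a strictly FIFO exclusive lock.

    `requests` must be listed in arrival order. Each process starts only
    after every earlier process in the queue has released the lock.
    """
    lock_free_at = 0
    waits: Dict[str, int] = {}
    for name, arrival, hold in requests:
        start = max(arrival, lock_free_at)
        waits[name] = start - arrival
        lock_free_at = start + hold
    return waits


if __name__ == "__main__":
    requests = [
        ("A", 0, 100),  # long-running process that already holds the lock
        ("B", 10, 5),   # reorganization process recording its starting point
        ("C", 20, 1),   # short request queued behind B
    ]
    print(fifo_wait_times(requests))
```

With these illustrative numbers, B waits 90 time units for A, and C waits 85 even though its own work takes one unit, because it sits behind both A and B.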
Accordingly, there is a need for an online database reorganization technique using a "non-blocking" drain which will wait for a resource without blocking other requests on that resource. Referring to the above example, there is a need for a technique where process C can access the database while process B waits for process A to finish so that a reorganization starting point, or logical record sequence number (LRSN), can be established by B, thereby allowing the reorganization process to begin.
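The difference between the two kinds of drain can be illustrated with a toy Python model. It is a sketch of the stated requirement only, not the mechanism of the invention. It assumes that ordinary processes ("claimers") are compatible with one another, and that the draining process needs only the claimers that arrived before its request to finish. The function name `claimer_waits` and all timings are hypothetical.

```python
from typing import Dict, List, Tuple

# (process name, arrival time, time the resource is used)
Claimer = Tuple[str, int, int]


def claimer_waits(claimers: List[Claimer], drain_request: int,
                  drain_hold: int, blocking: bool) -> Tuple[Dict[str, int], int]:
    """Return (wait time per claimer, time at which the drain is obtained).

    Assumption: the drainer proceeds once every claimer that arrived at or
    before `drain_request` has finished. With a blocking drain, claimers
    arriving after the request wait until the drainer is done; with a
    non-blocking drain they proceed immediately.
    """
    earlier_done = max((arrival + hold for _, arrival, hold in claimers
                        if arrival <= drain_request), default=drain_request)
    drain_start = max(drain_request, earlier_done)
    drain_end = drain_start + drain_hold

    waits: Dict[str, int] = {}
    for name, arrival, _hold in claimers:
        if blocking and drain_request < arrival < drain_end:
            waits[name] = drain_end - arrival  # queued behind the drainer
        else:
            waits[name] = 0
    return waits, drain_start


if __name__ == "__main__":
    claimers = [("A", 0, 100), ("C", 20, 1)]
    print(claimer_waits(claimers, drain_request=10, drain_hold=5, blocking=True))
    print(claimer_waits(claimers, drain_request=10, drain_hold=5, blocking=False))
```

In both cases process B obtains its starting point at time 100, once A finishes. Only in the blocking case does process C wait, here for 85 time units.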
There is also a need for an online database reorganization technique using a non-blocking drain which allows access to a database during the reorganization of the data and minimizes unavailability of the database in completing the reorganization process.