1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a method of providing in-place reorganization of a database.
2. Description of Related Art
(Note: This application references a number of different publications as indicated throughout the specification by reference numbers enclosed in brackets, e.g., [x]. A list of these different publications ordered according to these reference numbers can be found in Section 8 of the "Detailed Description of the Preferred Embodiment." Each of these publications is incorporated by reference herein.)
Any database management system (DBMS) may need some type of reorganization. Reorganization of a database is defined as changing some aspect of the logical and/or physical arrangement of the database. A tutorial paper referenced in [12] discusses issues in reorganization and types of reorganization. This specification describes the problems with offline reorganization and the need for online reorganization (see, e.g., [11]).
The type of reorganization described herein involves restoration of clustering. Clustering is the practice of storing records near each other if they meet certain criteria. One popular criterion is consecutive values in a column of the records. Clustering should reduce disk input/output for records that users often access together. When users write data into the database, this writing can decrease the amount of clustering and thus degrade performance.
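The effect of clustering on disk input/output can be illustrated with a minimal sketch. It assumes a page is simply a list of key values and that each page touched costs one read; the names `pages_touched` and `recluster` are hypothetical and are not taken from any DBMS. For brevity, `recluster` builds a new copy of the data, so it is not the in-place method of the present invention.

```python
def pages_touched(pages, lo, hi):
    """Count the pages holding at least one key in the range [lo, hi]."""
    return sum(1 for page in pages if any(lo <= key <= hi for key in page))


def recluster(pages, capacity):
    """Restore clustering by sorting all keys and repacking them into pages.

    Illustration only: this makes a new copy rather than working in place.
    """
    keys = sorted(key for page in pages for key in page)
    return [keys[i:i + capacity] for i in range(0, len(keys), capacity)]


# Consecutive key values stored together: a range query reads one page.
clustered = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# The same keys after user writes have scattered them across pages.
scattered = [[1, 5, 9, 2], [6, 10, 3, 7], [11, 4, 8, 12]]

reads_clustered = pages_touched(clustered, 1, 4)
reads_scattered = pages_touched(scattered, 1, 4)
restored = recluster(scattered, 4)
```

In this sketch the same range query reads one page when the data is clustered and three pages when it is scattered, and repacking the keys in order returns the one-page behavior.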
Reorganization can restore clustering and performance. During most types of reorganization in a typical database, the area being reorganized is offline or only partially available; users cannot write (and perhaps cannot even read) data in that area. However, a highly available database (a database that is to be fully available 24 hours per day, 7 days per week) should not go offline for significant periods. Applications that require high availability include reservations, finance (especially global finance), process control, hospitals, police, armed forces, and Internet service.
Even for less essential applications, many database administrators prefer 24-hour availability. The maximum tolerable period of unavailability is specific to the application. When queried, DBMS customers (not all of whom have highly available databases) state that the maximum tolerable period ranges from 0 to 5 hours. Even without such a preference for 24-hour availability, reorganizing a very large database might require much longer than the maximum tolerable period of unavailability.
As examples of very large databases, a survey paper [6] mentions a database with several terabytes of data and the desire for one with petabytes. The author of one book [14] considers offline reorganization such an important problem for very large databases that he defines a very large database as one "whose reorganization by reloading takes a longer time than the users can afford to have the database unavailable." These considerations call for the ability to reorganize the database online (concurrently with usage or incrementally within users' transactions), so that users can read and write the database during most or all phases of reorganization.
In the context of papers that do not concentrate on online reorganization, many people have stated the need for the ability to reorganize online. As the amount of information and dependence on computers both grow, the number of very large or highly available databases will grow. Therefore, the importance of online reorganization will grow.
The present invention provides methods for in-place online reorganization (specifically, for restoration of clustering). The data structures are those of IBM's DBMS Database 2 (DB2) for OS/390 [4], but the concepts in the methods presented herein should apply to many DBMSs. The methods perform reorganization in place; i.e., they do not make a new copy of the data being reorganized. To allow high-throughput concurrent usage by users of the database, the methods track the reorganization's movement of records across a user's position within a scan of data, and they correct the behavior of a user transaction to account for the movement.
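The idea of tracking movement across a scan position can be sketched as follows. This is an illustration under simplifying assumptions, not the method detailed later in the specification: a scan position is assumed to be a single integer slot index, the reorganizer is assumed to report each move as a (record, source slot, destination slot) triple, and concurrency is simulated by applying moves between reads. All names (`Scan`, `note_move`, `run_scan`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scan:
    """A forward scan over slots; `position` is the last slot already read."""
    position: int = -1
    # Unread records moved from ahead of the scan to behind it (would be missed).
    moved_behind: set = field(default_factory=set)
    # Already-read records moved from behind the scan to ahead of it
    # (would be returned twice).
    moved_ahead: set = field(default_factory=set)


def note_move(scan, record_id, from_slot, to_slot):
    """Record a reorganizer move if it crosses the scan's current position."""
    was_behind = from_slot <= scan.position
    now_behind = to_slot <= scan.position
    if not was_behind and now_behind:
        if record_id in scan.moved_ahead:
            scan.moved_ahead.discard(record_id)   # already read; back behind
        else:
            scan.moved_behind.add(record_id)      # unread; scan would miss it
    elif was_behind and not now_behind:
        if record_id in scan.moved_behind:
            scan.moved_behind.discard(record_id)  # unread; scan will reach it
        else:
            scan.moved_ahead.add(record_id)       # read; scan would repeat it


def run_scan(slots, moves_at):
    """Scan `slots` in order and return each record exactly once.

    `moves_at` maps a slot index to the moves the (simulated) reorganizer
    performs right after the scan reads that slot.
    """
    scan = Scan()
    result = []
    for i in range(len(slots)):
        scan.position = i
        record = slots[i]
        if record is not None:
            if record in scan.moved_ahead:
                scan.moved_ahead.discard(record)  # correction: skip duplicate
            else:
                result.append(record)
        for record_id, src, dst in moves_at.get(i, []):
            slots[dst], slots[src] = record_id, None  # move into an empty slot
            note_move(scan, record_id, src, dst)
    # Correction: deliver records that were moved behind the scan unread.
    result.extend(sorted(scan.moved_behind))
    return result


table = ["a", "b", None, "c", "d", None]
# After the scan reads slot 2, "d" moves behind the scan and "a" moves ahead.
moves = {2: [("d", 4, 2), ("a", 0, 5)]}
scanned = run_scan(table, moves)
```

Without the two corrections, a plain scan of this table would return "a" twice and never return "d". With them, each record is returned once, although not necessarily in slot order.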
This specification describes relevant features of a DBMS, discusses the advantages of the present invention over previous research (including the novelty of the methods), presents the concepts in the methods, describes the methods in more detail, and proposes extensions based on the methods.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, system, and article of manufacture for providing in-place reorganization of a database that achieves reasonably accurate results for users during high-throughput concurrent usage of the database. The reorganization's movement of records across a user transaction's position within a scan of the database is tracked. The behavior of the user transaction is corrected to account for the movement of the records.