1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to repartitioning data in a database.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semi-permanent storage.
A table can be divided into partitions, with each partition containing a portion of the table's data. By partitioning tables, the speed and efficiency of data access can be improved. For example, partitions containing more frequently used data can be placed on faster data storage devices, and parallel processing of data can be improved by spreading partitions over different DASD volumes, with each I/O stream on a separate channel path. Partitioning also promotes high data availability, enabling application and utility activities to progress in parallel on different partitions of data.
Data may be distributed among partitions by a variety of schemes ("partitioning schemes"). One partitioning scheme assigns data to partitions according to a boundary value present in specified columns of the data row. The boundary value is the data value that separates each partition from the next partition. In one database system, the DB2® product offered by International Business Machines Corporation, Armonk, New York, a range of values is associated with each table partition by means of a CREATE INDEX statement. The CREATE INDEX statement gives the boundary value for each partition.
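As an illustration only, boundary-value partitioning can be sketched in a few lines of Python. The boundary list, the function name `partition_for`, and the convention that each boundary is the highest key its partition may hold are assumptions made for this sketch; they are not taken from the DB2 product or from any particular CREATE INDEX statement.

```python
from bisect import bisect_left

# Hypothetical boundary values: partition i holds every key that is
# greater than BOUNDARIES[i - 1] and less than or equal to BOUNDARIES[i].
BOUNDARIES = ["H", "P", "Z"]


def partition_for(key, boundaries=BOUNDARIES):
    """Return the 0-based index of the partition that should hold `key`."""
    # bisect_left finds the first boundary that is >= key, so a key equal
    # to a boundary value stays in the partition that boundary closes.
    index = bisect_left(boundaries, key)
    if index == len(boundaries):
        raise ValueError(f"key {key!r} exceeds the highest boundary value")
    return index


if __name__ == "__main__":
    for name in ["Adams", "H", "Hall", "Smith"]:
        print(name, "-> partition", partition_for(name))
```

Under these assumed boundaries, "Adams" and "H" fall in the first partition, "Hall" in the second, and "Smith" in the third; a key above the last boundary is rejected rather than silently placed.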
As records are added to or removed from a partitioned table, the sizes of the partitions change. Over time, partitions can become unbalanced, with different partitions containing widely different amounts of data. Parallel operations are less efficient when partitions are unevenly sized than when they are balanced in size. Moreover, the partitions are sometimes unbalanced from the outset because the database administrator who identified the ranges for the partitions did not make an optimal selection. A database administrator could rebalance the partitions manually, but doing so takes considerable effort and is time-consuming and inefficient.
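The imbalance described above can be made concrete with a small Python sketch. The helper names `partition_counts` and `skew`, the sample keys, and the choice of "largest partition divided by mean partition size" as the measure are all assumptions chosen for illustration; no particular database system is implied to compute imbalance this way.

```python
from bisect import bisect_left
from collections import Counter


def partition_counts(keys, boundaries):
    """Count how many keys fall into each boundary-delimited partition.

    Assumes every key is <= the last boundary; keys above it are ignored.
    """
    counts = Counter(bisect_left(boundaries, key) for key in keys)
    return [counts.get(i, 0) for i in range(len(boundaries))]


def skew(counts):
    """Ratio of the largest partition to the mean partition size.

    A value of 1.0 means perfectly balanced; larger values mean one
    partition holds a disproportionate share of the rows.
    """
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


if __name__ == "__main__":
    keys = ["Adams", "Baker", "Clark", "Davis", "Evans", "Smith"]
    counts = partition_counts(keys, ["H", "P", "Z"])
    print(counts, skew(counts))
```

With these sample keys, five of the six rows land in the first partition, so a parallel scan would finish only as fast as that one oversized partition allows.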
Additionally, rebalancing a subset of a table's partitions can result in all the table's partitions being unavailable to other applications. Finally, recovery of one or more partitions to a point in time prior to a manual rebalancing can result in data integrity problems.
Therefore, there is a need in the art for an improved method of repartitioning and balancing data.