The present invention relates generally to the field of database management and more particularly to rebalancing partitioned data in a database management system.
A partitioned database environment is a database installation that allows data to be distributed across two or more divisions, with each division spanning one or more nodes. Relational database management systems (RDBMS) store data in database tables that are conceptually organized into records or rows with multiple columns, but that may physically be separated into parts along either row or column boundaries between partitions. As records are added to or removed from a partitioned table, the sizes of the partitions change. Over time, the partitions can become unbalanced, with data distributed across them in a highly skewed manner. Database administrators perform reorganization or rebalancing of database partitions to balance the usage of storage space, improve database system performance, or satisfy various system requirements.
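As a minimal illustration of what an unbalanced set of partitions means, the sketch below computes a simple skew ratio from per-partition record counts. The metric (largest partition divided by the mean partition size) and the function name `skew_ratio` are assumptions chosen for illustration; the text above does not define a specific measure of skew.

```python
def skew_ratio(partition_sizes):
    """Return the largest partition size divided by the mean partition size.

    A value of 1.0 means perfectly balanced partitions; larger values
    indicate more skew. This metric is a hypothetical choice for illustration.
    """
    if not partition_sizes:
        raise ValueError("at least one partition is required")
    mean_size = sum(partition_sizes) / len(partition_sizes)
    if mean_size == 0:
        # All partitions are empty, so treat them as balanced.
        return 1.0
    return max(partition_sizes) / mean_size


# Hypothetical record counts for three partitions.
balanced = skew_ratio([100, 100, 100])
skewed = skew_ratio([300, 50, 50])
```

A rebalancing policy might, for example, trigger reorganization once such a ratio exceeds a chosen threshold; any threshold would be a deployment-specific assumption.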
Database tables are divided into partitions based on a boundary value, distribution key, or limit key, which is typically a customer-specified field or column within each row of data that is used to divide the rows among partitions. The starting (lowest) and ending (highest) limit key values define the range of a partition, but the partition is typically referred to by its upper limit key. Partitioning rules require that all records for a single limit key value reside together within the same partition. The range for each partition may be set based on the total number of data records to be loaded into the database, on customer-specific requirements, or on some combination thereof. Besides range partitioning, as described above, other forms of partitioning include list partitioning, hash partitioning, and composite partitioning.
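The range partitioning scheme described above can be sketched as a lookup against a sorted list of upper limit keys. The limit key values, the name `partition_for`, and the choice to treat each upper limit as inclusive are assumptions made for illustration only; an actual database management system may define its boundaries differently.

```python
from bisect import bisect_left

# Hypothetical upper limit keys, one per partition, in ascending order.
# Partition 0 holds keys up to 1000, partition 1 holds 1001..2000, and so on.
UPPER_LIMIT_KEYS = [1000, 2000, 3000]


def partition_for(key, upper_limits=UPPER_LIMIT_KEYS):
    """Return the index of the partition whose range contains `key`.

    Each partition is identified by its upper limit key, which is treated
    as inclusive here (an assumption). Because the result depends only on
    the key, all records with the same key land in the same partition.
    """
    index = bisect_left(upper_limits, key)
    if index == len(upper_limits):
        raise ValueError(f"key {key!r} exceeds the highest limit key")
    return index


# Records with equal keys always map to the same partition.
first = partition_for(1500)
second = partition_for(1500)
```

Binary search keeps the lookup logarithmic in the number of partitions, which is one reason a sorted list of upper limit keys is a convenient representation for this sketch.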
Even though tables are divided into partitions that may reside on separate computers, the data may be accessed efficiently and conveniently in response to Structured Query Language (SQL) statements, such as SELECT, INSERT, DELETE, and UPDATE. The fact that databases are split across database partitions is transparent to users issuing SQL statements or commands. Because each partition may be on a separate physical machine, the database manager at each partition uses the processor on that machine to manage the part of the total data in the database residing on that machine. Data partitions thus allow for parallel processing and faster execution of data requests, while the user can send a data request without needing to know the specifics of how the database is partitioned.